Page 324 - Contributed Paper Session (CPS) - Volume 2
CPS1874 Yiyao Chen et al.
The fraction before $\mathrm{TPR}(c)$ is unidentifiable from the observed test set
data. For $c \in [0, 1]$, we let $\rho_1(c)$ be the numerator of the unidentifiable
fraction, $\rho_1(c) = P(S(0) = 1 \mid R > c, S(1) = 1, D(1) = 1, Z = 1)$, which
represents the probability of being biopsied in the training set for participants
with risks larger than $c$ who would be biopsied in the test set and are diagnosed
with cancer. Similarly, we let $\rho_2(c)$ denote
$P(S(0) = 1 \mid R \le c, S(1) = 1, D(1) = 1, Z = 1)$. The $\mathrm{TPR}_{AB}(c)$
becomes
$$\mathrm{TPR}_{AB}(c) = \frac{\rho_1(c)\,\mathrm{TPR}(c)}{\rho_1(c)\,\mathrm{TPR}(c) + \rho_2(c)\,(1 - \mathrm{TPR}(c))}.$$
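As a numerical illustration of the expression above, the following minimal sketch evaluates the rate on the always biopsied stratum for given inputs. The function name `stratum_tpr` and its argument names are hypothetical, chosen only for this example. Because the two training-set biopsy probabilities are not identifiable from the test data, the values passed in are assumed sensitivity-analysis inputs, not estimates.

```python
def stratum_tpr(tpr, rho1, rho2):
    """Rate on the always biopsied stratum, computed from the observed rate.

    tpr  : observed true positive rate at a threshold c
    rho1 : assumed probability of biopsy in training given risk > c   (hypothetical input)
    rho2 : assumed probability of biopsy in training given risk <= c  (hypothetical input)
    """
    for name, value in (("tpr", tpr), ("rho1", rho1), ("rho2", rho2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    denom = rho1 * tpr + rho2 * (1.0 - tpr)
    if denom == 0.0:
        raise ValueError("denominator is zero; the rate is undefined")
    return rho1 * tpr / denom


# Equal probabilities leave the observed rate unchanged;
# a larger rho1 than rho2 pushes the rate upward.
unchanged = stratum_tpr(0.6, 0.5, 0.5)
shifted = stratum_tpr(0.6, 0.8, 0.4)
```

With these illustrative inputs, the second call returns a value above the observed rate of 0.6, in line with the ordering of the two probabilities.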
We assume that regardless of cohort or strata membership, patients with
higher risks should have a greater chance of being biopsied than patients with
lower risks.
Assumption 2 For any $c \in [0, 1]$, $\rho_1(c) \ge \rho_2(c)$.
Note that as $c$ ranges from 0 to 1, $\mathrm{TPR}(c)$ and $\mathrm{TPR}_{AB}(c)$
range from 1 to 0, with $\mathrm{TPR}_{AB}(c) \ge \mathrm{TPR}(c)$ in the middle of
the range of $c$; $\mathrm{TPR}_{AB}(c)$ increases as $\rho_2(c)$ decreases,
attaining its maximum value of 1 at $\rho_2(c) = 0$. When $\rho_1(c) = \rho_2(c)$,
i.e. the probabilities of being biopsied in training are the same for biopsied
participants with risk larger than the threshold and for those with risk less than
or equal to the threshold, $\mathrm{TPR}_{AB}(c)$ equals $\mathrm{TPR}(c)$. For
evaluation at multiple thresholds, we further assume the following.
Assumption 3 For any $c_1, c_2 \in [0, 1]$ with $c_1 \ge c_2$,
$\rho_1(c_1) \ge \rho_1(c_2)$ and $\rho_2(c_1) \ge \rho_2(c_2)$.
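The ordering noted after Assumption 2 can be checked in one step. The shorthand below is introduced only for this check: $T$ is the observed true positive rate at a threshold, $T^{*}$ is its counterpart on the always biopsied stratum, and $\rho_1 \ge \rho_2$ are the two training-set biopsy probabilities at that threshold. Assuming the denominator is positive, the difference is

```latex
\[
T^{*} - T
  = \frac{\rho_1 T}{\rho_1 T + \rho_2 (1 - T)} - T
  = \frac{T (1 - T)(\rho_1 - \rho_2)}{\rho_1 T + \rho_2 (1 - T)}
  \;\ge\; 0 ,
\]
```

with equality when $\rho_1 = \rho_2$ or when $T$ equals 0 or 1, which is why the two rates coincide at the ends of the threshold range.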
In addition to true positive rates, validation also considers the false positive
rate per threshold $c$, $\mathrm{FPR}(c)$, which evaluates the proportion of
patients without cancer in the test set that tested positive:
$\mathrm{FPR}(c) = P(R > c \mid S(1) = 1, D(1) = 0, Z = 1)$. Similar to the
derivation for the TPR, we set
$\rho_3(c) = P(S(0) = 1 \mid R > c, D(1) = 0, S(1) = 1, Z = 1)$ and
$\rho_4(c) = P(S(0) = 1 \mid R \le c, D(1) = 0, S(1) = 1, Z = 1)$ and define this
rate on the always biopsied stratum as:
$$\mathrm{FPR}_{AB}(c) = \frac{\rho_3(c)\,\mathrm{FPR}(c)}{\rho_3(c)\,\mathrm{FPR}(c) + \rho_4(c)\,(1 - \mathrm{FPR}(c))}, \quad \rho_3(c), \rho_4(c), c \in [0, 1].$$
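When the probability for the high-risk group is positive, dividing the numerator and denominator of the expression above by it shows that the stratum rate depends on the two unidentifiable probabilities only through their ratio. The sketch below uses that fact to sweep a grid of assumed ratios as a simple sensitivity analysis. The function names and the grid values are hypothetical, and the source does not describe such a sweep.

```python
def stratum_rate_from_ratio(rate, ratio):
    """Stratum rate when only the ratio (low-risk / high-risk probability) is assumed.

    rate  : observed rate at a threshold c (for example the FPR)
    ratio : assumed ratio of the two training-set biopsy probabilities,
            low-risk group over high-risk group (hypothetical input)
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    if ratio < 0.0:
        raise ValueError("ratio must be non-negative")
    denom = rate + ratio * (1.0 - rate)
    if denom == 0.0:
        raise ValueError("denominator is zero; the rate is undefined")
    return rate / denom


def sensitivity_sweep(rate, ratios):
    """Return (ratio, stratum rate) pairs over a grid of assumed ratios."""
    return [(r, stratum_rate_from_ratio(rate, r)) for r in ratios]


# Illustrative grid for an observed rate of 0.2.
sweep = sensitivity_sweep(0.2, [0.25, 0.5, 1.0])
```

A ratio of 1 reproduces the observed rate, and smaller ratios move the stratum rate upward.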
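The area under the receiver operating characteristic curve (AUC) is calculated
as the integral under the curve of TPR on the y-axis versus FPR on the x-axis:
$$\mathrm{AUC} = \int_0^1 \mathrm{TPR}(\mathrm{FPR}^{-1}(t))\,dt = -\int_0^1 \mathrm{TPR}(c)\,\mathrm{FPR}'(c)\,dc,$$
where $\mathrm{FPR}^{-1}(\cdot)$ is the inverse function of $\mathrm{FPR}$ and
$\mathrm{FPR}'(\cdot)$ is the first derivative of $\mathrm{FPR}$; the minus sign
arises because $\mathrm{FPR}(c)$ decreases from 1 to 0 as $c$ increases from 0 to 1.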
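In practice the integral is approximated from rates evaluated on a finite grid of thresholds. The sketch below computes empirical rates for the rule "risk greater than c" and applies the trapezoidal rule to the resulting points. The function names, the threshold grid, and the use of plain empirical rates are assumptions made for illustration; stratum-adjusted rates could be substituted for the empirical ones.

```python
def roc_points(risks, labels, thresholds):
    """Return (FPR, TPR) pairs for the rule 'risk > c' at each threshold c."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    points = []
    for c in thresholds:
        tp = sum(1 for r, y in zip(risks, labels) if y == 1 and r > c)
        fp = sum(1 for r, y in zip(risks, labels) if y == 0 and r > c)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under TPR (y-axis) versus FPR (x-axis) by the trapezoidal rule."""
    # Add the two end points of the curve and order the points by FPR, then TPR.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data in which every case has a higher risk than every non-case.
toy_auc = trapezoid_auc(roc_points([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], [0.0, 0.5, 1.0]))
```

A finer threshold grid gives a closer approximation to the integral.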
95% confidence intervals for the observed as well as the principal stratum
measured TPRs and FPRs were calculated using asymptotic approximations,
and for all the AUCs using the bootstrap with 2000 samples.
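The source does not state which bootstrap interval or resampling scheme was used. The following is a minimal sketch assuming a percentile interval, resampling of participants with replacement, and the Mann-Whitney form of the empirical AUC. The function names, the seed, and the handling of resamples that lack one outcome class are all assumptions.

```python
import random


def auc_mann_whitney(risks, labels):
    """Empirical AUC: share of (case, non-case) pairs ranked correctly, ties count 1/2."""
    pos = [r for r, y in zip(risks, labels) if y == 1]
    neg = [r for r, y in zip(risks, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(risks, labels, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the AUC (assumed scheme, see lead-in)."""
    if auc_mann_whitney(risks, labels) is None:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(risks)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        auc = auc_mann_whitney([risks[i] for i in idx], [labels[i] for i in idx])
        if auc is not None:  # skip resamples lacking one of the two classes
            stats.append(auc)
    stats.sort()
    alpha = 1.0 - level
    lower = stats[round((alpha / 2.0) * (n_boot - 1))]
    upper = stats[round((1.0 - alpha / 2.0) * (n_boot - 1))]
    return lower, upper
```

A call such as `bootstrap_auc_ci(risks, labels)` uses 2000 resamples by default, matching the number quoted above; a fixed seed makes the interval reproducible.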
313 | ISI WSC 2019