Page 323 - Contributed Paper Session (CPS) - Volume 2
CPS1874 Yiyao Chen et al.
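Assumption 1. For each patient $i$, the potential biopsy and outcome indicators depend only on that patient's own group assignment. For any assignment vectors $\mathbf{z} = (z_1, z_2, \ldots, z_i, \ldots, z_n)$ and $\mathbf{z}' = (z'_1, z'_2, \ldots, z'_i, \ldots, z'_n)$ with $z_i = z'_i$,
$$B_i(\mathbf{z}) = B_i(\mathbf{z}') \quad \text{and} \quad Y_i(\mathbf{z}) = Y_i(\mathbf{z}'),$$
so that $B_i(\mathbf{z}) = B_i(z_i)$ and $Y_i(\mathbf{z}) = Y_i(z_i)$.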
We now drop the subscript $i$ from the biopsy and outcome indicators to avoid redundant notation. We write $B(z)$ and $Y(z)$ for the biopsy and outcome indicators under group assignment $z$, where $z = 0$ denotes the training cohort and $z = 1$ the test cohort. We focus on the joint potential biopsy outcomes $(B(0), B(1))$ for each patient. Only one component of this pair can ever be observed, and the pair stratifies patients into four categories. The first is the always-biopsy stratum $(1,1)$, made up of patients who would have undergone biopsy regardless of cohort assignment, training or test. Patients in this stratum are of interest for assessing the reproducibility of the risk tool developed on the training set. The next stratum of interest, $(1,0)$, comprises patients who would be biopsied in the training set but not in the test set. This stratum would therefore be useful for assessing the generalizability of the risk tool developed on the training set. The remaining two strata, $(0,1)$ and $(0,0)$, comprise patients who would not have been biopsied in the training set. They would thus not have contributed to the risk tool whose validation is of interest, and they are not considered here. In this report, we focus on the always-biopsy stratum to assess the reproducibility of the tool.
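The four principal strata can be made concrete with a small classification helper. This is a minimal sketch only. The names `principal_stratum` and `stratum_counts` are hypothetical, and the labels for the $(0,1)$ and $(0,0)$ strata are our own. Because the pair $(B(0), B(1))$ is never jointly observed, counts like these can only be computed in a simulation that generates both potential biopsy indicators for every patient.

```python
from collections import Counter

# Hypothetical labels for the principal strata defined by (B(0), B(1)),
# with z = 0 the training cohort and z = 1 the test cohort.
STRATUM_LABELS = {
    (1, 1): "always biopsy",
    (1, 0): "biopsied in training only",
    (0, 1): "biopsied in test only",
    (0, 0): "never biopsy",
}


def principal_stratum(b0: int, b1: int) -> str:
    """Map the joint potential biopsy outcomes (B(0), B(1)) to a stratum label."""
    return STRATUM_LABELS[(b0, b1)]


def stratum_counts(potential_biopsies):
    """Count patients per stratum from (B(0), B(1)) pairs.

    Both components are needed, so this works only on simulated data,
    never on observed data where one component is always missing.
    """
    return Counter(principal_stratum(b0, b1) for b0, b1 in potential_biopsies)


if __name__ == "__main__":
    simulated_pairs = [(1, 1), (1, 0), (0, 1), (1, 1), (0, 0)]
    print(stratum_counts(simulated_pairs))
```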
With $R$ denoting $R(1)$ to simplify notation, the true positive rate at threshold $t \in (0,1)$ evaluated on the test set is given by
$$\mathrm{TPR}(t) = P\big(R > t \mid B(1) = 1, Y(1) = 1, Z = 1\big).$$
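Because $\mathrm{TPR}(t)$ conditions only on the test cohort's observed biopsy and outcome indicators, it has a direct empirical estimate: the fraction of biopsied test-set patients with a positive outcome whose risk exceeds $t$. Below is a minimal NumPy sketch. The function `empirical_tpr` and its argument names are hypothetical. The code assumes $R$ is the tool's risk in $(0,1)$, $B$ the biopsy indicator and $Y$ the outcome indicator. Outcome values for unbiopsied patients are ignored, since the outcome is never observed for them.

```python
import numpy as np


def empirical_tpr(risk, biopsied, outcome, t):
    """Empirical TPR(t) = P(R > t | B = 1, Y = 1) on the test cohort (Z = 1).

    risk:     risks R in (0, 1)
    biopsied: 0/1 biopsy indicators B for test-set patients
    outcome:  0/1 outcome indicators Y; values for unbiopsied patients are
              ignored because the outcome is only observed after biopsy
    """
    risk = np.asarray(risk, dtype=float)
    biopsied = np.asarray(biopsied, dtype=bool)
    outcome = np.asarray(outcome, dtype=bool)
    positives = biopsied & outcome  # biopsied patients with a positive outcome
    if not positives.any():
        raise ValueError("no biopsied positive outcomes in the test set")
    return float(np.mean(risk[positives] > t))


if __name__ == "__main__":
    # Toy test-set data: the fourth patient was not biopsied, so is excluded.
    print(empirical_tpr([0.9, 0.2, 0.6, 0.7], [1, 1, 1, 0], [1, 1, 0, 1], t=0.5))  # prints 0.5
```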
$\mathrm{TPR}(t)$ is fully observable, as it is measured on the biopsied participants in the test set. The test set, however, comprises a mixture of two groups in an unknown ratio: patients exchangeable with those in the training set, and patients who diverge from the training set. This lack of clarity leads to differences in the operating characteristics of a single risk tool across populations, and to subsequent confusion in the literature. Therefore, as an additional, pure measure of reproducibility, we recommend the TPR on the always-biopsy stratum:
$$\mathrm{TPR}_{\mathrm{AB}}(t) = P\big(R > t \mid B(1) = 1, B(0) = 1, Y(1) = 1, Z = 1\big).$$
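$\mathrm{TPR}_{\mathrm{AB}}(t)$ is not estimable from the observed data and is not identifiable without additional restrictions. In the following, we propose the simplest of such restrictions, as well as sensitivity analyses over plausible violations of them. The conditional probability defining $\mathrm{TPR}_{\mathrm{AB}}(t)$ can be written as a fraction whose numerator and denominator both take the form
$$P\big(R \in A, B(1) = 1, B(0) = 1, Y(1) = 1, Z = 1\big),$$
with $A = (t, 1)$ for the numerator and $A = (0, 1)$ for the denominator. Each of these probabilities decomposes as
$$P\big(B(0) = 1 \mid R \in A, B(1) = 1, Y(1) = 1, Z = 1\big) \times P\big(R \in A \mid B(1) = 1, Y(1) = 1, Z = 1\big) \times P\big(B(1) = 1, Y(1) = 1, Z = 1\big).$$
The last term cancels between numerator and denominator. In the denominator, $P\big(R \in (0,1) \mid B(1) = 1, Y(1) = 1, Z = 1\big) = 1$. In the numerator, $P\big(R > t \mid B(1) = 1, Y(1) = 1, Z = 1\big) = \mathrm{TPR}(t)$. This leaves
$$\mathrm{TPR}_{\mathrm{AB}}(t) = \frac{P\big(B(0) = 1 \mid R > t, B(1) = 1, Y(1) = 1, Z = 1\big)}{P\big(B(0) = 1 \mid B(1) = 1, Y(1) = 1, Z = 1\big)} \, \mathrm{TPR}(t).$$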
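The identity above can be checked numerically in a simulation where both potential biopsy indicators are known. The setup below is purely hypothetical and does not reflect the authors' data. Risks are Beta(2, 5) draws, and the probabilities of a positive outcome, test-cohort biopsy $B(1)$ and counterfactual training-cohort biopsy $B(0)$ are arbitrary increasing functions of risk chosen for illustration. The helper `tpr_decomposition` is a hypothetical name. It returns the observable $\mathrm{TPR}(t)$, the stratum-specific $\mathrm{TPR}_{\mathrm{AB}}(t)$ and the non-identifiable ratio, all computed on simulated test-cohort patients ($Z = 1$).

```python
import numpy as np


def tpr_decomposition(risk, b0, b1, y1, t):
    """Return (TPR, TPR_AB, ratio) for simulated test-cohort patients (Z = 1).

    risk: risks R; b0, b1: potential biopsy indicators B(0), B(1);
    y1: potential outcome indicator Y(1). B(0) is known only in a simulation.
    """
    risk = np.asarray(risk, dtype=float)
    b0 = np.asarray(b0, dtype=bool)
    b1 = np.asarray(b1, dtype=bool)
    y1 = np.asarray(y1, dtype=bool)
    cases = b1 & y1   # conditioning event B(1) = 1, Y(1) = 1
    above = risk > t  # event R > t
    tpr = np.mean(above[cases])          # TPR(t): estimable from observed data
    tpr_ab = np.mean(above[cases & b0])  # TPR_AB(t): additionally needs B(0) = 1
    # P(B(0) = 1 | R > t, B(1) = 1, Y(1) = 1) / P(B(0) = 1 | B(1) = 1, Y(1) = 1)
    ratio = np.mean(b0[cases & above]) / np.mean(b0[cases])
    return float(tpr), float(tpr_ab), float(ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(2019)
    n = 20_000
    risk = rng.beta(2, 5, size=n)          # hypothetical risks
    y1 = rng.random(n) < risk              # positive outcome more likely at high risk
    b1 = rng.random(n) < 0.5 + 0.4 * risk  # hypothetical test-cohort biopsy
    b0 = rng.random(n) < 0.3 + 0.6 * risk  # hypothetical training-cohort biopsy
    tpr, tpr_ab, ratio = tpr_decomposition(risk, b0, b1, y1, t=0.3)
    print(f"TPR={tpr:.3f}  TPR_AB={tpr_ab:.3f}  ratio*TPR={ratio * tpr:.3f}")
    assert np.isclose(tpr_ab, ratio * tpr)  # the decomposition holds exactly
```

Because $B(0)$ is made to depend on risk in this setup, the ratio typically exceeds 1, and $\mathrm{TPR}_{\mathrm{AB}}(t)$ then differs from the observable $\mathrm{TPR}(t)$. If $B(0)$ were generated independently of risk, the ratio would be close to 1.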
312 | ISI WSC 2019