Page 322 - Special Topic Session (STS) - Volume 3
STS547 John D. et al.
We start by considering an SPD compiled from administrative sources. The
SPD is compiled to the best of our abilities but is suspected of suffering from
undercoverage as well as overcoverage. We now consider all relevant units
(persons) $U$ as comprising persons in both the population and the SPD ($U_{11}$, of
size $N_{11}$), persons in the SPD but not in the population ($U_{01}$, equating to
overcoverage in the SPD, of size $N_{01}$) and persons in the population
but not in the SPD ($U_{10}$, or undercoverage, of size $N_{10}$). $N_{11}$, $N_{01}$ and $N_{10}$ are
unobserved, but $N_{11} + N_{01}$, the size of the SPD, is observed. The objective
is to estimate $N_{10} + N_{11}$, the size of the target population.
First we consider every unit $i$ in the universe $U$ as a multinomial trial with
probabilities $P(i \in U_{11}) = p_{11}$, $P(i \in U_{01}) = p_{01}$ and $P(i \in U_{10}) = p_{10}$, with
$p_{11} + p_{01} + p_{10} = 1$. Table 1 illustrates this relationship between the target population
and the SPD.
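The multinomial set-up can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: the function name `simulate_universe`, the universe size and the probability values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_universe(size_u, p11, p01, p10, seed=0):
    """Allocate size_u units to the cells U11, U01, U10 with one multinomial draw.

    All argument values are illustrative assumptions, not figures from the paper.
    """
    probs = np.array([p11, p01, p10], dtype=float)
    if not np.isclose(probs.sum(), 1.0):
        raise ValueError("p11 + p01 + p10 must equal 1")
    rng = np.random.default_rng(seed)
    n11, n01, n10 = rng.multinomial(size_u, probs)
    return {"N11": int(n11), "N01": int(n01), "N10": int(n10)}


counts = simulate_universe(size_u=100_000, p11=0.90, p01=0.04, p10=0.06)

# The SPD size is observed; the target population size is what we want to estimate.
spd_size = counts["N11"] + counts["N01"]
target_size = counts["N10"] + counts["N11"]
print(counts, spd_size, target_size)
```

In a real application only `spd_size` would be known; the three cell counts and `target_size` are visible here only because the data are simulated.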
To estimate the size of the target population, Graham and Lin (2019)
propose sampling the target population with known sample inclusion
probabilities and linking the sampled units to the administrative list in an
error-free way. In practice, an area frame in conjunction with a well-maintained
dwelling register will allow for sampling dwellings with a known inclusion
probability. Known inclusion probabilities for individuals then require an
assumption of no within-dwelling non-response. Various field procedures can
be used to approximate this assumption as closely as possible. However, to
simplify notation and explanation, we consider a simple random sample of
individuals with a constant and known inclusion probability $\pi$. Table 2 provides
the corresponding cell probabilities for the relationship between the SPD and
the sample in terms of $\pi$, which is assumed known, and the multinomial
probabilities in Table 1. In practice, the underlying probability model (Table 2)
is extended to include covariates such as age, sex, ethnicity and geography.
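As an illustration of how such cell probabilities arise, the sketch below derives them from the stated assumptions: a simple random sample of the target population with inclusion probability pi, error-free linkage, and overcoverage units that can never be sampled because they are not in the target population. The function name `sample_list_cell_probs` and the numerical values are hypothetical, and the formulas are what these assumptions imply rather than a transcription of Table 2.

```python
def sample_list_cell_probs(pi, p11, p01, p10):
    """Cell probabilities for (sample, list) membership, keyed as (s, l).

    s = 1 if the unit is in the sample, l = 1 if it is on the administrative list.
    Derived under the assumptions named in the accompanying text.
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    if abs(p11 + p01 + p10 - 1.0) > 1e-9:
        raise ValueError("p11 + p01 + p10 must equal 1")
    return {
        (1, 1): pi * p11,              # in population, on list, sampled
        (1, 0): pi * p10,              # in population, not on list, sampled
        (0, 1): (1 - pi) * p11 + p01,  # on list: unsampled population members plus overcoverage
        (0, 0): (1 - pi) * p10,        # in population, not on list, not sampled (unobserved)
    }


# Illustrative values only.
cells = sample_list_cell_probs(pi=0.02, p11=0.90, p01=0.04, p10=0.06)
for key, value in cells.items():
    print(key, round(value, 4))
```

The four probabilities sum to one because every unit in the universe falls in exactly one cell; only the (0, 1) cell mixes units from inside and outside the target population.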
We will use $n_{11}$, $n_{10}$, $n_{01}$ and $n_{00}$ to denote the cell counts in the
cross-tabulation of sample and list inclusion (i.e. the table of counts corresponding
to Table 2), where $n_{11}$, $n_{10}$ and $n_{01}$ are directly observed. We note that the count
for the observed (0,1) cell in the sample-list union, $n_{01}$, contains a mix of people
in the target population but not included in the sample and people genuinely
not in the target population. Consequently the inference is not a standard DSE
problem, which deals only with undercoverage in the observed data.
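To see why the mixed (0,1) cell matters, the following sketch works with expected counts under a simple random sample with inclusion probability pi and applies the classical dual-system (Lincoln-Petersen) formula as if the list had no overcoverage. It is an illustration under assumed values, not the method of Graham and Lin (2019); the function names and all numbers are hypothetical.

```python
def expected_counts(size_u, pi, p11, p01, p10):
    """Expected observed cell counts (sample, list) for a universe of size_u units."""
    return {
        "n11": size_u * pi * p11,
        "n10": size_u * pi * p10,
        "n01": size_u * ((1 - pi) * p11 + p01),
    }


def naive_dse(n11, n10, n01):
    """Lincoln-Petersen estimate: (sample total) * (list total) / (matched count)."""
    return (n11 + n10) * (n11 + n01) / n11


# Illustrative values only.
size_u, pi, p11, p01, p10 = 100_000, 0.02, 0.90, 0.04, 0.06
n = expected_counts(size_u, pi, p11, p01, p10)

true_size = size_u * (p11 + p10)               # expected target population size
naive = naive_dse(**n)                         # ignores overcoverage on the list
overcoverage_share = size_u * p01 / n["n01"]   # part of n01 that is not in the population

print(f"true size:            {true_size:.0f}")
print(f"naive DSE estimate:   {naive:.0f}")
print(f"overcoverage in n01:  {overcoverage_share:.3f}")
```

With expected counts the naive estimate exceeds the true size by the factor (p11 + p01) / p11, so any overcoverage on the list inflates a standard dual-system estimate.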
Graham and Lin (2019) take a Bayesian approach to inference which
follows from the joint posterior distribution for $\theta = p_{10}/(p_{11} + p_{10})$, the
probability that a member of the target population is missing from the SPD, and
$p_{01}$. The posterior distribution for the remaining cell probabilities can be easily
obtained using $p_{11} = (1 - p_{01})(1 - \theta)$ and $p_{10} = (1 - p_{01})\theta$. Given
the posterior distribution for the cell probabilities, the posterior distribution
for the total target population size can be obtained. Graham and Lin (2019)
evaluate two methods for completing the target population unit record file.
The first uses the estimated model probabilities and estimated to impute
311 | ISI WSC 2019