Page 70 - Special Topic Session (STS) - Volume 4
STS563 Patrick Graham et al.
Notice that the number of people in the (0, 0) cell in Table 1, corresponding
to “not in the target population and not on the list”, is assumed to be 0. In fact,
most of the world’s population falls in this cell! However, we are not interested
in estimating the population of the world but that of some specific target
population, such as the usually resident population of New Zealand, and we
are seeking to use an administrative list for this purpose. For this problem, only
people in the target population, on the list, or in both are relevant. That is,
our conceptual starting point for estimation is the union of the target
population and the list (cf. Zhang (2015)). We let N_U denote the size of the
target-list union.
If a sample has been drawn from the target population with sample
inclusion probabilities, λ(x), independently of list inclusion, and the sample is
linked to the list without error, cross-tabulation of sample and list produces a
2 × 2 table (at each setting of X) underpinned by the probabilities shown in
Table 3. For simplicity, we regard the λ(x) as known; in practice, λ(x) may need
to be estimated. From Table 3 it can be seen that the sampling process transfers
some people from the (1, 1) cell in the target-list union to the (0, 1) cell in the
sample-list union, and some people from the (1, 0) cell in the target-list union
to the (0, 0) cell in the sample-list joint distribution. The (0, 0) cell is, in reality,
not observable, and this needs to be accommodated in the analysis. An important
point is that Table 3 does not represent a traditional capture-recapture, or
dual-systems, population estimation problem. Whereas the latter involves two
or more samplings from a target population, we have a single sampling from
the population, which is linked to a list that overlaps the target population. The
observed (0, 1) cell comprises a mix of people from the target population who
were not included in the sample and people genuinely not in the target
population. Traditional DSE methods cannot accommodate the latter group.
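The cell transfers described above can be sketched numerically. The following is only an illustration under stated assumptions, not the authors' implementation: the function name, the argument names `p10`, `p01`, `p11` (target-list union cell probabilities at one covariate setting) and `lam` (the sample inclusion probability at that setting) are hypothetical, and the mapping is inferred from the description of the transfers in the text rather than copied from Table 3.

```python
def sample_list_cell_probs(p10, p01, p11, lam):
    """Map target-list union cell probabilities to sample-list cell probabilities.

    Input cells are labelled (target, list); output cells are labelled
    (sample, list). Assumes sampling is independent of list inclusion,
    with known inclusion probability lam at this covariate setting.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if abs(p10 + p01 + p11 - 1.0) > 1e-9:
        raise ValueError("target-list cell probabilities must sum to 1")
    return {
        (1, 1): lam * p11,                # in target, on list, sampled
        (1, 0): lam * p10,                # in target, not on list, sampled
        (0, 1): p01 + (1.0 - lam) * p11,  # on list: not in target, or in target but unsampled
        (0, 0): (1.0 - lam) * p10,        # in target, off list, unsampled: never observed
    }


# Illustrative (made-up) values for a single covariate setting.
probs = sample_list_cell_probs(p10=0.05, p01=0.10, p11=0.85, lam=0.02)

# Only three of the four sample-list cells can actually be observed.
observable = {cell: p for cell, p in probs.items() if cell != (0, 0)}
```

With these made-up inputs, nearly all of the (0, 1) mass comes from target population members who were simply not sampled, which illustrates why the observed (0, 1) cell cannot be read as "not in the target population".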
3. Inference.
We base inference on the posterior predictive distribution of a
corrected list from which individuals not in the target population have been
removed and to which the target population members missed by the list have been
added. If we can generate corrected lists from this distribution, then for each
draw we can obtain population counts for all cells of interest by simple
tabulation. The tabulations obtained by repeating this for each simulated
corrected list represent a sample from the joint posterior distribution of the
cell counts. Summaries of this distribution, such as the median, other quantiles,
and approximate credible intervals, can be obtained straightforwardly.
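The tabulate-then-summarise step can be sketched as follows. This is a minimal illustration, not the authors' procedure: `fake_corrected_list` is a hypothetical stand-in that draws random labels, whereas a real analysis would draw corrected lists from the fitted posterior predictive distribution. The age-group labels, weights, and list sizes are invented for the example.

```python
import random
import statistics
from collections import Counter

GROUPS = ["0-14", "15-64", "65+"]  # hypothetical cells of interest


def fake_corrected_list(rng):
    """HYPOTHETICAL stand-in for one posterior predictive draw of a corrected list."""
    size = rng.randint(950, 1050)
    return rng.choices(GROUPS, weights=[0.20, 0.65, 0.15], k=size)


def tabulate(corrected_list):
    """Count the records falling in each cell of one corrected list."""
    return Counter(corrected_list)


def summarise(draw_counts, cell):
    """Median and approximate 95% interval of one cell count across draws."""
    values = [counts[cell] for counts in draw_counts]
    # n=40 gives cut points every 2.5%: the first is 2.5%, the last is 97.5%.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return {
        "lower": cuts[0],
        "median": statistics.median(values),
        "upper": cuts[-1],
    }


rng = random.Random(0)
# One tabulation per simulated corrected list.
draws = [tabulate(fake_corrected_list(rng)) for _ in range(200)]
summary = summarise(draws, "15-64")
```

Because every draw is a complete list, any cell count of interest can be summarised the same way without further modelling; only the tabulation changes.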
Introducing the notation $C$ to denote the cell location for an individual in
the target-list union and $\tilde{C}$ to denote the cell location in the sample-list union,
letting $\theta = (\beta, \phi)$, where $\beta$ denotes the vector of parameters for the models for
$\pi(x)$ and $\phi(x)$, letting $\pi(x) = (\pi_{10}(x), \pi_{01}(x), \pi_{11}(x))$ denote the
vector of cell probabilities at covariate setting $x$, and assuming the covariate
59 | ISI WSC 2019