Page 73 - Special Topic Session (STS) - Volume 4
STS563 Patrick Graham et al.
and an approximation to the posterior for the coverage model parameters can
be obtained as p(∅ | data) ∝ p(∅) L(∅), that is, the prior for ∅ multiplied by
the conditional likelihood.
4. Sample design and the inclusion probability. A standard design for
household surveys in official statistics involves a two-stage area-based
sampling design, where, at the first stage, small geographic areas known as
primary sampling units or PSUs are sampled, and at the second sampling stage
households are selected within PSUs. If there is no within-household
non-response, the number of responding households in a PSU divided by the total
number of households in the PSU is the PSU-level inclusion probability ().
Thus, when supported by a well-maintained household list or register, a
conventional area-based multi-stage sample design can yield PSU-specific
inclusion probabilities. In order for the PSU-specific inclusion probability to
apply to all individuals within a PSU, two assumptions must hold: (i) there must
be no within-household non-response; (ii) household non-response must not
vary with dimensions of household composition that relate to covariates
included in the analysis. For the time being we make these assumptions but
note that further refinement of the inclusion probability may be possible if
household-level covariates that are reasonable proxies for household
composition are available.
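
The PSU-level calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the class name, field names, and household counts are hypothetical and are not taken from the paper, and the sketch relies on the two assumptions stated above (no within-household non-response, and household non-response unrelated to household composition).

```python
from dataclasses import dataclass


@dataclass
class PsuCounts:
    """Household counts for one PSU (hypothetical structure, for illustration)."""
    psu_id: str
    responding_households: int  # households that responded to the survey
    total_households: int       # households on the list or register for the PSU


def psu_inclusion_probability(psu: PsuCounts) -> float:
    """Responding households divided by total households in the PSU."""
    if psu.total_households <= 0:
        raise ValueError("total_households must be positive")
    if not 0 <= psu.responding_households <= psu.total_households:
        raise ValueError("responding_households must lie in [0, total_households]")
    return psu.responding_households / psu.total_households


# Made-up counts for two PSUs, purely to show the calculation.
psus = [PsuCounts("A", 12, 240), PsuCounts("B", 9, 150)]

# Under the stated assumptions, every individual in a PSU shares this value.
inclusion = {p.psu_id: psu_inclusion_probability(p) for p in psus}
```

Here PSU "A" has 12 responding households out of 240 and PSU "B" has 9 out of 150, so the two PSUs receive different inclusion probabilities even under the same nominal design.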
In order to support the multi-stage design we model the coverage
probabilities, () and (), at the PSU level and specify hierarchical
models to pool information over the PSUs. The within-PSU likelihood has the
form given by (3), with λ(x) set to the PSU-specific inclusion probability. The
overall model likelihood is obtained by multiplying the PSU-specific
likelihoods. For realistic applications, posterior inference for ∅ using the
conditional likelihood approach requires MCMC methods. We have
implemented the model using the Bayesian modelling software Stan
(Carpenter et al., 2017).
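
As a sketch of the multiplication step, suppose there are J PSUs, and write L_j for the within-PSU likelihood of form (3) and λ_j for the inclusion probability of PSU j. The labels J, L_j and λ_j are introduced here for illustration only and are not the paper's notation; \phi stands for the coverage model parameters written ∅ in the text. The overall likelihood and its logarithm can then be written as:

```latex
L(\phi) = \prod_{j=1}^{J} L_j(\phi;\, \lambda_j),
\qquad
\log L(\phi) = \sum_{j=1}^{J} \log L_j(\phi;\, \lambda_j)
```

On the log scale, which is how MCMC software typically evaluates the target density, the product over PSUs becomes a sum of PSU-level contributions.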
5. Application. We constructed a simulated target population by sampling
800,000 records from the 2013 Census usually resident population while
preserving the hierarchy of household, PSU, stratum, territorial authority, and
region. From this we drew a subset of 31,881 records to represent
under-coverage. These records were excluded from the simulated administrative list.
A sample of 59,223 records was selected from the target population to
represent the over-coverage group. Thus, the simulated list comprised
800,000 − 31,881 = 768,119 records from the initial target population selection of
800,000, plus an additional 59,223 records representing over-coverage. The
total size of the simulated list was therefore 768,119 + 59,223 = 827,342. The
under- and over-coverage proportions were 4% and 7%, respectively. The records in
the under- and over-coverage groups were selected using coverage
probabilities chosen to reflect plausible patterns of variation in coverage by
62 | ISI WSC 2019