Page 73 - Special Topic Session (STS) - Volume 4

STS563 Patrick Graham et al.
            and an approximation to the posterior for the coverage model parameters can
            be obtained as p(φ | data) ∝ p(φ) L(φ).
            4.  Sample design and the inclusion probability. A standard approach for
            household surveys in official statistics is a two-stage area-based
            sampling design in which small geographic areas known as primary sampling
            units (PSUs) are sampled at the first stage, and households are selected
            within PSUs at the second stage. If there is no within-household
            non-response, the number of responding households in a PSU divided by the
            total number of households in the PSU is the PSU-level inclusion probability.
            Thus, when supported by a well-maintained household list or register, a
            conventional area-based multi-stage sample design can yield PSU-specific
            inclusion probabilities. For the PSU-specific inclusion probability to
            apply to all individuals within a PSU, two assumptions must hold: (i) there must
            be no within-household non-response; (ii) household non-response must not
            vary with dimensions of household composition that relate to covariates
            included in the analysis. For the time being we make these assumptions, but
            note that further refinement of the inclusion probability may be possible if
            household-level covariates that are reasonable proxies for household
            composition are available.
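
The PSU-level inclusion probability described above can be illustrated with a minimal Python sketch. The `PSU` record, its field names, and the household counts are hypothetical and chosen only for illustration. The calculation assumes, as in the text, that there is no within-household non-response.

```python
from dataclasses import dataclass


@dataclass
class PSU:
    """Hypothetical record for one sampled primary sampling unit."""
    psu_id: str
    total_households: int       # households on the list or register for the PSU
    responding_households: int  # households in the PSU that responded


def psu_inclusion_probability(psu: PSU) -> float:
    """Responding households divided by total households in the PSU."""
    if psu.total_households <= 0:
        raise ValueError("total_households must be positive")
    if not 0 <= psu.responding_households <= psu.total_households:
        raise ValueError("responding_households must lie in [0, total_households]")
    return psu.responding_households / psu.total_households


# Illustrative counts only; these are not taken from the application.
psus = [PSU("A", 120, 12), PSU("B", 80, 10)]
probs = {p.psu_id: psu_inclusion_probability(p) for p in psus}
```

Under the two assumptions stated above, each value in `probs` would apply to every individual in the corresponding PSU.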
                In order to support the multi-stage design we model the coverage
            probabilities at the PSU level and specify hierarchical
            models to pool information over the PSUs. The within-PSU likelihood has the
            form given by (3), with λ(x) set to the PSU-specific inclusion probability. The
            overall model likelihood is obtained by multiplying the PSU-specific
            likelihoods. For realistic applications, posterior inference for φ using the
            conditional likelihood approach requires MCMC methods. We have
            implemented the model using the Bayesian modelling software Stan
            (Carpenter et al., 2017).
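
As a rough illustration of how PSU-specific likelihoods combine into an overall likelihood, the Python sketch below sums per-PSU log-likelihoods and normalises the result on a grid. It is not the Stan implementation and does not reproduce the likelihood in (3). The binomial kernel, the single shared parameter `theta`, the flat prior, and the counts are all stand-in assumptions. The sketch also pools completely across PSUs, whereas the hierarchical models described above allow PSU-level variation.

```python
import math


def binomial_loglik(theta, successes, trials):
    """Stand-in per-PSU log-likelihood (binomial kernel); NOT equation (3)."""
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def overall_loglik(theta, psu_data):
    """Product of the PSU-specific likelihoods, computed as a sum of logs."""
    return sum(binomial_loglik(theta, s, n) for s, n in psu_data)


def grid_posterior(psu_data, grid_size=999):
    """Normalised posterior weights on an interior grid, assuming a flat prior."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    # With a flat prior, the log posterior equals the log-likelihood up to a constant.
    log_post = [overall_loglik(t, psu_data) for t in grid]
    m = max(log_post)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


# Hypothetical (successes, trials) counts for three PSUs.
psu_data = [(45, 50), (38, 40), (27, 30)]
grid, post = grid_posterior(psu_data)
post_mean = sum(t * p for t, p in zip(grid, post))
```

A grid is workable here only because the toy model has one parameter. With many PSU-level parameters and hyperparameters, a grid is no longer practical, which is why sampling-based methods such as MCMC are needed.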

            5.  Application. We constructed a simulated target population by sampling
            800,000 records from the 2013 Census usually resident population while
            keeping a hierarchy of household, PSU, stratum, territorial authority, and
            region. From this we drew a subset of 31,881 records to represent
            under-coverage. These records were excluded from the simulated administrative list.
            A sample of 59,223 records was selected from the target population to
            represent the over-coverage group. Thus, the simulated list comprised
            800,000 − 31,881 = 768,119 records from the initial target population selection of
            800,000, plus an additional 59,223 records representing over-coverage. The
            total size of the simulated list was therefore 827,342. The under- and
            over-coverage proportions were approximately 4% and 7%, respectively. Records
            were assigned to the under- and over-coverage groups using coverage
            probabilities chosen to reflect plausible patterns of variation in coverage by

                                                                62 | I S I   W S C   2 0 1 9