Page 67 - Special Topic Session (STS) - Volume 4
STS563 Patrick Graham et al.
Recent progress on implementing a Bayesian
approach to population estimation from an
administrative list subject to under and over-
coverage
Patrick Graham, Anna Lin
Statistics New Zealand, Christchurch, New Zealand
Abstract
Several statistical agencies are exploring replacing or enhancing traditional
census-based population estimation systems with administrative data.
Administrative data is prone to both under- and over-coverage. Directly
estimating genuine list over-coverage due to erroneously enumerated
individuals no longer in the target population is challenging, because it is often
difficult to obtain definitive evidence of absence. We have been investigating
a Bayesian method for estimating both under- and over-coverage of an
administrative list, which is based on a model for the joint distribution of
inclusion in the target population and the list. The model is fitted to the union
of a sample survey of the target population and the list. Estimation of list over-
coverage from the sample-list union is possible, given good information on
sample inclusion probabilities. In this paper we review the basic ideas of our
estimation methodology and report on recent progress with implementation
and evaluation of the model.
Keywords
population estimation; administrative data; Bayesian inference; missing data
1. Introduction
Statistical agencies in several countries are investigating methods for
replacing traditional census-based population estimation systems with
approaches based on administrative data (see, for example, Bycroft (2015)).
Administrative lists may fail to include some people who are in fact in the
target population and also include people who are no longer in the target
population, due, for example, to undetected out-migration. Relative to a
traditional census, the latter problem (over-coverage) may be a more
significant issue for population estimation based on administrative data. By
population estimation we mean estimating not just the total size of the
population but also the distribution of the population across categories of
key demographic variables such as age, sex, ethnic group and area.
We assume that it is possible
to conduct a high-quality survey of the target population and that this
sample can be linked to the list without error. We assume no other fieldwork.
In particular, the methodology outlined does not require any sampling from
the list. Thus, our approach makes use of the important insight of Zhang (2015)
that estimation of list over-coverage is possible without sampling directly from
56 | ISI WSC 2019