Page 326 - Special Topic Session (STS) - Volume 3
STS547 Maarten C. et al.
Multiple system estimation for the size of the
Māori population in New Zealand
Maarten Cruyff¹, Peter G.M. van der Heijden¹,², Paul A. Smith², Christine
Bycroft³, Patrick Graham³
1 Utrecht University, the Netherlands
2 University of Southampton, UK
3 Statistics New Zealand
Abstract
We investigate the situation where two or more registers, or lists, of individuals
are linked both for the purpose of population size estimation and to
investigate the relationship between variables appearing on all or only some
of the registers. There is usually no full picture of this relationship because
there are individuals that are in only some of the lists, and also individuals that
are in none of the lists. These two problems have been solved simultaneously
in dual system estimation using the EM algorithm. We extend this approach
to four registers (including the population census) to estimate the size of the
indigenous Māori population in New Zealand, where the reporting of Māori is
not the same in each register and where there is a further missing data
problem, with individuals included in one or more registers who did not
provide their ethnicity. We consider the implications for estimating the size of
the Māori population from administrative data only.
Keywords
dual system estimation, linkage, missing data, register, coverage
1. Introduction
The use of dual system estimation (DSE, also known as capture-recapture
or the Lincoln-Petersen estimator) to estimate the size of a population which
cannot be completely observed has become widespread in official statistics,
particularly as a key part of making estimates from a population census (e.g.
Brown et al. 1999, 2019), though also in situations involving the use of linked
administrative data sources. The need to make efficient use of data already
available to government in the construction of official statistics outputs has
led to better access to administrative data, and linkage of the records from
these sources is being widely used to understand and estimate corrections for
the under- and over-coverage within them. We will use “registers” as a generic
term for all sources containing lists of identifiable units.
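As a minimal illustration of the basic dual system estimator mentioned above, the following Python sketch computes the Lincoln-Petersen estimate from the linked counts of two registers. It is not the four-register EM approach developed in this paper. It assumes independent registers, error-free linkage, a closed population and homogeneous inclusion probabilities; the function name and the counts are hypothetical.

```python
def lincoln_petersen(n_both, n_only_a, n_only_b):
    """Dual system (Lincoln-Petersen) estimate from two linked registers.

    n_both   -- records linked in both registers
    n_only_a -- records found in register A only
    n_only_b -- records found in register B only
    Returns (estimated population size, estimated number in neither register).
    Assumes independent registers and perfect linkage (illustrative only).
    """
    if n_both <= 0:
        raise ValueError("the estimator is undefined without linked records")
    n_a = n_both + n_only_a  # total size of register A
    n_b = n_both + n_only_b  # total size of register B
    n_hat = n_a * n_b / n_both  # estimated population size
    missed = n_only_a * n_only_b / n_both  # estimated count missed by both
    return n_hat, missed


# Hypothetical counts, chosen only to illustrate the arithmetic.
n_hat, missed = lincoln_petersen(n_both=600, n_only_a=200, n_only_b=150)
print(n_hat, missed)
```

The estimated number missed by both registers, added to the observed total, reproduces the population size estimate; this cell of the contingency table is the one that cannot be observed directly.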
When two registers are linked, in general there will be some records from
one register which remain unlinked, because there is no corresponding record
in the other source. This leads to missing data for any variables which appear
315 | ISI WSC 2019