Page 326 - Special Topic Session (STS) - Volume 3

STS547 Maarten C. et al.

                             Multiple system estimation for the size of the
                                   Māori population in New Zealand
                  Maarten Cruyff, Peter G.M. van der Heijden, Paul A. Smith, Christine
                                         Bycroft, Patrick Graham
                                      1  Utrecht University, the Netherlands
                                        2  University of Southampton, UK
                                           3  Statistics New Zealand

               We investigate the situation where two or more registers, or lists, of individuals
               are linked both for the purpose of population size estimation and to
               investigate the relationship between variables appearing on all or only some
               of the registers. There is usually no full picture of this relationship because
               there are individuals that are in only some of the lists, and also individuals that
               are in none of the lists. These two problems have been solved simultaneously
               in dual system estimation using the EM algorithm. We extend this approach
               to four registers (including the population census) to estimate the size of the
               indigenous Māori population in New Zealand, where the reporting of Māori is
               not the same in each register and where there is a further missing data
               problem, with individuals included in one or more registers who did not
               provide their ethnicity. We consider the implications for estimating the size of
               the Māori population from administrative data only.

               Keywords: dual system estimation, linkage, missing data, register, coverage

               1.  Introduction
                   The use of dual system estimation (DSE, also known as capture-recapture
               or the Lincoln-Petersen estimator) to estimate the size of a population which
               cannot be completely observed has become widespread in official statistics,
               particularly as a key part of making estimates from a population census (e.g.
               Brown et al. 1999, 2019), though also in situations involving the use of linked
               administrative data sources. The need to make efficient use of data already
               available to government in the construction of official statistics outputs has
               led to better access to administrative data, and linkage of the records from
               these sources is being widely used to understand and estimate corrections for
               the under- and over-coverage within them. We will use “registers” as a generic
               term for all sources containing lists of identifiable units.
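
               As a minimal illustration of the DSE point estimate referred to above, the
               following sketch computes the Lincoln-Petersen estimate N = n1 * n2 / m for
               two linked registers. The function name and all counts are hypothetical and
               are not taken from the New Zealand data; the sketch assumes independent
               registers, error-free linkage and a closed population.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Dual system (Lincoln-Petersen) estimate of population size.

    n1 -- number of records in register 1
    n2 -- number of records in register 2
    m  -- number of records linked between the two registers
    """
    if m <= 0:
        raise ValueError("at least one linked record is needed")
    if m > min(n1, n2):
        raise ValueError("linked count cannot exceed either register size")
    return n1 * n2 / m


# Hypothetical counts, for illustration only.
n1, n2, m = 900, 800, 720
n_hat = lincoln_petersen(n1, n2, m)

# Individuals seen in at least one register, and the estimated
# number appearing in neither register.
observed = n1 + n2 - m
missed = n_hat - observed
print(n_hat, observed, missed)
```

               With these illustrative counts, 980 individuals are observed in at least one
               register, and the estimate implies that about 20 appear in neither.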
                   When two registers are linked, in general there will be some records from
               one register which remain unlinked, because there is no corresponding record
               in the other source. This leads to missing data for any variables which appear

                                                                  315 | ISI WSC 2019