Page 335 - Special Topic Session (STS) - Volume 3

STS547 Daan Zult et al.



                            A linkage error correction model for population
                                  size estimation with multiple sources

                  Daan Zult 1, Peter-Paul de Wolf 1, Bart Bakker 1,2, Peter van der Heijden 3
                                            1 Statistics Netherlands
                                               2 VU University
                                 3 Utrecht University and University of Southampton

               Abstract
               We describe a new method for population size estimation when the linkage
               of sources is subject to errors. Our model is derived from a linkage error
               correction model introduced by Ding and Fienberg (1994), who show how
               to use linkage probabilities to correct the capture-recapture estimator for
               linkage errors, but only in the case of two sources and no covariates. We
               propose a generalisation by incorporating the Ding and Fienberg model into
               the standard log-linear modelling approach used in multiple-recapture
               estimation. We show how the method performs in a simulation study with
               data that resemble real data.

               Keywords
               Multiple-recapture estimation; population size estimation; capture-recapture;
               record linkage; linkage errors

               1.  Introduction
                   This paper is a summary of Zult et al. (2019), to which we refer for a more
               extensive discussion of this topic. The size of a partly observed population is
               often estimated with the capture-recapture (CR, for two sources) or
               multiple-recapture (MR, for multiple sources) method. An important
               assumption of these models is that records in different sources can be
               identified such that it is known whether or not they belong to the same unit,
               i.e. records can be perfectly linked between sources. This perfect linkage
               assumption is particularly relevant when identification is not obtained from a
               perfect identifier (such as a tag or ID code) but from indirect identifiers (such
               as name and address). In that case records are usually linked by probabilistic
               linkage (see Fellegi and Sunter, 1969; Winkler, 1988; Jaro, 1989), the perfect
               linkage assumption is often violated, and this generally leads to a biased
               population size estimate (PSE) (Gerritse et al., 2017).
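To make the bias from imperfect linkage concrete, the sketch below simulates two independent sources and applies the classical two-source (Lincoln-Petersen) capture-recapture estimator N = n_A n_B / m, where m is the number of linked records. The population size, the capture probabilities and the error model (each true link is missed at random with a fixed probability, and no false links occur) are illustrative assumptions, not taken from the paper, and the function names are hypothetical.

```python
import random


def lincoln_petersen(n_a: int, n_b: int, m: int) -> float:
    """Two-source capture-recapture estimate N = n_a * n_b / m.

    Assumes a closed population, independent sources and perfect linkage.
    """
    if m == 0:
        raise ValueError("no linked records: the estimate is undefined")
    return n_a * n_b / m


def simulate(n_pop=10_000, p_a=0.6, p_b=0.5, miss_rate=0.05, seed=1):
    """Return (estimate with perfect linkage, estimate with missed links).

    Hypothetical setup: every unit is captured independently by source A
    with probability p_a and by source B with probability p_b.
    """
    rng = random.Random(seed)
    in_a = [rng.random() < p_a for _ in range(n_pop)]
    in_b = [rng.random() < p_b for _ in range(n_pop)]
    n_a, n_b = sum(in_a), sum(in_b)
    # Number of units truly present in both sources.
    m_true = sum(a and b for a, b in zip(in_a, in_b))
    # Assumed error model: each true link is missed independently with
    # probability miss_rate; no false links are created.
    m_obs = sum(rng.random() >= miss_rate for _ in range(m_true))
    return lincoln_petersen(n_a, n_b, m_true), lincoln_petersen(n_a, n_b, m_obs)


perfect, missed = simulate()
print(f"perfect linkage: {perfect:.0f}, with 5% missed links: {missed:.0f}")
```

Because missed links shrink m while n_A and n_B stay fixed, the estimate is inflated by roughly a factor 1 / (1 - miss rate); false links would instead increase m and push the estimate downwards.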
                   Solutions to this problem were provided by Ding and Fienberg (1994) (DF),
               Di Consiglio and Tuoto (2015) (DC&T_15) and De Wolf et al. (2018) (DW).
               These authors show how to use linkage probabilities to correct the capture-



               1 The authors would like to thank Jan van der Laan from Statistics Netherlands for his review of the
               final version of this paper.


                                                                  324 | ISI WSC 2019