Page 336 - Special Topic Session (STS) - Volume 3

STS547 Daan Zult et al.
               recapture estimator for linkage errors. Recently, Di Consiglio and Tuoto (2018)
               (DC&T_18) extended their method to three sources.
                   In this paper we provide a general framework that extends this work
               further in two ways: to covariates and to multiple sources. This is done by
               generalising the standard log-linear modelling approach used in
               multiple-recapture estimation such that it incorporates linkage error
               correction. This leads to the weighted multiple-recapture (WMR) model, which
               is discussed in section 2. In section 3 we show the results of a simulation
               study that tests the WMR model.

               2.  Methodology
                   We first introduce some formal notation. $S$ defines the source, where in
               standard CR $S = (1, 2)$ and in MR $S = (1, 2, \ldots, s)$. Next, we define the
               linked 'register' $L_{s-1}$ as:

               $$
               L_{s-1} =
               \begin{cases}
               L_0 = S_1 \\
               L_1 = l_1(S_1, S_2) \\
               L_2 = l_2(L_1, S_3) \\
               \quad \vdots \\
               L_{s-1} = l_{s-1}(L_{s-2}, S_s)
               \end{cases}
               $$

               where $L_t$ refers to a set of $t + 1$ sequentially linked sources and $l_t$ refers
               to the linkage process that links $L_{t-1}$ with $S_{t+1}$. In case of CR this reduces
               to $L_{s-1} = L_1 = l_1(S_1, S_2)$. The true cell counts, estimated cell counts and
               observed cell counts (i.e. the counts of records that are linked and not
               linked between $L_{t-1}$ and $S_{t+1}$) are denoted as $n_t = (n_{11}, n_{10}, n_{01})$,
               $\hat{n}_t = (\hat{n}_{11}, \hat{n}_{10}, \hat{n}_{01})$ and $m_t = (m_{11}, m_{10}, m_{01})$
               respectively. Here $i \in \{1, 0\}$ corresponds to records in and not in $L_{t-1}$
               and $j \in \{1, 0\}$ corresponds to records in and not in $S_{t+1}$. When there are
               no linkage errors, the true cell counts are equal to the observed cell counts,
               i.e. $n_t = m_t$. Furthermore, we define $n^*_t = (n^*_{11}, n^*_{10}, n^*_{01})$ and
               $m^*_t = (m^*_{11}, m^*_{10}, m^*_{01})$ as the true and observed cell counts in a
               random sample from $L_{t-1}$ called a rematch or audit study (for a discussion
               on the difference between rematch and audit sample, which is small, we refer
               to Zult et al. (2019)). Besides the fact that $n^*_t$ refers to a subsample, the
               difference between $n^*_t$ and $n_t$ is that in the presence of linkage errors
               $n^*_t$ is assumed to be known while $n_t$ is not. Finally, we introduce
               $k = 1, \ldots, K_t$, which are the records in $L_t$. Under perfect linkage
               this implies that all records refer to unique units/individuals, but in case of
               linkage errors two records in $L_t$ might belong to different units/individuals
               or one record in $L_t$ might represent two or more units.
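The sequential linkage described above can be illustrated with a minimal Python sketch. It assumes error-free linkage on a shared identifier, so the observed cell counts coincide with the true ones. The function names `exact_link` and `link_sequentially`, the use of string identifiers, and the toy sources are hypothetical and introduced only for illustration; a real linkage process would be probabilistic and subject to the linkage errors discussed in this paper.

```python
from typing import Dict, List, Set, Tuple

Register = Set[str]
Counts = Dict[str, int]


def exact_link(register: Register, source: Set[str]) -> Tuple[Register, Counts]:
    """Hypothetical error-free linkage step: match records on identical keys.

    Returns the new linked register and the cell counts of this step, where
    the first index refers to the current register and the second index to
    the source that is being linked to it.
    """
    counts = {
        "11": len(register & source),  # in the register and in the new source
        "10": len(register - source),  # in the register only
        "01": len(source - register),  # in the new source only
    }
    return register | source, counts


def link_sequentially(sources: List[Set[str]]) -> Tuple[Register, List[Counts]]:
    """Link the sources one at a time, starting from the first source."""
    register: Register = set(sources[0])  # the initial register is source 1
    history: List[Counts] = []
    for source in sources[1:]:
        register, counts = exact_link(register, source)
        history.append(counts)
    return register, history


# Toy example with three sources (assumed data, not taken from the paper).
sources = [{"a", "b", "c"}, {"b", "c", "d"}, {"a", "d", "e"}]
final_register, cell_counts = link_sequentially(sources)
print(cell_counts)
```

With linkage errors, the counts returned by a step like this would be the observed counts only; the true counts would remain unknown except in a rematch or audit sample.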
                   The derivation of the WMR model follows three steps. First, the D&F model
               is written as a log-linear Poisson regression model. Second, the dependent
               variable in this model is corrected for linkage errors in case of two sources but
                                                                  325 | I S I   W S C   2 0 1 9