Page 337 - Special Topic Session (STS) - Volume 3

STS547 Daan Zult et al.
with covariates. These two steps are discussed in section 2.1. Third, this model is extended to multiple sources, which is discussed in section 2.2.
2.1 Capture-recapture estimation and linkage error correction
In the most basic case of CR, the PSE is given by the standard Petersen formula (Petersen, 1896; Lincoln, 1930):
$$\hat{N} = n_{11} + n_{10} + n_{01} + \frac{n_{10}\,n_{01}}{n_{11}} = \frac{(n_{11} + n_{10})(n_{11} + n_{01})}{n_{11}} = \frac{n_{1+}\,n_{+1}}{n_{11}} \qquad (1)$$
where, under the appropriate assumptions, $\hat{N}$ is an unbiased estimate of the true population size (Wolter, 1986). The Petersen estimator is closely related to the fitted values of a log-linear Poisson regression model for the cell counts (see e.g. Cormack, 1989), namely:

               [ ] =  ( 0 +  1 +  2 )  for i, j ∈ {1,0}                                                      (2),
                    
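where $n_{ij}$ serves as the dependent variable in the log-linear regression model. The Poisson regression model uses maximum likelihood to obtain the estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$. In particular, the fitted value for the unobserved cell $(0,0)$, $\exp(\hat{\beta}_0)$, equals $n_{10}n_{01}/n_{11}$, so adding it to the three observed counts reproduces equation (1). An important difference between equations (1) and (2) is that (2) can easily be extended with additional sources or categorical covariates.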
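To make the link between equations (1) and (2) concrete, the minimal Python sketch below (standard library only) computes the Petersen estimate and fits the log-linear Poisson model to the three observed cells by Newton-Raphson maximum likelihood. Because three parameters are fitted to three cells, the fitted count for the unobserved cell is $\exp(\hat{\beta}_0) = n_{10}n_{01}/n_{11}$, so both routes give the same $\hat{N}$. The cell counts, the function names and the indicator coding of each cell as $(1, i, j)$ are illustrative assumptions, not taken from the paper.

```python
import math

def petersen(n11, n10, n01):
    """Petersen estimate, equation (1): N = n1+ * n+1 / n11."""
    n1p = n11 + n10  # records in source 1
    np1 = n11 + n01  # records in source 2
    return n1p * np1 / n11

def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    n = len(m)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_loglinear(counts):
    """ML fit of E[n_ij] = exp(b0 + b1*i + b2*j) by Newton-Raphson.

    counts maps a cell (i, j) to its observed count (here the three observed cells).
    """
    cells = list(counts)
    X = [[1.0, float(i), float(j)] for (i, j) in cells]  # assumed indicator coding
    y = [float(counts[c]) for c in cells]
    beta = [math.log(sum(y) / len(y)), 0.0, 0.0]
    for _ in range(50):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score X^T (y - mu) and Fisher information X^T diag(mu) X of the Poisson likelihood
        score = [sum(X[k][p] * (y[k] - mu[k]) for k in range(len(y))) for p in range(3)]
        info = [[sum(X[k][p] * mu[k] * X[k][q] for k in range(len(y)))
                 for q in range(3)] for p in range(3)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-12:
            break
    return beta

# Hypothetical cell counts for two sources
counts = {(1, 1): 60, (1, 0): 40, (0, 1): 30}
beta = fit_loglinear(counts)
n00_hat = math.exp(beta[0])  # fitted count for the unobserved cell (0, 0)
N_loglinear = sum(counts.values()) + n00_hat
print(petersen(60, 40, 30), N_loglinear)  # Petersen gives 150.0; the fit agrees up to rounding
```

The iterative fit is deliberately generic: with more sources or categorical covariates only the design matrix `X` and the parameter vector grow, which is the flexibility the text attributes to equation (2).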
When the appropriate assumptions are not met, for instance when records are not perfectly linked, $\hat{N}$ is biased. D&F therefore developed a linkage error correction method based on a rematch study, from which they calculate linkage error probabilities that are then used to correct the PSE for linkage errors. DW show that this correction method can be written as:
$$\hat{N}_{D\&F} = \frac{n_{1+}\,n_{+1}}{\hat{n}_{11}} \qquad (3)$$
where $\hat{n}_{11}$ is the estimated number of links between both sources, corrected for linkage errors. Combining equations (1), (2) and (3) allows us to write:

               [̂ ] =  ( 0 +  1 +  2 )  for i, j ∈ {1,0}                                                                      (4)
                   
where $\hat{n}_{11} = n_{11}\,m^{*}_{11}/n^{*}_{11}$, $\hat{n}_{10} = n_{1+} - \hat{n}_{11}$ and $\hat{n}_{01} = n_{+1} - \hat{n}_{11}$ (see Zult et al. (2019) for a more extensive derivation). In words, equation (4) is the same model as equation (2), except that the dependent variable $n_{ij}$ is replaced by $\hat{n}_{ij}$, a vector of estimated cell counts based on the results of the audit study. Note that the calculation of $\hat{n}_{11}$ does not depend on the exact linkage procedure used. In fact, the only requirement is that the fraction $m^{*}_{11}/n^{*}_{11}$ is a consistent estimate of $m_{11}/n_{11}$, which implies that the audit study should be representative of the linkage procedure.
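As an illustration of equations (3) and (4), the Python sketch below computes the corrected cell counts and checks that the closed-form estimate of equation (3) coincides with the population size implied by the log-linear model fitted to the corrected counts. It assumes the audit-study fraction that multiplies $n_{11}$ is available as a single number; the function names, the argument name `audit_fraction` and all counts are hypothetical. Instead of an iterative Poisson fit, it uses the fact that the three-parameter model is saturated on the three observed cells, so its fitted value for the missing cell has the closed form $\hat{n}_{10}\hat{n}_{01}/\hat{n}_{11}$.

```python
def corrected_cells(n11, n1p, np1, audit_fraction):
    """Linkage-error-corrected cells (n11_hat, n10_hat, n01_hat) used as the response in equation (4).

    audit_fraction is a hypothetical name for the audit-study fraction that multiplies n11.
    """
    n11_hat = n11 * audit_fraction
    return {(1, 1): n11_hat, (1, 0): n1p - n11_hat, (0, 1): np1 - n11_hat}

def corrected_pse(n1p, np1, n11_hat):
    """Equation (3): corrected population size estimate n1+ * n+1 / n11_hat."""
    return n1p * np1 / n11_hat

def pse_from_loglinear(cells):
    """Population size implied by the log-linear model fitted to three (corrected) cells.

    With three parameters and three observed cells the ML fit reproduces the cells exactly,
    so the fitted count for the unobserved cell (0, 0) is exp(b0) = n10 * n01 / n11.
    """
    n00_hat = cells[(1, 0)] * cells[(0, 1)] / cells[(1, 1)]
    return sum(cells.values()) + n00_hat

# Hypothetical example: 100 and 90 records in the two sources, 60 linked pairs,
# and a hypothetical audit-study fraction of 45/50 = 0.9.
n1p, np1, n11 = 100, 90, 60
cells = corrected_cells(n11, n1p, np1, audit_fraction=45 / 50)
N_df = corrected_pse(n1p, np1, cells[(1, 1)])
N_model = pse_from_loglinear(cells)
N_uncorrected = n1p * np1 / n11
print(round(N_df, 3), round(N_model, 3), N_uncorrected)  # 166.667 166.667 150.0
```

Only the product of $n_{11}$ and the audit fraction enters the computation, so the sketch makes no assumption about how the links themselves were produced.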

326 | ISI WSC 2019