Page 338 - Special Topic Session (STS) - Volume 3

STS547 Daan Zult et al.
               Equation (4) allows covariates to be included in the same way as in a
               regular log-linear Poisson regression: $\hat{m}$ must be separated further
               into groups (e.g. male/female), and this categorical covariate can then be
               added to the regression equation. We refer to this extension of the D&F
               model as the weighted CR (WCR) model. Why it is called ‘weighted’ will
               become clear in the next section.
               2.2  The weighted multiple-recapture model
                       In section 2.1 we showed how the D&F model can be written as a
               log-linear Poisson regression model and how (categorical) covariates can be
               added to this equation by splitting $\hat{m}$ into smaller groups. After this
               procedure we have both an estimated and an observed count for each cell.
               Each cell count consists of records, so for each record we can calculate its
               weighted contribution to its estimated cell count, i.e.:

               $$w_i = \frac{\hat{m}_i}{n_i} \qquad (5)$$
                 
                      
               where $\hat{m}_i$ and $n_i$ refer to the estimated and observed count of the
               cell to which record $i$ belongs. E.g., when we ignore covariates and record
               $i$ is linked between sources 1 and 2, $\hat{m}_i = \hat{m}_{11}$ and
               $n_i = n_{11}$. Now $w_i$ is a record-level weight, and within each cell the
               weights sum to the corresponding element of $\hat{m}$. Adding up over $w_i$
               is similar to the case of no linkage errors, where each record has a weight
               of 1 and the records are added up to obtain the (true and observed) cell
               counts. However, when we want to extend the model such that it can deal
               with multiple sources, we can write $w_i$ as:

               $$w_{i,k} = w_{i,k-1} \frac{\hat{m}_{i,k}}{n_{i,k}} \qquad (6),$$
                 
                                 
                                                                              
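               with $w_{i,k=0} = 1$ and $n_{i,k} = \sum_j w_{j,k-1}$, where the sum runs over
               the records $j$ in the cell of record $i$. Under equation (6), $w_{i,k}$ is
               updated after every linkage procedure, which can be repeated for each new
               source.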
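
The update in equation (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `update_weights`, the cell labels and all counts are hypothetical, and the estimated cell counts are taken as given rather than produced by the linkage-error model.

```python
from collections import defaultdict


def update_weights(weights, cells, m_hat):
    """Apply the weight update once (hypothetical helper).

    weights: record weights from the previous linkage step
    cells:   cell label of each record (labels are made up for illustration)
    m_hat:   estimated cell counts, assumed to be given
    """
    # The "observed" count of a cell is the sum of the previous weights
    # of the records in that cell.
    n = defaultdict(float)
    for w, c in zip(weights, cells):
        n[c] += w
    # New weight = previous weight * estimated count / observed count.
    return [w * m_hat[c] / n[c] for w, c in zip(weights, cells)]


# Hypothetical data: five records, all weights start at 1.
cells_1 = ["11", "11", "11", "10", "01"]
w0 = [1.0] * len(cells_1)

# Assumed estimated counts after linking the first two sources.
m_hat_1 = {"11": 2.4, "10": 1.3, "01": 1.3}
w1 = update_weights(w0, cells_1, m_hat_1)

# Linking a further source refines the cells (labels gain a digit);
# the same update is applied to the weights from the previous step.
cells_2 = ["111", "110", "110", "100", "010"]
m_hat_2 = {"111": 0.9, "110": 1.5, "100": 1.3, "010": 1.3}
w2 = update_weights(w1, cells_2, m_hat_2)
```

After each call the weights within a cell add up to the estimated count of that cell, which is the property the text relies on.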
               After the update of $w_{i,k}$, the elements of the estimated cell count
               vector $\hat{m}_k$ can be calculated by summing $w_{i,k}$ over the records
               that belong to each cell, where $\hat{m}_k$ does not only distinguish
               between sources 1 and 2 but may distinguish between any number of sources
               and categorical covariates. The weighted multiple-recapture (WMR) model
               can then be written as:
               $$E[\hat{m}_k] = f(X, \beta) \qquad (7),$$
                  
               where $\hat{m}_k$ is the estimated cell count vector that depends on
               $X = (w_{k-1}, Z)$, with $Z$ a set of categorical covariates, according to
               some function $f(X, \beta)$ with $\beta$ a parameter vector.
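
The estimated cell count vector on the left-hand side of equation (7) is obtained by summing the record weights within each cell, where a cell may be defined by any combination of sources and categorical covariates. A minimal sketch of that aggregation follows; the function name `cell_counts`, the source patterns, the covariate values and the weights are all hypothetical, and the sketch does not fit the function in equation (7).

```python
from collections import defaultdict


def cell_counts(records):
    """Sum record weights per cell (hypothetical helper).

    records: iterable of (weight, source_pattern, covariate) tuples.
    A cell is identified by the pair (source_pattern, covariate).
    """
    counts = defaultdict(float)
    for weight, pattern, covariate in records:
        counts[(pattern, covariate)] += weight
    return dict(counts)


# Hypothetical weighted records: the pattern "110" means "found in
# sources 1 and 2 but not in source 3".
records = [
    (0.90, "110", "male"),
    (0.75, "110", "male"),
    (0.75, "110", "female"),
    (1.30, "100", "female"),
    (1.30, "010", "male"),
]
m_hat = cell_counts(records)
```

Adding a source or a covariate only changes the key that identifies a cell, so the same summation covers any number of sources and covariates.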


                                                                  327 | I S I   W S C   2 0 1 9