Page 321 - Special Topic Session (STS) - Volume 3

STS547 John D. et al.
               records. Typically, this is done by recompiling the SPD without one of the
               underlying data sources to get our trimmed list $A_k$, then comparing the
               trimmed estimate $\hat{N}_k$ with $\hat{N}$ to see if they are the same. If
               $\hat{N}_k$ is less than $\hat{N}$, then this indicates that the capture rate
               for the trimmed segment (size $k$) is lower than the capture rate for list
               $A_k$, and hence that there is a higher proportion of erroneous records in
               list $A$ than in list $A_k$. Therefore we consider the estimator $\hat{N}_k$
               to be less biased than $\hat{N}$. We can apply this idea iteratively to each
               data source in a strategy to eliminate erroneous records from the SPD and
               obtain a less biased, and possibly an unbiased, estimate of the population.
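
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the usual dual system (Lincoln-Petersen) form, estimate = x * n / m, where x is the size of list A (the SPD), n the size of the second list and m the number of matches, and it assumes the trimmed estimate simply reuses that form with x - k records and m - k1 matches, where k1 is the number of trimmed records that had a match. The names x, n, m, k1, the source labels and all counts are hypothetical.

```python
def dse(x, n, m):
    """Dual system estimate from list sizes x and n and m matched records."""
    if m <= 0:
        raise ValueError("at least one matched record is required")
    return x * n / m


# Hypothetical counts: list A (SPD), second list, and matches between them.
x, n, m = 1100, 500, 450

# Hypothetical data sources: (records trimmed k, of which matched k1).
sources = {"source_1": (150, 10), "source_2": (200, 95)}

full_estimate = dse(x, n, m)
results = {}
for name, (k, k1) in sources.items():
    trimmed_estimate = dse(x - k, n, m - k1)
    rate_trimmed_segment = k1 / k            # capture rate of the removed records
    rate_remaining = (m - k1) / (x - k)      # capture rate of the trimmed list
    results[name] = (trimmed_estimate, rate_trimmed_segment, rate_remaining)

for name, (est, r_seg, r_rem) in results.items():
    direction = "lower" if est < full_estimate else "not lower"
    print(f"{name}: trimmed={est:.1f} ({direction} than {full_estimate:.1f}), "
          f"segment rate={r_seg:.3f}, remaining rate={r_rem:.3f}")
```

In this toy setup, removing source_1 lowers the estimate because its records are captured far less often than the rest of the list, whereas removing source_2 does not, so only source_1 would be a candidate for trimming.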
                   An alternative trimming strategy could involve scoring records using
               criteria correlated with the likelihood that those records are erroneous and
               then incrementally trimming based on these scores to identify the point
               where trimming is no longer effective in removing bias (population
               estimates are no longer changing significantly). The variance of the
               estimator should also be monitored throughout the trimming. The variance of
               the trimmed estimator is estimated in the same manner as that for the DSE,
               but using list sizes and matches after trimming.
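
A score-based trimming path of this kind might look as follows. This is a sketch under stated assumptions rather than the method used in the paper: the variance formula is the standard large-sample estimate for the Lincoln-Petersen estimator, x * n * (x - m) * (n - m) / m**3, which the text does not spell out; the stopping rule (relative change below a tolerance) is one hypothetical reading of "not changing significantly"; and the scores and records are invented for illustration.

```python
def dse(x, n, m):
    """Dual system estimate from list sizes x and n and m matched records."""
    return x * n / m


def dse_variance(x, n, m):
    """Assumed large-sample variance estimate for the dual system estimate."""
    return x * n * (x - m) * (n - m) / m**3


def trim_path(records, n, step):
    """Trim list A in steps of `step`, highest score first.

    records: list of (score, matched) pairs, one per record in list A;
             a higher score means the record is more likely to be erroneous.
    Returns a list of (k, estimate, variance) tuples.
    """
    ordered = sorted(records, key=lambda r: r[0], reverse=True)
    x = len(ordered)
    m = sum(1 for _, matched in ordered if matched)
    path = []
    for k in range(0, x, step):
        k1 = sum(1 for _, matched in ordered[:k] if matched)
        x_k, m_k = x - k, m - k1
        if m_k <= 0:
            break  # no matches left, the estimate is undefined
        path.append((k, dse(x_k, n, m_k), dse_variance(x_k, n, m_k)))
    return path


def first_stable_k(path, tol):
    """Return the first k after which one more trimming step changes the
    estimate by less than `tol` in relative terms, or None if there is none."""
    for (k_prev, est_prev, _), (_, est_cur, _) in zip(path, path[1:]):
        if abs(est_cur - est_prev) / est_prev < tol:
            return k_prev
    return None


# Hypothetical list A: 100 high-score unmatched records (erroneous)
# followed by 900 low-score records, every second one matched.
records = [(0.9, False)] * 100 + [(0.1, i % 2 == 0) for i in range(900)]
path = trim_path(records, n=500, step=50)
stable_k = first_stable_k(path, tol=0.01)
```

With these invented records the estimate stops moving once the 100 high-score records are gone, while the variance keeps growing as further (valid) records are trimmed, which is the trade-off the text asks to monitor.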
                   Other trimming strategies can also be deployed. Effective trimming
               strategies are those where trimming steps remove erroneous records without
               removing too many valid records. The more valid records that are removed,
               the smaller the number of matches and hence the greater the variance of the
               estimator. There is a cost to trimming, and poor trimming strategies can lead
               to unstable and possibly biased estimators with poor precision (large variance).
               2.2. Statistics New Zealand - Graham and Lin Method
                   Graham and Lin (2019) provide a comprehensive and detailed account of
               the methodology. Here we present a greatly simplified account of the
               approach.
                                        In SPD                     Not in SPD
               In Population            $N_{11}$ ($\phi_{11}$)     $N_{10}$ ($\phi_{10}$)
               Not in Population        $N_{01}$ ($\phi_{01}$)     0 (0)

               Table 1. Relationship between SPD and target population using the Graham
               and Lin method to estimate the target population size. Multinomial
               probabilities (adding to 1) are denoted in parentheses.

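                                        In SPD                               Not in SPD
               In Sample                $p\,\phi_{11}$                       $p\,\phi_{10}$
               Not in Sample            $(1 - p)\,\phi_{11} + \phi_{01}$     $(1 - p)\,\phi_{10}$

               Table 2. Probability distribution for the breakdown of SPD and Sample under
               the Graham and Lin method, where $p$ denotes the probability that a member
               of the target population is included in the sample.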
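
The cell probabilities of Table 2 can be checked numerically with a short sketch. It assumes that each member of the target population falls in the sample with a common probability, called `p` here (an assumed name), and that records outside the target population are never in the sample. The function name and the numerical values are hypothetical and are not taken from Graham and Lin (2019).

```python
def sample_spd_probabilities(phi11, phi10, phi01, p):
    """Cell probabilities for the Sample by SPD cross-classification.

    phi11, phi10, phi01: multinomial probabilities from Table 1 (must sum to 1).
    p: assumed probability that a member of the target population is sampled.
    """
    if abs(phi11 + phi10 + phi01 - 1.0) > 1e-9:
        raise ValueError("phi11 + phi10 + phi01 must equal 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return {
        ("in_sample", "in_spd"): p * phi11,
        ("in_sample", "not_in_spd"): p * phi10,
        # Not sampled but in the SPD: unsampled population members plus
        # SPD records that are not in the target population.
        ("not_in_sample", "in_spd"): (1 - p) * phi11 + phi01,
        ("not_in_sample", "not_in_spd"): (1 - p) * phi10,
    }


# Hypothetical values for illustration only.
cells = sample_spd_probabilities(phi11=0.85, phi10=0.10, phi01=0.05, p=0.02)
total = sum(cells.values())
```

The four cells sum to 1 for any valid inputs, since the terms in `p` cancel and leave the three probabilities of Table 1.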



                                                                  310 | ISI WSC 2019