Page 320 - Special Topic Session (STS) - Volume 3

STS547 John D. et al.
               Every unit in the population has an equal chance of being captured in
               list B.
                   With one additional assumption, namely that the event that a person is
               captured in list B is independent of any other person being captured in that
               list, Zhang and Dunne (2018) present a variance estimator similar to that of
               Sekar and Deming (1949), as given in Bishop et al. (1975, page 233).
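As an illustration only, the classical Sekar and Deming (1949) form can be sketched in Python. Writing x and n for the sizes of lists A and B and m for the size of their match, the dual system estimate is $xn/m$ and the classical variance estimate is $xn(x-m)(n-m)/m^3$. This is the textbook formula, used here as an assumption; it is not necessarily the exact Zhang and Dunne (2018) estimator, and the counts below are hypothetical.

```python
import math


def dse_estimate(x: int, n: int, m: int) -> float:
    """Dual system estimate x * n / m (list A size x, list B size n, match m)."""
    if m <= 0:
        raise ValueError("the match size m must be positive")
    return x * n / m


def sekar_deming_variance(x: int, n: int, m: int) -> float:
    """Classical Sekar-Deming variance estimate x*n*(x-m)*(n-m)/m**3.

    Assumption: textbook form for illustration, not claimed to be the
    exact Zhang and Dunne (2018) estimator.
    """
    if m <= 0:
        raise ValueError("the match size m must be positive")
    return x * n * (x - m) * (n - m) / m**3


# Hypothetical counts, for illustration only.
x, n, m = 1000, 200, 160
est = dse_estimate(x, n, m)
se = math.sqrt(sekar_deming_variance(x, n, m))
print(f"N_hat = {est:.1f}, SE = {se:.1f}")
```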
                   These assumptions are more relaxed than those presented by Wolter
               (1986), so this DSE model can be applied in many scenarios where Wolter's
               assumptions may not hold, for example where list A is derived from
               administrative data sources.
                   The Irish PECADO project proposes a system in which the SPD, compiled
               from the activity records in individual public administration systems, is
               list A (size x), with heterogeneous capture rates, and a second administrative
               list is list B (size n), satisfying the homogeneous capture assumption. List B
               (DLD) is composed of those persons applying for or renewing their driver
               licence in a given year. In Ireland, drivers have to renew their licence at
               least every 10 years and are required to show that they are resident in the
               State. We assume that neither list has erroneous records and that there is
               perfect linkage based
               on official Identification Numbers. An erroneous record is one that does
               not relate to a person who should be included in the population. The
               population estimate, $\hat{N}$, is compiled as $\hat{N} = xn/m$, where m is
               the size of the match between list A and list B. Post-stratification by single
               year of age, gender and nationality group is also implemented to strengthen
               the homogeneous capture assumption and to provide population estimates for
               these groups. An additional assumption of no undercoverage in list A for
               those under 18 years of age is also made, as DLD has no coverage in this age
               group.
               DLD is further validated as a suitable list B by substituting a smaller list
               derived from a survey (underpinned by the homogeneous capture assumption)
               and comparing the results. TDSE methods are used to search for erroneous
               records.
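A minimal sketch of the post-stratified estimate, assuming each list is available as a mapping from identification number to a stratum key (single year of age, gender, nationality group), that perfect linkage means a match is simply a shared identification number, and that a person's stratum agrees across the two lists. The function name, the data layout and the "IE" nationality label are hypothetical. Following the no-undercoverage assumption for under-18s, strata below age 18 are estimated by their list A count rather than by DSE.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

# Hypothetical stratum key: (single year of age, gender, nationality group).
Stratum = Tuple[int, str, str]


def post_stratified_dse(
    list_a: Dict[Hashable, Stratum],
    list_b: Dict[Hashable, Stratum],
    min_age_for_dse: int = 18,
) -> Dict[Stratum, float]:
    """Return a DSE estimate x_s * n_s / m_s for each stratum s of list A."""
    x_s: Dict[Stratum, int] = defaultdict(int)
    n_s: Dict[Stratum, int] = defaultdict(int)
    m_s: Dict[Stratum, int] = defaultdict(int)

    # Perfect linkage is assumed: a match is the same ID in both lists,
    # and matched records are counted in their list A stratum.
    for pid, stratum in list_a.items():
        x_s[stratum] += 1
        if pid in list_b:
            m_s[stratum] += 1
    for stratum in list_b.values():
        n_s[stratum] += 1

    estimates: Dict[Stratum, float] = {}
    for stratum, x in x_s.items():
        if stratum[0] < min_age_for_dse:
            # List B has no coverage here; list A is assumed to be complete.
            estimates[stratum] = float(x)
        elif m_s[stratum] > 0:
            estimates[stratum] = x * n_s[stratum] / m_s[stratum]
        else:
            raise ValueError(f"no A-B matches in stratum {stratum}")
    return estimates


# Hypothetical toy lists keyed by identification number.
list_a = {1: (30, "F", "IE"), 2: (30, "F", "IE"), 3: (30, "F", "IE"),
          4: (30, "F", "IE"), 5: (10, "M", "IE")}
list_b = {1: (30, "F", "IE"), 2: (30, "F", "IE"), 9: (30, "F", "IE")}
estimates = post_stratified_dse(list_a, list_b)
print(estimates, sum(estimates.values()))
```

Summing the stratum estimates gives the total. In the toy data the age-30 stratum has x = 4, n = 3 and m = 2, giving 6.0, and the under-18 stratum contributes its list A count of 1.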
                   The theory underpinning TDSE is that, if the assumption of homogeneous
               capture holds, then when list A is trimmed of k records to give a new
               (trimmed) list $A_k$ of size $x - k$, there should be no significant difference
               between the untrimmed population estimate $\hat{N}$ and the population
               estimate after trimming, $\hat{N}_k$. The size of the match between list $A_k$
               and list B is $m - m_1$, where $m_1$ is the number of records from the
               trimmed segment that now need to be removed from the match between list A
               and list B. This provides the trimmed population estimate
               $\hat{N}_k = \frac{(x - k)n}{m - m_1}$.
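The trimmed estimate can be checked numerically with a short sketch using hypothetical counts. If the trimmed segment is matched to list B at the same rate, $m/x$, as list A as a whole, then $m_1 = km/x$ and substituting gives $\hat{N}_k = (x-k)n / (m(x-k)/x) = xn/m = \hat{N}$. If instead the trimmed segment has no matches ($m_1 = 0$), as would be expected for erroneous records, the estimate falls by $kn/m$. The sketch computes point estimates only; judging whether a difference is significant would also need a variance estimate.

```python
def dse_estimate(x: int, n: int, m: int) -> float:
    """Untrimmed DSE estimate N_hat = x * n / m."""
    if m <= 0:
        raise ValueError("m must be positive")
    return x * n / m


def trimmed_dse_estimate(x: int, n: int, m: int, k: int, m1: int) -> float:
    """TDSE estimate N_hat_k = (x - k) * n / (m - m1).

    x: size of list A, n: size of list B, m: size of the A-B match,
    k: records trimmed from list A, m1: trimmed records that were in the match.
    """
    if not (0 <= k <= x and 0 <= m1 <= min(k, m)):
        raise ValueError("require 0 <= k <= x and 0 <= m1 <= min(k, m)")
    if m - m1 <= 0:
        raise ValueError("the trimmed match size m - m1 must be positive")
    return (x - k) * n / (m - m1)


# Hypothetical counts, for illustration only.
x, n, m = 1000, 200, 160
k = 100
full = dse_estimate(x, n, m)

# Trimmed segment matched to list B at the overall rate m / x = 0.16.
genuine = trimmed_dse_estimate(x, n, m, k, m1=16)

# Trimmed segment with no matches at all, as expected if it were erroneous.
erroneous = trimmed_dse_estimate(x, n, m, k, m1=0)
print(full, genuine, erroneous)  # 1250.0 1250.0 1125.0
```

The first trim leaves the estimate unchanged, while the second lowers it by $kn/m = 125$, which is the kind of shift that would flag a segment of list A as suspect.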
                                                               
                   Although in theory the SPD is designed to exclude records that are not
               part of the population, in practice errors in processing administrative data
               sources may result in erroneous records being included in the SPD. We
               therefore use TDSE methods to evaluate suspect parts of list A for such
               records. To do this, we identify parts of the SPD where we suspect there may
               be erroneous



                                                                  309 | ISI WSC 2019