Page 77 - Special Topic Session (STS) - Volume 4

STS563 Davide Di Cecco et al.
            available information should be included in the process of identification of the
            erroneous cases in the lists. Ideally, recognizing and deleting spurious cases
            should  constitute  a  first  phase  of  our  analysis,  after  which  some  capture-
            recapture technique might be used on the “cleaned” data. However, in many
            cases,  the  available  information  does  not  suffice  to  single  out  every  false
            capture, and there will remain a certain portion of uncertainty for which we
            have  no  capability  of  discerning  the  cause  of  error.  In  practice,  the  main
            approach  in  official  statistics  is  the  following:  all  available  administrative
            sources are integrated into a single statistical population register. The
            register  is  coupled  with  an  ad-hoc  coverage  survey  (in  the  same  way  as
            censuses were coupled with an additional post enumeration survey) to exploit
            a Dual Systems Estimator (DSE). Then, the overcoverage rate is estimated on
            the basis of the comparison between the (supposedly) error-free survey and
            the administrative data via some supervised model, and then used to “correct”
            the DSE in some way. An original approach, called Trimmed DSE and
            proposed in Zhang et al (2017), consists of an iterative procedure which
            removes units and estimates a DSE until a stopping criterion is satisfied. The
            authors prove that, if the survey has no overcoverage, the procedure has some
            optimal properties of convergence. The Dual System approach, including the
            aforementioned, has the remarkable property of being particularly robust
            (see,  e.g.,  Chao  et  al  2001),  and  it  does  not  rely  on  any  complex  model
            specification. Our approach, by contrast, relies on a Multiple Record
            System, where one considers the various administrative sources separately,
            in order to exploit the information redundancy. There exist various proposals
            in the literature which use complex models to deal with false captures in multiple
            lists, particularly in animal abundance problems, see, e.g., da Silva (2009),
            Wright et al (2009), and Link et al (2010). However, in all those works, the false
            captures are essentially duplicate linkage errors. To our knowledge, the only
            contributions dealing with false captures with no restrictive hypothesis on the
            source  of  error  in  multiple  record  systems  are  Overstall  et  al  (2014)  and
            Fegatelli et al (2017). The former proposes a Bayesian log-linear model, the
            latter extends that work in order to include latent variables. However, in both
            cases, only a single source list is assumed to suffer from false captures. When
            considering  administrative  sources  separately,  a  series  of  methodological
            issues arises:
                    •  It is necessary to take into account possible dependencies among
                       the various sources.
                    •  While DSE is known to be robust with respect to violation of basic
                       hypotheses (e.g., the homogene¬ity of capture probabilities), this
                       is not true in general in Multiple Record Systems.
                    •  In  our  framework,  administrative  sources  often  target  specific
                       categories of citizens (e.g., people in a certain age range), leaving
                       subsets of the population with null probability of being captured.
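
To make the Dual System Estimator discussed above concrete, the following is a minimal sketch of the classical two-list (Lincoln-Petersen) form of the DSE, together with one simple way an estimated overcoverage rate might be used to adjust it. The text only says that the DSE is corrected "in some way", so the adjustment shown here (scaling the register count by one minus the overcoverage rate, under the assumption that erroneous register records never match the survey) is an illustrative assumption, not the method of any of the cited works. The function names and the toy counts are hypothetical.

```python
def dual_system_estimate(n_register, n_survey, n_matched):
    """Classical two-list (Lincoln-Petersen) estimate of population size.

    n_register: number of units in the register
    n_survey:   number of units in the coverage survey
    n_matched:  number of units found in both
    """
    if n_matched <= 0:
        raise ValueError("at least one matched unit is needed")
    return n_register * n_survey / n_matched


def corrected_dse(n_register, n_survey, n_matched, overcoverage_rate):
    """DSE with the register count reduced by an overcoverage rate.

    ASSUMPTION (illustrative only): erroneous register records never
    match the error-free survey, so only the register count needs scaling.
    """
    if not 0.0 <= overcoverage_rate < 1.0:
        raise ValueError("overcoverage_rate must lie in [0, 1)")
    valid_register = n_register * (1.0 - overcoverage_rate)
    return dual_system_estimate(valid_register, n_survey, n_matched)


# Toy counts (hypothetical): 8000 genuine register records plus 500
# spurious ones; a survey of 1000 units, 800 of which match the register.
naive = dual_system_estimate(8500, 1000, 800)
adjusted = corrected_dse(8500, 1000, 800, overcoverage_rate=500 / 8500)
print(f"naive DSE:    {naive:.1f}")
print(f"adjusted DSE: {adjusted:.1f}")
```

In this toy setting the spurious records inflate the naive estimate, which is the effect that overcoverage corrections aim to remove.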

                                                                66 | I S I   W S C   2 0 1 9