Page 77 - Special Topic Session (STS) - Volume 4
STS563 Davide Di Cecco et al.
available information should be included in the process of identification of the
erroneous cases in the lists. Ideally, recognizing and deleting spurious cases
should constitute a first phase of our analysis, after which some capture-
recapture technique might be used on the “cleaned” data. However, in many
cases, the available information does not suffice to single out every false
capture, and some residual uncertainty will remain for which we cannot
discern the cause of error. In practice, the main
approach in official statistics is the following: all available administrative
sources are integrated into a single statistical population register. The
register is coupled with an ad-hoc coverage survey (in the same way as
censuses were coupled with an additional post enumeration survey) to exploit
a Dual Systems Estimator (DSE). Then, the overcoverage rate is estimated on
the basis of the comparison between the (supposedly) error-free survey and
the administrative data via some supervised model, and then used to “correct”
the DSE in some way. An original approach, called Trimmed DSE and
proposed in Zhang et al (2017), consists of an iterative procedure which
removes units and estimates a DSE until a stopping criterion is satisfied. The
authors prove that, if the survey has no overcoverage, the procedure has some
optimal properties of convergence. The Dual System approach, including the
aforementioned, has the remarkable property of being particularly robust
(see, e.g., Chao et al 2001), and it does not rely on any complex model
specification. Our approach, by contrast, relies on a Multiple Record
System, where one considers the various administrative sources separately,
in order to exploit the information redundancy. There exist various proposals
in the literature which use complex models to deal with false captures in multiple
lists, particularly in animal abundance problems, see, e.g., da Silva (2009),
Wright et al (2009), and Link et al (2010). However, in all those works, the false
captures are essentially duplicate linkage errors. To our knowledge, the only
contributions dealing with false captures with no restrictive hypothesis on the
source of error in multiple record systems are Overstall et al (2014) and
Fegatelli et al (2017). The former proposes a Bayesian log-linear model; the
latter extends that work in order to include latent variables. However, in both
cases, only a single source list is assumed to suffer from false captures. When
considering administrative sources separately, a series of methodological
issues arises:
• It is necessary to take into account possible dependencies among
the various sources.
• While DSE is known to be robust with respect to violations of basic
hypotheses (e.g., the homogeneity of capture probabilities), this
is not true in general in Multiple Record Systems.
• In our framework, administrative sources often target specific
categories of citizens (e.g., people in a certain age range), leaving
subsets of the population with null probability of being captured.
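
As a point of reference for the Dual Systems Estimator discussed above, the following minimal Python sketch computes the classical two-list (Lincoln-Petersen) estimate and one simple way an estimated overcoverage rate could be used to adjust it. The function names and the adjustment rule (shrinking the register count by the overcoverage rate, on the assumption that erroneous register records never match the error-free survey) are illustrative assumptions, not the specific correction adopted in official statistics or in the works cited here.

```python
def dual_system_estimate(n_register, n_survey, n_matched):
    """Classical two-list (Lincoln-Petersen) population size estimate.

    n_register: units counted in the register
    n_survey:   units counted in the coverage survey
    n_matched:  units found in both lists
    """
    if n_matched <= 0:
        raise ValueError("at least one matched unit is required")
    return n_register * n_survey / n_matched


def overcoverage_adjusted_dse(n_register, n_survey, n_matched, overcoverage_rate):
    """Hypothetical adjustment: discount the register count by an estimated
    overcoverage rate before applying the DSE.

    Assumption (not from the source): erroneous register records never match
    the survey, so only the register marginal count needs to be reduced.
    """
    if not 0.0 <= overcoverage_rate < 1.0:
        raise ValueError("overcoverage_rate must lie in [0, 1)")
    valid_register = n_register * (1.0 - overcoverage_rate)
    return dual_system_estimate(valid_register, n_survey, n_matched)


if __name__ == "__main__":
    # Illustrative counts only, not taken from any real source.
    print(dual_system_estimate(1000, 200, 160))
    print(overcoverage_adjusted_dse(1000, 200, 160, 0.10))
```

With these made-up counts, ignoring overcoverage inflates the estimate, since spurious register records enter the numerator without ever contributing to the matched count.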
66 | ISI WSC 2019