Page 341 - Special Topic Session (STS) - Volume 3
STS547 Daan Zult et al.
4. Discussion and Conclusion
In this paper we derived and tested the WMR model for population size
estimation corrected for linkage error. The model is derived from the D&F
model and is a more general extension of it than the models developed by DC&T
(2015, 2018) and De Wolf et al. (2018), because it can handle three or more
sources as well as covariates. Furthermore, the WMR model is embedded in the
more general family of log-linear regression models and therefore no longer
has to be studied as an isolated issue in CR and MR models. Finally, the WMR
model was tested and validated in a simulation study.
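To make the link with log-linear modelling concrete, the sketch below shows the standard log-linear route to a three-source MR estimate. It is not the WMR model and applies no linkage-error correction: it assumes error-free linkage, fits only the textbook independence model to the seven observable inclusion patterns by iterative proportional fitting (IPF), and extrapolates the unobserved all-zero cell. The function names and toy counts are hypothetical; practical MR analyses usually add interaction terms and covariates.

```python
# Hypothetical toy counts for three sources A, B, C, keyed by inclusion
# pattern (a, b, c). They were generated exactly from an independence model
# with N = 1000, so 240 people would fall in the unobserved pattern (0, 0, 0).
counts = {
    (1, 0, 0): 240, (0, 1, 0): 160, (0, 0, 1): 60,
    (1, 1, 0): 160, (1, 0, 1): 60, (0, 1, 1): 40,
    (1, 1, 1): 40,
}


def fit_independence_ipf(counts, max_iter=10_000, tol=1e-10):
    """Fit the log-linear independence model to the observed cells of an
    incomplete 2^k table by iterative proportional fitting."""
    cells = list(counts)
    k = len(cells[0])
    fitted = {cell: 1.0 for cell in cells}
    for _ in range(max_iter):
        largest_step = 0.0
        for i in range(k):
            for level in (0, 1):
                group = [c for c in cells if c[i] == level]
                # Rescale so the fitted margin of source i matches the observed one.
                factor = sum(counts[c] for c in group) / sum(fitted[c] for c in group)
                for c in group:
                    fitted[c] *= factor
                largest_step = max(largest_step, abs(factor - 1.0))
        if largest_step < tol:
            break
    return fitted


def estimate_population_size(counts):
    """Observed total plus the extrapolated count in the all-zero cell."""
    fitted = fit_independence_ipf(counts)
    # Under independence, log m_abc is additive in a, b and c, so the
    # missing cell equals m_100 * m_010 / m_110.
    m000 = fitted[(1, 0, 0)] * fitted[(0, 1, 0)] / fitted[(1, 1, 0)]
    return sum(counts.values()) + m000


print(round(estimate_population_size(counts)))  # 1000
```

Because the toy counts satisfy the independence model exactly, the fit reproduces them and the estimate recovers N = 1000; with real data, linkage errors would distort the observed cells before any such fit.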
Although the WMR model might in theory be an improvement on the D&F model,
both still require the availability of a rematch study (for D&F) or an audit study (for WMR).
The advantage of the WMR model is that an audit study might be easier
to obtain, because its requirements are lower: it needs to be constructed at
the cell-count level instead of at the much more detailed level of matched record
pairs. However, the incorporation of covariates and additional sources in the
WMR model also puts additional constraints on the audit study, in the sense
that the audit study should include these same covariates and additional
sources. Given that the sample underlying the audit study must be
representative of $R_t$, this might become more difficult as $t$ increases.
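Why a cell-count-level audit asks less than a pair-level audit can be illustrated with two sources and no covariates. The sketch below, a minimal illustration assuming one-to-one linkage and hypothetical record identifiers (it is not the audit procedure used in the paper), collapses a set of links into the cells n11, n10 and n01. Two false links that swap partners are errors at the pair level but leave the cell counts unchanged, whereas a single missed link moves one record out of n11 and into both n10 and n01.

```python
def cell_counts(a_ids, b_ids, links):
    """Collapse one-to-one record-pair links between sources A and B into
    inclusion-pattern cell counts."""
    linked_a = {a for a, _ in links}
    linked_b = {b for _, b in links}
    return {
        "11": len(links),                  # linked: in both A and B
        "10": len(set(a_ids) - linked_a),  # only in A
        "01": len(set(b_ids) - linked_b),  # only in B
    }


# Hypothetical identifiers; a1-b1 and a2-b2 are the true matches.
a_ids = ["a1", "a2", "a3"]
b_ids = ["b1", "b2", "b4"]

true_links = {("a1", "b1"), ("a2", "b2")}
swapped_links = {("a1", "b2"), ("a2", "b1")}  # two false links
missed_links = {("a1", "b1")}                 # one missed link

print(cell_counts(a_ids, b_ids, true_links))     # {'11': 2, '10': 1, '01': 1}
print(cell_counts(a_ids, b_ids, swapped_links))  # {'11': 2, '10': 1, '01': 1}
print(cell_counts(a_ids, b_ids, missed_links))   # {'11': 1, '10': 2, '01': 2}
```

Once covariates enter the model, a swapped link can move records between covariate cells, which is one way to see why covariates add requirements to the audit study.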
Also, we should note that we paid little attention to the impact of the exact
linkage procedure. In Section 2 we developed the WMR model in the context
of the common sequential linkage approach, in which two sources are first
linked and a third source is then linked to this combined source. However, sources
can also be linked pairwise or simultaneously. These approaches
are less common because they suffer from either computational problems (the
number of potential matches between multiple sources increases
exponentially) or methodological problems (e.g. what to do with inconsistent matching
patterns such as A → B, B → C, C ↛ A?). Furthermore, in the simulation study of
Section 3 we applied probabilistic linkage based on techniques developed by
Fellegi and Sunter (1969), Winkler (1988) and Jaro (1989), which aim to optimise
the quality of matches at the level of matched record pairs. Matching techniques
designed instead to optimise the quality of matches at the cell-count level
might by themselves already substantially reduce the problem of linkage errors in population
size estimation.
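The two drawbacks of pairwise and simultaneous linkage mentioned above can be sketched in a few lines. The snippet below assumes links are stored as undirected pairs of hypothetical record identifiers (the paper's arrows suggest directed matches, which this sketch ignores). It flags chains where a is linked to b and b to c while a and c are not linked, and it shows how the number of candidate record tuples in a simultaneous linkage grows as the product of the source sizes.

```python
from math import prod


def inconsistent_chains(links_ab, links_bc, links_ac):
    """Find chains a-b-c where a is linked to b and b to c, but a is not
    linked to c (links are treated as undirected id pairs)."""
    c_partners = {}
    for b, c in links_bc:
        c_partners.setdefault(b, set()).add(c)
    chains = []
    for a, b in sorted(links_ab):
        for c in sorted(c_partners.get(b, set())):
            if (a, c) not in links_ac:
                chains.append((a, b, c))
    return chains


# Hypothetical pairwise linkage results for three sources A, B and C.
links_ab = {("a1", "b1"), ("a2", "b2")}
links_bc = {("b1", "c1"), ("b2", "c2")}
links_ac = {("a1", "c1")}
print(inconsistent_chains(links_ab, links_bc, links_ac))  # [('a2', 'b2', 'c2')]

# A simultaneous linkage considers every combination of records, so the
# number of candidate tuples is the product of the source sizes.
print(prod([1000, 1000]))        # 1000000
print(prod([1000, 1000, 1000]))  # 1000000000
```

How such chains should be resolved (drop a link, add one, or discard the records) is exactly the methodological question the text raises; the sketch only detects them.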
Another point that deserves some discussion is the ‘individual starting
weight of 1’. Lists or registers of individuals sometimes also contain individual
sample weights, which indicate the size of the group in the total population that
the individual represents. There is no reason why these sample
weights could not replace the starting weights of 1 in the WMR model.
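Replacing the starting weight of 1 by a sample weight amounts to summing weights per inclusion pattern instead of counting records. The minimal sketch below assumes each linked record carries a single weight under a hypothetical field name `"weight"`; which source's weight to attach to a linked record is left open in the text and is an assumption here. With no weight field the function reproduces ordinary cell counts.

```python
from collections import defaultdict


def pattern_totals(records, weight_key=None):
    """Total per inclusion pattern: a plain count when weight_key is None
    (every record starts with weight 1), otherwise the sum of the weights."""
    totals = defaultdict(float)
    for record in records:
        weight = 1.0 if weight_key is None else record[weight_key]
        totals[record["pattern"]] += weight
    return dict(totals)


# Hypothetical linked records for two sources; "weight" is an assumed
# sample-weight field attached to each linked record.
records = [
    {"pattern": (1, 1), "weight": 2.0},
    {"pattern": (1, 1), "weight": 3.5},
    {"pattern": (1, 0), "weight": 1.5},
    {"pattern": (0, 1), "weight": 4.0},
]
print(pattern_totals(records))            # {(1, 1): 2.0, (1, 0): 1.0, (0, 1): 1.0}
print(pattern_totals(records, "weight"))  # {(1, 1): 5.5, (1, 0): 1.5, (0, 1): 4.0}
```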
Furthermore, when additional sources also contain sample weights, these can
be used to calculate , and in a slightly different way, i.e. simply by
∗
∗
adding up sample weights instead of counting. This way we would get ‘linkage
330 | ISI WSC 2019