Page 336 - Special Topic Session (STS) - Volume 3
STS547 Daan Zult et al.
recapture estimator for linkage errors. Recently, Di Consiglio and Tuoto (2018)
(DC&T_18) extended their method to three sources.
In this paper we provide a general framework that allows us to extend this
work further in two ways: to covariates and to multiple sources. This is done by
generalising the standard log-linear modelling approach used in
multiple-recapture estimation such that it incorporates linkage error
correction. This leads to the weighted multiple-recapture (WMR) model, which is
discussed in Section 2. In Section 3 we show the results of a simulation study
that tests the WMR model.
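As a minimal sketch of the standard log-linear approach referred to above
(without any linkage error correction), consider two sources that are assumed
to be independent. With three observed cells and three parameters, the Poisson
maximum likelihood fit reproduces the observed counts exactly, so the
parameters can be solved directly from the logs of the counts. The function
name, argument names and counts below are hypothetical and only serve as
illustration.

```python
import math

def two_source_loglinear(n11, n10, n01):
    """Independence log-linear model for two sources (illustrative sketch).

    Model: log E[n_ij] = b0 + a * i + b * j, with i, j in {1, 0}.
    The three observed cells determine (b0, a, b) exactly, and
    exp(b0) is the fitted count for the unobserved cell (0, 0).
    """
    y11, y10, y01 = math.log(n11), math.log(n10), math.log(n01)
    b = y11 - y10   # effect of being in source 2
    a = y11 - y01   # effect of being in source 1
    b0 = y10 - a    # intercept, i.e. log of the fitted (0, 0) cell
    n00_hat = math.exp(b0)
    return n00_hat, n11 + n10 + n01 + n00_hat

# Hypothetical counts: 60 in both sources, 40 only in source 1, 30 only in source 2.
n00_hat, N_hat = two_source_loglinear(n11=60, n10=40, n01=30)
```

Under these assumptions the fitted missing cell equals n10 * n01 / n11, so the
population size estimate coincides with the classical two-source
capture-recapture estimate.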
2. Methodology
We first introduce some formal notation. $S$ defines the source, where in
standard CR $S = (1, 2)$ and in MR $S = (1, 2, \ldots, s)$. Next, we define the
linked 'register' $L_{s-1}$ as:

$$
L_{s-1} =
\begin{cases}
L_0 = S_1 \\
L_1 = l_1(S_1, S_2) \\
L_2 = l_2(L_1, S_3) \\
\vdots \\
L_{s-1} = l_{s-1}(L_{s-2}, S_s),
\end{cases}
$$

where $L_t$ refers to a set of $t + 1$ sequentially linked sources and $l_t$
refers to the linkage process that links $L_{t-1}$ with $S_{t+1}$. In case of CR
this reduces to $L_{s-1} = L_1 = l_1(S_1, S_2)$. The true cell counts, estimated
cell counts and
observed cell counts (i.e. the counts of records that are linked and not
linked between $L_{t-1}$ and $S_{t+1}$) are denoted as
$n = (n_{11}, n_{10}, n_{01})$, $\hat{n} = (\hat{n}_{11}, \hat{n}_{10},
\hat{n}_{01})$ and $m = (m_{11}, m_{10}, m_{01})$ respectively. Here, in a cell
count with subscript $ij$, $i \in \{1, 0\}$ corresponds to records in and not in
$L_{t-1}$ and $j \in \{1, 0\}$ corresponds to records in and not in $S_{t+1}$.
When there are no linkage errors, the true cell counts are equal to the observed
cell counts, i.e. $n = m$. Furthermore, we define
$n^* = (n^*_{11}, n^*_{10}, n^*_{01})$ and $m^* = (m^*_{11}, m^*_{10},
m^*_{01})$ as the true and observed cell counts in a random sample from
$L_{s-1}$ called a rematch or audit study (for a discussion on the difference
between rematch and audit sample, which is small, we refer to Zult et al.
(2019)). Apart from the fact that $*$ refers to a subsample, the difference
between $n$ and $n^*$ is that in the presence of linkage errors $n^*$ is assumed
to be known while $n$ is not. Finally, we introduce $k = 1, \ldots, K$, which
are the records in the linked register. Under perfect linkage this implies that
all records refer to unique units/individuals, but in case of linkage errors two
records in the linked register might belong to the same unit/individual or one
record in the linked register might represent two or more units.
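The sequential linkage and the cell counts at each linkage step can be
illustrated with a small sketch. It assumes, purely for illustration, that every
record carries a unique identifier and that linkage is an error-free exact match
on that identifier, so that the observed counts equal the true counts. The
function names and the toy sources are hypothetical, and a linked register is
simplified to the set of identifiers seen so far.

```python
def link(register, source):
    """Hypothetical error-free linkage step: exact match on identifiers.

    Returns the new linked register and the cell counts (n11, n10, n01),
    where the first index refers to the register and the second to the source.
    """
    n11 = len(register & source)   # in register and in source (linked)
    n10 = len(register - source)   # in register only
    n01 = len(source - register)   # in source only
    return register | source, (n11, n10, n01)

def link_sequentially(sources):
    """Start from the first source and link each further source in turn."""
    register = set(sources[0])     # the register before any linkage
    counts = []
    for source in sources[1:]:
        register, cells = link(register, set(source))
        counts.append(cells)
    return register, counts

# Three toy sources given as sets of (hypothetical) unit identifiers.
sources = [{1, 2, 3, 4}, {3, 4, 5}, {1, 5, 6, 7}]
register, counts = link_sequentially(sources)
```

With linkage errors the matching step would no longer be exact, and the counts
returned here would be observed rather than true counts.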
The derivation of the WMR model follows three steps. First, the D&F model
is written as a log-linear Poisson regression model. Second, the dependent
variable in this model is corrected for linkage errors in case of two sources but
325 | ISI WSC 2019