Page 337 - Special Topic Session (STS) - Volume 3

STS547 Daan Zult et al.
with covariates. These two steps are discussed in section 2.1. Third, this model
is extended to multiple sources, which is discussed in section 2.2.
2.1 Capture-recapture estimation and linkage error correction
In the most basic case of CR the PSE is given by the standard Petersen (Petersen,
1896; Lincoln, 1930) formula:
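$$\hat{N} = n_{11} + n_{10} + n_{01} + \frac{n_{10}\, n_{01}}{n_{11}} = \frac{(n_{11} + n_{10})(n_{11} + n_{01})}{n_{11}} = \frac{n_{1+}\, n_{+1}}{n_{11}} \qquad (1),$$

where under the appropriate assumptions $\hat{N}$ is an unbiased estimate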
of the true population size (Wolter, 1986). The Petersen estimator is closely
related to a fitted value obtained from a log-linear Poisson regression model
with cell count data (see e.g. Cormack, 1989), i.e.:

$$E[n_{ij}] = \exp(\beta_0 + \beta_1 i + \beta_2 j) \quad \text{for } i, j \in \{1, 0\} \qquad (2),$$

where $n_{ij}$ serves as the dependent variable in the log-linear
regression model. The Poisson regression model uses maximum likelihood
to obtain estimates $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$. An important difference between equation (1)
and (2) is that (2) can be easily extended with additional sources or categorical
covariates.
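
The relation between equations (1) and (2) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the independence model is fitted to the three observed cells only, in which case the model is saturated and the maximum likelihood estimates have a closed form. The function names and the example counts are hypothetical.

```python
import math


def petersen(n11, n10, n01):
    """Petersen estimate from the three observed cells of the 2x2 table."""
    if n11 <= 0:
        raise ValueError("n11 must be positive")
    n1_plus = n11 + n10  # all records in source 1
    n_plus1 = n11 + n01  # all records in source 2
    return n1_plus * n_plus1 / n11


def loglinear_independence(n11, n10, n01):
    """Closed-form ML fit of E[n_ij] = exp(b0 + b1*i + b2*j).

    Assumption: only cells (1,1), (1,0) and (0,1) are observed, so three
    parameters reproduce three counts exactly and no iteration is needed.
    """
    if min(n11, n10, n01) <= 0:
        raise ValueError("all observed cell counts must be positive")
    b0 = math.log(n10) + math.log(n01) - math.log(n11)
    b1 = math.log(n11) - math.log(n01)
    b2 = math.log(n11) - math.log(n10)
    return b0, b1, b2


# Hypothetical counts, chosen for illustration only.
n11, n10, n01 = 60, 40, 20
b0, b1, b2 = loglinear_independence(n11, n10, n01)
n00_fitted = math.exp(b0)  # fitted value for the unobserved cell (0,0)
n_hat_loglinear = n11 + n10 + n01 + n00_fitted
n_hat_petersen = petersen(n11, n10, n01)
print(n_hat_loglinear, n_hat_petersen)
```

Under these assumptions the fitted value for the unobserved cell is $n_{10} n_{01} / n_{11}$, so adding it to the observed counts gives the Petersen estimate. With more sources or covariates no closed form is available in general, and an iterative Poisson regression routine would be used instead.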
When the appropriate assumptions are not met, for instance when records are
not perfectly linked, $\hat{N}$ is biased. Therefore D&F developed a linkage
error correction method that uses a rematch study from which they calculate
the linkage error probabilities that are used to correct the PSE for linkage
errors. DW show that this correction method can be written as:
$$\hat{N}_{D\&F} = \frac{n_{1+}\, n_{+1}}{\hat{n}_{11}} \qquad (3),$$

where $\hat{n}_{11}$ is the estimated number of links between both sources that takes
linkage errors into account. Combining equations (1), (2) and (3) allows us to
write:

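$$E[\hat{n}_{ij}] = \exp(\beta_0 + \beta_1 i + \beta_2 j) \quad \text{for } i, j \in \{1, 0\} \qquad (4),$$

where $\hat{n}_{11} = n_{11} \frac{m^*_{11}}{n^*_{11}}$, $\hat{n}_{10} = n_{1+} - \hat{n}_{11}$ and $\hat{n}_{01} = n_{+1} - \hat{n}_{11}$ (see Zult et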
al. (2019) for a more extensive derivation). In words, equation (4) constitutes
the same model as equation (2), except that the dependent variable $n_{ij}$ is
replaced by $\hat{n}_{ij}$, where $\hat{n}_{ij}$ is simply a vector of estimated cell counts that
is based on the results of the audit study. Here we should note that the
calculation of $\hat{n}_{11}$ is independent of the exact linkage procedure. In fact,
the only thing that matters is that the fraction $m^*_{11} / n^*_{11}$ is a consistent estimate
of $m_{11} / n_{11}$, which implies that the audit study should be representative for $n_{11}$.
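
The correction in equations (3) and (4) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: it assumes the audit fraction is the share of audited links that were found to be correct, and the names `audit_correct` and `audit_links`, as well as the example numbers, are hypothetical.

```python
def corrected_cells(n11, n10, n01, audit_correct, audit_links):
    """Linkage-error-corrected cell counts for the 2x2 table.

    Assumption: audit_correct / audit_links is the fraction of audited
    links judged correct, used as the correction factor for n11.
    """
    if not 0 < audit_correct <= audit_links:
        raise ValueError("need 0 < audit_correct <= audit_links")
    n1_plus = n11 + n10  # source totals are not affected by linkage errors
    n_plus1 = n11 + n01
    n11_hat = n11 * audit_correct / audit_links
    n10_hat = n1_plus - n11_hat
    n01_hat = n_plus1 - n11_hat
    return n11_hat, n10_hat, n01_hat


def corrected_pse(n11, n10, n01, audit_correct, audit_links):
    """Petersen-type estimate computed from the corrected cell counts."""
    n11_hat, n10_hat, n01_hat = corrected_cells(
        n11, n10, n01, audit_correct, audit_links
    )
    # (n11_hat + n10_hat) and (n11_hat + n01_hat) equal the source totals,
    # so this is the same quantity as equation (3).
    return (n11_hat + n10_hat) * (n11_hat + n01_hat) / n11_hat


# Hypothetical example: 60 links, of which 90 out of 100 audited are correct.
cells = corrected_cells(60, 40, 20, audit_correct=90, audit_links=100)
pse = corrected_pse(60, 40, 20, audit_correct=90, audit_links=100)
print(cells, pse)
```

Because the corrected cells keep the source totals fixed, fitting the log-linear model to them and applying equation (3) directly give the same population size estimate in the two-source case.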

326 | ISI WSC 2019
