Page 76 - Special Topic Session (STS) - Volume 4
STS563 Davide Di Cecco et al.
Population size estimation from incomplete
multisource lists:
A Bayesian perspective on latent class modelling
Davide Di Cecco¹, Marco Di Zio¹, Brunero Liseo²
¹ ISTAT, via Cesare Balbo, 16, 00184 Rome
² MEMOTEF, Sapienza Rome University, viale del Castro Laurenziano 9, 00161 Rome
Abstract
We propose a capture–recapture model for estimating the size of a population
of interest based on a set of administrative sources and/or surveys in the
presence of out-of-scope units (false captures). Our Bayesian approach makes
use of a certain class of log-linear models with a latent structure. We also
address sources that provide only partial information by implementing
a Gibbs sampler that draws from the posterior distribution of
the population size in the presence of missing data. The proposed method is
applied to simulated data sets.
Keywords
Bayesian Analysis, Capture–Recapture, Latent Class
1. Introduction
The use of administrative data for the production of official statistics is
providing many new opportunities and methodological challenges. In
estimating the size of the usual resident population by municipality, in almost
all national statistics institutes the use of traditional censuses is gradually
being replaced with the use of administrative sources, which provide “signs of
life” for the population of interest. While undercoverage was the main issue in
the former approach, overcoverage is the main concern with administrative
data. By overcoverage we mean the erroneous inclusion in the lists of units
which do not belong to our population, i.e., out-of-scope units. Of course,
overcoverage can be encountered in surveys and censuses too, but almost
always it consists of duplicated records generated by linkage errors, which are
now commonly addressed even in capture–recapture contexts. In
administrative data, on the other hand, linkage errors are just one of
several possible causes of erroneous captures. In general,
administrative data are gathered by other organizations for non-statistical
purposes. Hence, unit and variable definitions may not align perfectly. For
example, the available information pertaining to the registered events, their
temporal description, and their legal definition may vary across sources, and their
harmonization can be difficult. As a consequence, each list may contain
different subpopulations of out-of-scope units, and the assignment of the
units to our target population may not be error-free. Obviously, any piece of
65 | ISI WSC 2019