Page 76 - Special Topic Session (STS) - Volume 4

STS563 Davide Di Cecco et al.

                                 Population size estimation from incomplete
                                               multisource lists:
                              A Bayesian perspective on latent class modelling
                                Davide Di Cecco¹, Marco Di Zio¹, Brunero Liseo²
                                       ¹ ISTAT, via Cesare Balbo, 16, 00184 Rome
                       ² MEMOTEF, Sapienza Rome University, viale del Castro Laurenziano 9, 00161 Rome

                  Abstract
                  We propose a capture–recapture model for estimating the size of a population
                  of interest based on a set of administrative sources and/or surveys in the
                  presence of out-of-scope units (false captures). Our Bayesian approach makes
                  use of a class of log-linear models with a latent structure. We also
                  address the presence of sources providing partial information by implementing
                  a Gibbs sampler that draws from the posterior distribution of
                  the population size in the presence of missing data. The proposed method is
                  applied to simulated data sets.

                  Keywords
                  Bayesian Analysis, Capture–Recapture, Latent Class

                  1.  Introduction
                      The use of administrative data for the production of official statistics is
                  providing many new opportunities and methodological challenges. In
                  estimating the size of the usual resident population by municipality, almost
                  all national statistical institutes are gradually replacing traditional censuses
                  with administrative sources, which provide “signs of
                  life” for the population of interest. While undercoverage was the main issue in
                  the former approach, overcoverage is the main concern with administrative
                  data. By overcoverage we mean the erroneous inclusion in the lists of units
                  which do not belong to our population, i.e., out-of-scope units. Of course,
                  overcoverage can be encountered in surveys and censuses too, but there it almost
                  always consists of duplicated records generated by linkage errors, which are
                  now commonly addressed even in capture–recapture contexts. In
                  administrative data, on the other hand, linkage errors are just one of
                  several possible reasons for erroneous captures. In general,
                  administrative data are gathered by other organizations for non-statistical
                  purposes. Hence, unit and variable definitions may not align perfectly. For
                  example, the available information on the registered events, their
                  temporal description, and their legal definition may vary across sources, and their
                  harmonization can be difficult. As a consequence, each list may contain
                  different subpopulations of out-of-scope units, and the assignment of the
                  units to our target population may not be error free. Obviously, any piece of


                                                                      65 | ISI WSC 2019