Page 22 - Contributed Paper Session (CPS) - Volume 7

CPS2020 Honeylet T. S.





                               Statistical matching for modeling of count data
                                              Honeylet T. Santos
                            School of Statistics, University of the Philippines, Diliman, Quezon City
                                             Valenzuela City, Philippines
                  Abstract
                  Statistical matching deals with methods of combining data from different
                  sources to obtain information on variables that are not jointly observed in
                  a single source. With the goal of estimating a Poisson regression model, this
                  study explores statistical matching techniques and estimation procedures
                  involving the bootstrap. Simulation studies confirmed that Poisson regression
                  imputation and MCMC imputation produce comparable results. They also showed
                  that the bootstrap within method performs well regardless of the matching
                  method used.

                  Keywords
                  Imputation; Bootstrap; MCMC

                  1.  Introduction
                      Recent technological developments have created new sources of data and
                  new ways of harnessing them. Vast data sources are now available. Despite the
                  proliferation of new and traditional data sources, difficulties remain in
                  utilizing them. A researcher may want to model a variable from Data Source A
                  using a variable from Data Source B. This poses a problem because the required
                  information is not contained in a single data source. There is thus a need to
                  explore how these data sources can be combined through statistical matching.
                  According to D’Orazio et al. (2006), statistical matching deals with the
                  problem of combining sources of information under the assumptions that (a)
                  common variables are observed in the different data sources and (b) the
                  observations from the different data sources do not overlap.
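
                  The setting described by assumptions (a) and (b) can be sketched with a
                  small example. The sketch below is illustrative only: the toy records, the
                  variable names x, y and z, and the function nearest_neighbour_match are
                  hypothetical, and the donor rule used here is a simple nearest-neighbour
                  hot deck, a standard matching technique that is not necessarily one of the
                  methods studied in this paper.

```python
# Hypothetical toy data: source A observes (x, y), source B observes (x, z).
# The common variable is x; y and z are never observed on the same unit.
source_a = [
    {"id": "a1", "x": 1.0, "y": 3},
    {"id": "a2", "x": 2.5, "y": 7},
    {"id": "a3", "x": 4.0, "y": 12},
]
source_b = [
    {"id": "b1", "x": 0.8, "z": 0},
    {"id": "b2", "x": 2.7, "z": 2},
    {"id": "b3", "x": 4.4, "z": 5},
]


def nearest_neighbour_match(recipients, donors, common="x", target="z"):
    """Fill `target` in each recipient from the donor closest on `common`."""
    matched = []
    for rec in recipients:
        # Pick the donor record with the smallest distance on the common variable.
        donor = min(donors, key=lambda d: abs(d[common] - rec[common]))
        matched.append({**rec, target: donor[target], "donor_id": donor["id"]})
    return matched


# Assumption (b): no unit appears in both sources.
assert not {r["id"] for r in source_a} & {r["id"] for r in source_b}

# Synthetic file in which x, y and z appear together for every unit of A.
synthetic = nearest_neighbour_match(source_a, source_b)
for row in synthetic:
    print(row)
```

                  The resulting synthetic file contains x, y and z on every record of
                  Source A, which is the kind of file on which a model relating y to z could
                  then be estimated.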
                      The literature on statistical matching methods, for instance D’Orazio et
                  al. (2006), often focuses on continuous and categorical variables. This study
                  therefore focuses on statistical matching of count data, which are found in
                  many data sources.
                      The Poisson distribution is often assumed for a count response variable
                  in a generalized linear model (GLM). As described in Agresti (2013), Poisson
                  loglinear models use the log link, which is the canonical link of a Poisson
                  GLM. The Poisson loglinear model can be represented as

                  $$\log \mu = \alpha + \beta_1 x_1 + \cdots + \beta_p x_p, \tag{1.1}$$

                  where $\mu$ is the mean of the response, $\alpha$ is the intercept, and
                  $\beta_j$ is the coefficient of the explanatory variable $x_j$ for
                  $j = 1, \ldots, p$.
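
                  As an illustration of how a model of the form (1.1) can be estimated, the
                  following is a minimal sketch of maximum likelihood fitting by
                  Newton-Raphson for the special case of a single explanatory variable
                  (p = 1). The function name fit_poisson_loglinear and the toy data are
                  assumptions made for this example; they do not come from the study, and in
                  practice a statistical package would normally be used instead.

```python
import math


def fit_poisson_loglinear(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE for log(mu_i) = alpha + beta * x_i (one covariate)."""
    # Start from the intercept-only fit: alpha = log(mean of y), beta = 0.
    alpha, beta = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(alpha + beta * xi) for xi in x]
        # Score vector of the Poisson log-likelihood.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve (information) * delta = score in closed form.
        d_alpha = (i11 * u0 - i01 * u1) / det
        d_beta = (i00 * u1 - i01 * u0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta


# Hypothetical toy data: counts that tend to grow with x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 2, 2, 4, 6, 10]

alpha_hat, beta_hat = fit_poisson_loglinear(x, y)
fitted = [math.exp(alpha_hat + beta_hat * xi) for xi in x]
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.4f}")
```

                  Because the log link is canonical and the model includes an intercept, the
                  fitted means sum to the observed total count at the maximum likelihood
                  estimate, which gives a quick check on convergence.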
                                                                      11 | I S I   W S C   2 0 1 9