Page 22 - Contributed Paper Session (CPS) - Volume 7
CPS2020 Honeylet T. S.
Statistical matching for modeling of count data
Honeylet T. Santos
School of Statistics, University of the Philippines, Diliman, Quezon City
Valenzuela City, Philippines
Abstract
Statistical matching comprises methods for combining data from different
sources to obtain information on variables that are not jointly observed in
a single source. With the goal of estimating a Poisson regression model, this
study explores statistical matching techniques and estimation procedures
involving the bootstrap. Simulation studies confirmed that Poisson regression
imputation and MCMC imputation produce comparable results. They also showed
that the bootstrap within method performs well regardless of the matching
method used.
Keywords
Imputation; Bootstrap; MCMC
1. Introduction
Recent technological developments have created new sources of data and new
ways of harnessing them. Vast data sources are now available. Despite the
proliferation of new and traditional data sources, there remain difficulties
in utilizing them. A researcher may want to model a variable from Data Source
A using a variable from Data Source B. This poses a problem because the
required information is not contained in a single data source. There is thus
a need to explore how these data sources can be combined through statistical
matching. According to D’Orazio et al. (2006), statistical matching deals
with the problem of combining sources of information under the assumptions
that (a) common variables are observed in the different data sources and (b)
the observations from the different data sources do not overlap.
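The two assumptions can be made concrete with a small sketch. The records and the variable names x, y, and z below are hypothetical and are not taken from the study: x stands for a common variable, y for a variable observed only in Data Source A, and z for a variable observed only in Data Source B.

```python
# Hypothetical toy records, for illustration only.
# x is common to both sources; y appears only in A, z only in B.
source_a = [
    {"id": "a1", "x": 1.2, "y": 3},
    {"id": "a2", "x": 0.4, "y": 0},
    {"id": "a3", "x": 2.1, "y": 7},
]
source_b = [
    {"id": "b1", "x": 0.7, "z": 10.5},
    {"id": "b2", "x": 1.9, "z": 14.0},
]

VARIABLES = ("x", "y", "z")


def stack_sources(a, b):
    """Concatenate two sources, marking unobserved variables as None."""
    stacked = []
    for rec in a + b:
        row = {"id": rec["id"]}
        for v in VARIABLES:
            row[v] = rec.get(v)  # None when the source did not record v
        stacked.append(row)
    return stacked


stacked = stack_sources(source_a, source_b)

# Assumption (a): at least one variable is observed in every record.
common = [v for v in VARIABLES if all(r[v] is not None for r in stacked)]

# Assumption (b): no unit appears in both sources.
overlap = {r["id"] for r in source_a} & {r["id"] for r in source_b}

# Consequence: no record carries both y and z, so their joint
# relationship cannot be estimated from either source alone.
jointly_observed = [
    r for r in stacked if r["y"] is not None and r["z"] is not None
]

print("common variables:", common)
print("overlapping units:", overlap)
print("records with both y and z:", len(jointly_observed))
```

In this layout the stacked file has a block of missing y values for the records from B and a block of missing z values for the records from A, which is the gap that matching or imputation methods are meant to fill.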
Literature on statistical matching methods, for instance D’Orazio et al.
(2006), often focuses on continuous and categorical variables. This study
therefore focuses on statistical matching of count data, which can be found
in many data sources.
The Poisson distribution is often assumed for a count response variable
in a generalized linear model (GLM). As described in Agresti (2013), Poisson
loglinear models use the log link, which is the canonical link of a Poisson
GLM. The Poisson loglinear model can be represented as
$$\log \mu = \alpha + \beta_1 x_1 + \cdots + \beta_p x_p, \tag{1.1}$$
where $\mu$ is the mean of the response, $\alpha$ is the intercept, $\beta_j$
is the coefficient, and $x_j$ is the explanatory variable for $j = 1, \ldots, p$.
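As a minimal sketch of how model (1.1) can be fitted, the code below estimates a Poisson loglinear model with a single explanatory variable by Newton-Raphson, which coincides with Fisher scoring under the canonical log link. The symbols alpha (intercept) and beta (coefficient), the function name, and the toy data are assumptions made for illustration; this is not the estimation procedure used in the study.

```python
import math


def fit_poisson_loglinear(x, y, n_iter=50, tol=1e-10):
    """Fit log(mu_i) = alpha + beta * x_i by Newton-Raphson.

    Illustrative only: one explanatory variable, no standard errors.
    """
    # Start from the intercept-only fit: mu_i equal to the sample mean.
    alpha, beta = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(alpha + beta * xi) for xi in x]
        # Score vector of the Poisson log-likelihood.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 Fisher information matrix.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score.
        d_alpha = (i11 * u0 - i01 * u1) / det
        d_beta = (i00 * u1 - i01 * u0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta


# Hypothetical toy data: counts that grow roughly exponentially in x.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [1, 2, 2, 4, 6, 9, 14]

alpha_hat, beta_hat = fit_poisson_loglinear(x, y)
fitted = [math.exp(alpha_hat + beta_hat * xi) for xi in x]

print("alpha:", round(alpha_hat, 4), "beta:", round(beta_hat, 4))
print("fitted means:", [round(m, 2) for m in fitted])
```

At convergence the fitted means satisfy the likelihood equations, so the residuals y - mu sum to zero and are uncorrelated with x. Because of the log link, exp(beta) can be read as the multiplicative change in the mean count for a one-unit increase in x.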
11 | ISI WSC 2019