Page 23 - Contributed Paper Session (CPS) - Volume 7

CPS2020 Honeylet T. S.
                Aside from Poisson regression via Maximum Likelihood Estimation (MLE),
            bootstrap methods will also be considered for estimating the model
            parameters. Efron (2000) argued that an advantage of bootstrapping is its
            broad applicability. Moreover, Efron and Tibshirani (1986) noted that
            bootstrap methods can measure the statistical accuracy of estimators
            even when these have complicated forms. They showed in a sampling experiment
            that bootstrap estimates of the standard error of the correlation coefficient,
            which does not have a simple form, are nearly unbiased.
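
As a minimal sketch of the idea described by Efron and Tibshirani (1986), the nonparametric bootstrap standard error of a correlation coefficient can be obtained by resampling observation pairs with replacement and taking the standard deviation of the recomputed statistic. The function names, the number of replicates, and the simulated data below are illustrative assumptions and are not taken from the study.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_se(xs, ys, stat=pearson_r, n_boot=1000, seed=0):
    """Nonparametric bootstrap standard error of stat(xs, ys)."""
    rng = random.Random(seed)
    n = len(xs)
    reps = []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement, keeping pairs intact.
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(stat([xs[i] for i in idx], [ys[i] for i in idx]))
    mean_rep = sum(reps) / n_boot
    return math.sqrt(sum((r - mean_rep) ** 2 for r in reps) / (n_boot - 1))


# Illustrative data (assumption): 50 pairs with moderate positive correlation.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
y = [0.6 * xi + 0.8 * rng.gauss(0.0, 1.0) for xi in x]

r_hat = pearson_r(x, y)
se_hat = bootstrap_se(x, y)
print(f"r = {r_hat:.3f}, bootstrap SE = {se_hat:.3f}")
```

The same resampling loop applies to any estimator passed as `stat`, which is the broad applicability noted above.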
                The true model may involve a count response variable predicted
            by variables 1 and 2 that are either highly or weakly correlated.
            In practice, however, only one of these variables may be observed in
            the data sources. Hence, the simulations in this study will
            focus on cases in which only one of the common variables is available in the
            data sources. Other considerations that may arise in
            practice will also be taken into account: the total sample size of the
            concatenated data sources, the ratio of each data source's size to the total
            sample size, and the effect of variables 1 and 2 on the count variables to be
            matched.
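
One way such a scenario could be simulated is sketched below: two predictors with a chosen correlation drive a Poisson count response through a log link, and one predictor is then dropped to mimic a data source in which only one of them was observed. The coefficient values, the standard normal predictors, the sample size, and all names are hypothetical choices for illustration; the study's actual simulation settings are not given in this section.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n, rho, beta=(0.5, 0.3, 0.3), seed=0):
    """Simulate rows with corr(x1, x2) = rho and y ~ Poisson(exp(b0 + b1*x1 + b2*x2))."""
    rng = random.Random(seed)
    b0, b1, b2 = beta
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        # Mix x1 with independent noise so that x2 has unit variance and correlation rho.
        x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * e
        lam = math.exp(b0 + b1 * x1 + b2 * x2)
        rows.append({"x1": x1, "x2": x2, "y": poisson_draw(lam, rng)})
    return rows


full = simulate_counts(n=500, rho=0.9)
# Mimic the case where only one of the two predictors was observed.
observed = [{"x1": r["x1"], "y": r["y"]} for r in full]
```

Varying `n`, `rho`, and `beta` corresponds to varying the total sample size, the correlation between the two variables, and their effect on the count response.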
                The objective of this study is to combine data from different sources
            through statistical matching techniques with the end goal of developing a
            count regression model. Specifically, this study aims to: (1) develop a statistical
            matching technique to create synthetic count data, (2) estimate count
            regression models based on synthetic data, and (3) characterize the estimation
            procedure through simulation studies.
                The matching and estimation procedure will be evaluated using absolute
            relative bias (RBIAS) and mean absolute error (MAE). RBIAS will measure the
            accuracy of the estimates obtained, while MAE will measure the predictive
            ability of the estimated model.
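
The two criteria can be sketched as follows, assuming the commonly used definitions: RBIAS as the absolute difference between the average estimate over simulation replicates and the true parameter value, divided by the absolute true value, and MAE as the average absolute difference between observed and predicted counts. The exact formulas used in the study are not stated in this section, so these definitions, the function names, and the numbers are assumptions.

```python
def abs_relative_bias(estimates, true_value):
    """|mean(estimates) - true_value| / |true_value| over simulation replicates."""
    mean_est = sum(estimates) / len(estimates)
    return abs(mean_est - true_value) / abs(true_value)


def mean_absolute_error(observed, predicted):
    """Average of |observed_i - predicted_i| over all observations."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical values: three replicate estimates of a coefficient whose true value is 0.30.
rbias = abs_relative_bias([0.28, 0.31, 0.33], true_value=0.30)

# Hypothetical observed counts and model predictions.
mae = mean_absolute_error([3, 0, 2, 5], [2.5, 0.5, 2.0, 4.0])

print(f"RBIAS = {rbias:.4f}, MAE = {mae:.4f}")
```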

            2.  Methodology
                The statistical matching problem involves integrating different data sources to
            create a synthetic dataset. For the purposes of this study, two independent data
            sources will be considered: Data Source A with n_A observations and Data Source B
            with n_B observations. A common variable is observed in both data
            sources, while one specific variable is missing in Data Source A and another
            specific variable is missing in Data Source B. The data sources are random samples
            from the same population. The assumption is that combining these two
            independent data sources will yield a larger random sample, Data Source A ∪ B,
            with n = n_A + n_B observations from the same population. Consequently, the
            observation units in Data Source A and Data Source B are disjoint.
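
The data layout described above can be sketched as a simple concatenation in which each record keeps its own observed variables and the unobserved specific variable is marked as missing. The variable names `x`, `y`, and `z`, the choice of which specific variable belongs to which source, and the toy records are illustrative assumptions only.

```python
def concatenate_sources(source_a, source_b, all_vars=("x", "y", "z")):
    """Stack two disjoint samples, marking unobserved variables as None."""
    union = []
    for label, source in (("A", source_a), ("B", source_b)):
        for record in source:
            # dict.get returns None for a variable the source did not observe.
            row = {var: record.get(var) for var in all_vars}
            row["source"] = label
            union.append(row)
    return union


# Hypothetical records: x is common, y is observed only in A, z only in B.
data_a = [{"x": 1.2, "y": 3}, {"x": 0.4, "y": 0}, {"x": 2.1, "y": 5}]
data_b = [{"x": 0.9, "z": 1}, {"x": 1.7, "z": 4}]

combined = concatenate_sources(data_a, data_b)
n_a, n_b, n = len(data_a), len(data_b), len(combined)
print(n_a, n_b, n)
```

Because the units in the two sources are disjoint, the concatenated file has exactly n_A + n_B rows, and statistical matching then amounts to filling in the `None` entries.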



                                                                12 | ISI WSC 2019