Page 24 - Contributed Paper Session (CPS) - Volume 7

CPS2020 Honeylet T. S.
                  2.1.  Matching Methods
                  Poisson regression imputation
                    1)  Fit separate Poisson regression models to Data Source A and Data
                        Source B, with the source-specific variable (Y in A, Z in B) as the
                        response and the common variable X as the predictor.
                    2)  Impute the missing Z values in Data Source A using the model of Data
                        Source B, and the missing Y values in Data Source B using the model
                        of Data Source A, from Step 1.
                    3)  Concatenate Data Source A and Data Source B to form the synthetic
                        dataset with X, Y, and Z values.
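
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation. It assumes a single common variable X, a log link with an intercept, and imputation by the fitted mean. The toy records, the Newton-Raphson fitter `fit_poisson`, and the helper `predict` are hypothetical names introduced here.

```python
import math

def fit_poisson(x, y, iters=100, tol=1e-10):
    """Newton-Raphson MLE for the model log E[y] = b0 + b1 * x."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * d = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def predict(beta, x):
    """Fitted Poisson mean at x (assumed here to be the imputed value)."""
    return math.exp(beta[0] + beta[1] * x)

# Toy data, made up for illustration: A observes (X, Y), B observes (X, Z).
source_a = [{"x": 0, "y": 1}, {"x": 1, "y": 3}, {"x": 2, "y": 4}, {"x": 3, "y": 9}]
source_b = [{"x": 0, "z": 2}, {"x": 1, "z": 2}, {"x": 2, "z": 5}, {"x": 3, "z": 6}]

# Step 1: one model per source, specific variable regressed on X.
beta_y = fit_poisson([r["x"] for r in source_a], [r["y"] for r in source_a])
beta_z = fit_poisson([r["x"] for r in source_b], [r["z"] for r in source_b])

# Step 2: each source receives the variable it lacks from the other source's model.
filled_a = [dict(r, z=predict(beta_z, r["x"])) for r in source_a]
filled_b = [dict(r, y=predict(beta_y, r["x"])) for r in source_b]

# Step 3: concatenate into a synthetic dataset with X, Y, and Z.
synthetic = filled_a + filled_b
```

Imputing the fitted mean keeps the sketch deterministic. Drawing from a Poisson distribution with that mean would be an alternative that preserves more of the variability in the imputed values.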

                  Random Hot Deck imputation
                    1)  Randomly choose an observation from Data Source B (the donor file) for
                        each observation in Data Source A (the recipient file).
                    2)  Impute the missing Z values in Data Source A with the values of the
                        matched observations from Data Source B. The completed Data Source A
                        serves as the synthetic dataset.
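
A minimal Python sketch of the two hot deck steps is given below. It assumes donors are drawn with replacement from the whole donor file, with no donation classes, since the text does not say otherwise. The toy records and the fixed seed are hypothetical and serve only to make the example reproducible.

```python
import random

rng = random.Random(2019)  # fixed seed, chosen only for reproducibility

# Toy data, made up for illustration: A observes (X, Y), B observes (X, Z).
recipient_a = [{"x": 0, "y": 1}, {"x": 1, "y": 3}, {"x": 2, "y": 4}, {"x": 3, "y": 9}]
donor_b = [{"x": 0, "z": 2}, {"x": 1, "z": 2}, {"x": 2, "z": 5}, {"x": 3, "z": 6}]

synthetic = []
for record in recipient_a:
    # Step 1: pick a random donor for this recipient (with replacement).
    donor = rng.choice(donor_b)
    # Step 2: copy the donor's Z value into the recipient record.
    synthetic.append(dict(record, z=donor["z"]))
```

Because no model is fitted, every imputed Z is a value actually observed in the donor file, but the common variable X plays no role in choosing the donor.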

                  Markov chain Monte Carlo imputation
                    1)  Fit a model to Data Source A with Y as response using a random walk
                        Metropolis algorithm with the posterior distribution of a Poisson
                        regression model and an improper uniform prior for the coefficient.
                        The starting value of the coefficient used in the algorithm is its
                        maximum likelihood estimate. The number of burn-in iterations is
                        1,000 and the number of Metropolis iterations is 10,000. Similarly,
                        fit a model to Data Source B with Z as response.
                    2)  Impute missing Z values in Data Source A using the model of Data
                        Source B in Step 1. Impute missing Y values in Data Source B using the
                        model of Data Source A in Step 1.
                    3)  Concatenate Data Source A and Data Source B to form the synthetic
                        dataset with X, Y, and Z values.
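
The procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's code: the model has an intercept and one slope, the proposal is a Gaussian random walk with step size 0.1, the 10,000 iterations are kept after the 1,000 burn-in iterations, and the posterior mean of the coefficients is used for prediction. The toy data and the helper names `log_post`, `fit_mle`, `metropolis`, and `impute` are hypothetical.

```python
import math
import random

def log_post(beta, x, y):
    """Poisson log-likelihood; equals the log-posterior under a flat prior."""
    return sum(yi * (beta[0] + beta[1] * xi) - math.exp(beta[0] + beta[1] * xi)
               for xi, yi in zip(x, y))

def fit_mle(x, y, iters=100):
    """Newton-Raphson MLE, used only as the starting value of the chain."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return [b0, b1]

def metropolis(x, y, start, step=0.1, burn_in=1000, n_iter=10000, seed=0):
    """Random walk Metropolis; returns the n_iter draws kept after burn-in."""
    rng = random.Random(seed)
    cur, cur_lp = list(start), log_post(start, x, y)
    draws = []
    for i in range(burn_in + n_iter):
        prop = [c + rng.gauss(0.0, step) for c in cur]
        prop_lp = log_post(prop, x, y)
        # Accept with probability min(1, posterior ratio); 1 - random() is in (0, 1].
        if math.log(1.0 - rng.random()) < prop_lp - cur_lp:
            cur, cur_lp = prop, prop_lp
        if i >= burn_in:
            draws.append(tuple(cur))
    return draws

def posterior_mean(draws):
    return [sum(d[j] for d in draws) / len(draws) for j in range(2)]

def impute(beta, x):
    """Predicted Poisson mean at x under coefficients beta."""
    return math.exp(beta[0] + beta[1] * x)

# Toy data, made up for illustration: A observes (X, Y), B observes (X, Z).
source_a = [{"x": 0, "y": 1}, {"x": 1, "y": 3}, {"x": 2, "y": 4}, {"x": 3, "y": 9}]
source_b = [{"x": 0, "z": 2}, {"x": 1, "z": 2}, {"x": 2, "z": 5}, {"x": 3, "z": 6}]
xa, ya = [r["x"] for r in source_a], [r["y"] for r in source_a]
xb, zb = [r["x"] for r in source_b], [r["z"] for r in source_b]

# Step 1: sample each model's posterior, starting the chain at the MLE.
draws_y = metropolis(xa, ya, fit_mle(xa, ya), seed=1)
draws_z = metropolis(xb, zb, fit_mle(xb, zb), seed=2)
beta_y, beta_z = posterior_mean(draws_y), posterior_mean(draws_z)

# Step 2: impute Z in A from B's model, and Y in B from A's model.
filled_a = [dict(r, z=impute(beta_z, r["x"])) for r in source_a]
filled_b = [dict(r, y=impute(beta_y, r["x"])) for r in source_b]

# Step 3: concatenate into the synthetic dataset.
synthetic = filled_a + filled_b
```

The step size governs the acceptance rate of the chain, so in practice it would be tuned to the data rather than fixed at 0.1.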

                      In summary, Poisson regression imputation predicts the missing values
                  in each data source from a fitted Poisson model, whereas the random hot
                  deck procedure requires no model specification: it imputes missing values
                  by matching a randomly chosen observation from the donor file (Data
                  Source B) to each observation in the recipient file (Data Source A).
                  MCMC imputation also models each data source, with the source-specific
                  variable as the response and the common variable as the predictor, but
                  it estimates the model by simulating from the posterior distribution of
                  the Poisson regression coefficients. The missing values of Y and Z are
                  then predicted from these models.


                                                                      13 | I S I   W S C   2 0 1 9