Contributed Paper Session (CPS) - Volume 7
CPS2020 Honeylet T. S.
2.1. Matching Methods
Poisson regression imputation
1) Fit separate Poisson regression models to Data Source A and Data
Source B, with the source-specific variable (Y in Data Source A, Z in
Data Source B) as the response and the common variable X as the
predictor.
2) Impute the missing Z values in Data Source A using the model fitted
to Data Source B, and the missing Y values in Data Source B using the
model fitted to Data Source A.
3) Concatenate Data Source A and Data Source B to form a synthetic
dataset with X, Y, and Z values.
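The three steps above can be sketched in pure Python as follows. This is an illustrative sketch, not the author's code: it assumes a single common predictor X with an intercept, fits each Poisson model by Newton-Raphson, and imputes each missing count with its fitted mean `exp(b0 + b1*x)`, because the text does not say whether imputed values are rounded or drawn at random. The names `fit_poisson`, `predict`, and `poisson_regression_imputation` are hypothetical.

```python
import math


def fit_poisson(x, y, iters=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    # Start from the intercept-only fit so the first step is well behaved.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve H * delta = g for delta.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def predict(coef, x):
    """Fitted Poisson means exp(b0 + b1*x) for each x."""
    b0, b1 = coef
    return [math.exp(b0 + b1 * xi) for xi in x]


def poisson_regression_imputation(source_a, source_b):
    """source_a: list of (x, y) records; source_b: list of (x, z) records."""
    xa, ya = zip(*source_a)
    xb, zb = zip(*source_b)
    # Step 1: separate Poisson models, Y ~ X on A and Z ~ X on B.
    coef_y = fit_poisson(xa, ya)
    coef_z = fit_poisson(xb, zb)
    # Step 2 (assumption: impute the fitted mean, not a random draw).
    z_hat_a = predict(coef_z, xa)
    y_hat_b = predict(coef_y, xb)
    # Step 3: concatenate into one synthetic (X, Y, Z) dataset.
    return list(zip(xa, ya, z_hat_a)) + list(zip(xb, y_hat_b, zb))
```

At the maximum likelihood estimate the fitted means in each source sum to the observed total, which is a quick sanity check on the fit.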
Random hot deck imputation
1) Randomly choose an observation from Data Source B (donor file) for
each observation in Data Source A (recipient file).
2) Impute the missing Z values in Data Source A with the values of the
matched observations from Data Source B. The completed Data Source A
serves as the synthetic dataset.
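A minimal sketch of the random hot deck steps follows. It assumes donors are drawn uniformly at random with replacement and without regard to X; the text does not specify either point. `random_hot_deck` is a hypothetical name, and the optional `seed` only makes the random draw reproducible.

```python
import random


def random_hot_deck(recipients, donors, seed=None):
    """recipients: (x, y) records from Data Source A (recipient file);
    donors: (x, z) records from Data Source B (donor file).
    Returns the completed Data Source A as (x, y, z) records."""
    rng = random.Random(seed)
    completed = []
    for x, y in recipients:
        # Step 1: draw one donor uniformly at random (with replacement).
        _, z_donor = rng.choice(donors)
        # Step 2: copy the donor's Z value into the recipient record.
        completed.append((x, y, z_donor))
    return completed
```

Because no model is fitted, every imputed Z is a value actually observed in Data Source B.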
Markov chain Monte Carlo imputation
1) Fit a Poisson regression model to Data Source A with Y as the
response, sampling from the posterior distribution of the coefficient
with a random walk Metropolis algorithm under an improper uniform prior.
The chain starts at the maximum likelihood estimate of the coefficient
and runs 1,000 burn-in iterations followed by 10,000 Metropolis
iterations. Fit a model to Data Source B with Z as the response in the
same way.
2) Impute missing Z values in Data Source A using the model fitted to
Data Source B in Step 1, and missing Y values in Data Source B using the
model fitted to Data Source A in Step 1.
3) Concatenate Data Source A and Data Source B to form a synthetic
dataset with X, Y, and Z values.
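The MCMC steps can be sketched as below, under stated assumptions. With an improper uniform prior the log-posterior equals the Poisson log-likelihood up to a constant, so the Metropolis acceptance ratio uses likelihoods alone. The text fixes the start value (the MLE) and the 1,000 burn-in and 10,000 Metropolis iterations, but not the proposal distribution or how the draws become imputed values. This sketch therefore assumes Gaussian random-walk proposals with a hypothetical step size of 0.05, a single predictor X plus an intercept, and imputation at the posterior-mean coefficients. All function names are hypothetical.

```python
import math
import random


def log_lik(b, x, y):
    """Poisson log-likelihood of (b0, b1), up to the constant -sum(log y!)."""
    return sum(yi * (b[0] + b[1] * xi) - math.exp(b[0] + b[1] * xi)
               for xi, yi in zip(x, y))


def mle(x, y, iters=50):
    """Newton-Raphson MLE of (b0, b1); used only as the chain's start."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        h00, h01 = sum(mu), sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        b0, b1 = (b0 + (h11 * g0 - h01 * g1) / det,
                  b1 + (h00 * g1 - h01 * g0) / det)
    return [b0, b1]


def metropolis(x, y, burn_in=1000, n_iter=10000, step=0.05, seed=0):
    """Random walk Metropolis under a flat prior; returns post-burn-in draws."""
    rng = random.Random(seed)
    b = mle(x, y)
    cur = log_lik(b, x, y)
    draws = []
    for t in range(burn_in + n_iter):
        # Assumption: independent Gaussian proposals for both coefficients.
        prop = [bj + rng.gauss(0.0, step) for bj in b]
        new = log_lik(prop, x, y)
        # Accept with probability min(1, exp(new - cur)).
        if rng.random() < math.exp(min(0.0, new - cur)):
            b, cur = prop, new
        if t >= burn_in:  # discard burn-in draws
            draws.append(list(b))
    return draws


def mcmc_imputation(source_a, source_b, **kw):
    """source_a: (x, y) records; source_b: (x, z) records."""
    xa, ya = zip(*source_a)
    xb, zb = zip(*source_b)
    # Step 1: posterior draws for Y ~ X (Source A) and Z ~ X (Source B).
    draws_y = metropolis(xa, ya, **kw)
    draws_z = metropolis(xb, zb, **kw)
    # Assumption: summarise each chain by its posterior-mean coefficients.
    by = [sum(d[j] for d in draws_y) / len(draws_y) for j in (0, 1)]
    bz = [sum(d[j] for d in draws_z) / len(draws_z) for j in (0, 1)]
    # Step 2: impute missing Z in A and missing Y in B.
    z_hat = [math.exp(bz[0] + bz[1] * x) for x in xa]
    y_hat = [math.exp(by[0] + by[1] * x) for x in xb]
    # Step 3: concatenate into the synthetic (X, Y, Z) dataset.
    return list(zip(xa, ya, z_hat)) + list(zip(xb, y_hat, zb))
```

Using the posterior mean is only one option; imputing from individual posterior draws would carry parameter uncertainty into the imputed values.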
In summary, Poisson regression imputation predicts the missing values in
the data sources from the corresponding fitted Poisson models, whereas
the random hot deck procedure requires no model specification: it imputes
missing values by matching a randomly chosen observation from the donor
file (Data Source B) to each observation in the recipient file (Data
Source A). MCMC imputation, like Poisson regression imputation, models
each data source with its specific variable as the response and the
common variable as the predictor, but it obtains these models by
simulating from the posterior distribution of the Poisson regression
models. The missing values of Y and Z are then predicted from these
models.
ISI WSC 2019