Contributed Paper Session (CPS) - Volume 6
CPS1969 Janna M. De Veyra
Testing for independence on statistically
matched categorical variables
Janna M. De Veyra
University of the Philippines Diliman
Abstract
In most instances, conducting a new survey is not feasible because of time
constraints and limited resources. Matching data sources has been used as a
way to obtain a data set in which all the intended variables are available.
This paper proposes the use of Markov chain Monte Carlo (MCMC) and the
inclusion of random error in matching categorical variables, as well as the
application of a bootstrap procedure in testing for their independence. A
simulation study indicates that the test is most effective when all the
proposed procedures are applied together: the combination produces a
correctly sized test and yields the highest power among the combinations of
procedures considered.
Keywords
random error; MCMC; bootstrap; size; power
1. Introduction
D'Orazio et al. (2006) define statistical matching as a statistical procedure
that aims to integrate two or more data sets in which the data sets share a
set of common variables, some variables are never jointly observed, and the
units observed in the data sets are different. The goal of this procedure is
to derive a synthetic data set and to estimate the joint distribution of the
variables that are not jointly observed in a single data set. The need for
this type of procedure increases when conducting a new survey is nearly
impossible within the available time frame and resources. This paper deals
with matching procedures for categorical data in order to test the
independence of two variables that are not jointly observed. Seltman (2015)
notes that, with a categorical outcome and a categorical explanatory
variable, the usual statistical test is whether or not the two variables are
independent. The matching procedures used are regression imputation,
stochastic imputation, and an application of MCMC to each of these two
imputations. Independence under the four imputation procedures is checked
using the chi-square statistic. The bootstrap method is also applied under
the four imputation procedures to determine whether it produces a more
reliable result in the test for independence.
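The two testing ideas named in this section, the Pearson chi-square statistic
and a bootstrap assessment of it, can be illustrated with a minimal Python
sketch. It is a generic illustration only and not the procedure evaluated in
this paper: it performs no matching or imputation, and the resampling scheme
(drawing tables from the product of the observed margins, that is, under the
null hypothesis of independence), the function names, and the parameters
n_boot and seed are all assumptions made for the example.

```python
import random
from collections import Counter


def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells with an empty margin
                stat += (observed - expected) ** 2 / expected
    return stat


def bootstrap_independence_test(table, n_boot=2000, seed=0):
    """Bootstrap p-value for independence (hypothetical helper).

    Assumption: replicate tables are drawn with row and column labels
    sampled independently from the observed margins, so the replicates
    follow the independence null by construction.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    observed_stat = chi_square_stat(table)

    exceed = 0
    for _ in range(n_boot):
        rows = rng.choices(range(n_rows), weights=row_totals, k=n)
        cols = rng.choices(range(n_cols), weights=col_totals, k=n)
        counts = Counter(zip(rows, cols))
        boot = [[counts[(i, j)] for j in range(n_cols)] for i in range(n_rows)]
        if chi_square_stat(boot) >= observed_stat:
            exceed += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (exceed + 1) / (n_boot + 1)
    return observed_stat, p_value


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the paper.
    stat, p = bootstrap_independence_test([[30, 10], [10, 30]])
    print(f"chi-square = {stat:.2f}, bootstrap p-value = {p:.4f}")
```

In a matching setting, the table passed to such a function would be the
cross-tabulation of the two variables in the synthetic data set. The
bootstrap p-value replaces the asymptotic chi-square reference distribution
with an empirical one, which is the general motivation for considering
resampling when the usual approximation may be unreliable.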
ISI WSC 2019