Page 369 - Contributed Paper Session (CPS) - Volume 6
CPS1969 Janna M. De Veyra
proposed procedures that is greater than the critical value at a 0.05 level
of significance divided by the total number of replicates.
2.4 Simulation Design
A simulation study was conducted to assess the efficiency of the
procedures described in the previous part of the paper in matching
categorical variables. The simulation generates a data source in which
all of the $x_1$, $x_2$, y, and z variables are available. The y and z
variables are generated from the logistic regression models
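\[
\begin{cases}
y:\ \operatorname{logit}[\Pi(x)] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \\
z:\ \operatorname{logit}[\Pi(x)] = \theta + \gamma_1 x_1 + \gamma_2 x_2 + \varepsilon
\end{cases}
\]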
where $x_1$ and $x_2$ are the covariates, which are either categorical or
continuous depending on the scenario. They are generated either by assigning
a known probability to each category or from a normal distribution with mean
$\mu$ and variance $\sigma^2$, and $\varepsilon$ is a random residual
generated from a normal distribution with mean 0 and variance $\sigma^2$.
However, in some scenarios where the covariates are continuous, $x_2$ was
considered as a function of $x_1$. When the variables of interest are binary,
the assigned value for y and z is based on this cut-off:
\[
y \text{ or } z =
\begin{cases}
1, & \Pi(x) \ge 0.5 \\
0, & \Pi(x) < 0.5
\end{cases}
\]
When the variables of interest have 3 categories, the assigned value for y and
z is based on this cut-off:
\[
y \text{ or } z =
\begin{cases}
1, & \Pi(x) \in [0.67, 1] \\
2, & \Pi(x) \in [0.34, 0.67] \\
3, & \text{otherwise}
\end{cases}
\]
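When the variables of interest have 5 categories, the assigned value for y and
z is based on this cut-off:
\[
y \text{ or } z =
\begin{cases}
1, & \Pi(x) \in [0.80, 1] \\
2, & \Pi(x) \in [0.60, 0.80] \\
3, & \Pi(x) \in [0.40, 0.60] \\
4, & \Pi(x) \in [0.20, 0.40] \\
5, & \text{otherwise}
\end{cases}
\]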
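The data-generating step and the three cut-off rules above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: the function names, the coefficient values, the standard-normal covariates, and the separate residual draws for y and z are all assumptions. A probability lying exactly on a shared boundary (for example 0.67) is assigned here to the category listed first.

```python
import math
import random


def inv_logit(eta):
    """Numerically stable inverse of the logit function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def categorize(p, k):
    """Map a probability p to a category using the cut-offs for k = 2, 3 or 5."""
    if k == 2:
        return 1 if p >= 0.5 else 0
    if k == 3:
        lower_bounds = (0.67, 0.34)
    elif k == 5:
        lower_bounds = (0.80, 0.60, 0.40, 0.20)
    else:
        raise ValueError("k must be 2, 3 or 5")
    for category, bound in enumerate(lower_bounds, start=1):
        if p >= bound:
            return category
    return k  # the "otherwise" category


def simulate(n, coef_y, coef_z, k, sigma=1.0, seed=0):
    """Generate n records (x1, x2, y, z).

    coef_y and coef_z are hypothetical (intercept, slope on x1, slope on x2)
    triples; the covariates are assumed to be standard normal.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        outcome = []
        for b0, b1, b2 in (coef_y, coef_z):
            # Residual drawn separately for each model (an assumption).
            eps = rng.gauss(0.0, sigma)
            p = inv_logit(b0 + b1 * x1 + b2 * x2 + eps)
            outcome.append(categorize(p, k))
        records.append((x1, x2, outcome[0], outcome[1]))
    return records


# Illustrative coefficient values only; they are not taken from the paper.
data = simulate(n=1000, coef_y=(0.0, 1.0, 0.5), coef_z=(0.0, 0.5, 1.0), k=3)
```

Fixing the seed makes a scenario reproducible, so the same benchmark data can be reused when the matching procedures are compared.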
The simulation consists of cases in which y and z are independent and cases in
which y and z are dependent, according to the chi-square statistic. The case
is determined by the values set for the two intercepts, the coefficients of
$x_1$ and $x_2$ in each model, and the variance $\sigma^2$. The simulated data
serve as a benchmark for assessing the performance of the procedures in the
test for independence.
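One way to check whether a simulated benchmark falls in the independent or the dependent case is Pearson's chi-square statistic on the y-by-z contingency table. The sketch below shows how such a check could be coded with only the Python standard library; it is an assumption, not the author's implementation, and the function names are hypothetical. The 0.05-level critical values are the standard tabulated ones for 1, 4, and 16 degrees of freedom, which correspond to 2, 3, and 5 categories per variable.

```python
from collections import Counter

# Upper 5% points of the chi-square distribution, keyed by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 4: 9.488, 16: 26.296}


def chi_square_statistic(y, z):
    """Return (statistic, degrees of freedom) for the y-by-z table."""
    n = len(y)
    joint = Counter(zip(y, z))   # observed cell counts
    row_totals = Counter(y)
    col_totals = Counter(z)
    stat = 0.0
    for r, row_n in row_totals.items():
        for c, col_n in col_totals.items():
            expected = row_n * col_n / n
            stat += (joint[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def is_dependent(y, z):
    """True when the statistic exceeds the 0.05-level critical value."""
    stat, df = chi_square_statistic(y, z)
    return stat > CHI2_CRITICAL_05[df]


# Toy data: z copies y in the first call and is unrelated to y in the second.
print(is_dependent([0, 0, 1, 1] * 5, [0, 0, 1, 1] * 5))  # True
print(is_dependent([0, 0, 1, 1] * 5, [0, 1, 0, 1] * 5))  # False
```

The lookup table covers only the three table sizes used in this design; a table in which some category never occurs has a different number of degrees of freedom and raises a `KeyError` here.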
The sample size used in this study is 1000. Source A and source B are
generated by splitting the simulated data into two parts and deleting variable
z from one part and variable y from the other. Source A is the data set that
contains the $x_1$, $x_2$, and y variables, while source B is the data set
that contains the $x_1$, $x_2$, and z
358 | I S I W S C 2 0 1 9