Page 369 - Contributed Paper Session (CPS) - Volume 6

CPS1969 Janna M. De Veyra
                 proposed procedures that is greater than the critical value at a 0.05 level
                 of significance divided by the total number of replicates.

            2.4 Simulation Design
                A simulation study was conducted to assess the efficiency of the
            procedures described in the preceding sections in matching categorical
            variables. The simulation generates a data source in which all of the
            variables $x_1$, $x_2$, y, and z are available. The y and z variables are
            generated from the logistic regression models
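            \[
            \begin{cases}
            y:\ \operatorname{logit}[\Pi(x)] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \\
            z:\ \operatorname{logit}[\Pi(x)] = \gamma + \delta_1 x_1 + \delta_2 x_2 + \varepsilon
            \end{cases}
            \]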
            where $x_1$ and $x_2$ are the covariates, which are either categorical or
            continuous depending on the scenario. Categorical covariates are generated by
            assigning a known probability to each category, and continuous covariates are
            drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$;
            $\varepsilon$ is a random residual generated from a normal distribution with
            mean 0 and variance $\sigma_\varepsilon^2$. In some scenarios where the
            covariates are continuous, one covariate was also considered as a function of
            the other. When the variables of interest are binary, the value assigned to y
            and z is based on the cut-off
            \[
            y \text{ or } z =
            \begin{cases}
            1, & \Pi(x) \ge 0.5 \\
            0, & \Pi(x) < 0.5.
            \end{cases}
            \]
            When the variables of interest have 3 categories, the value assigned to y and
            z is based on the cut-off
            \[
            y \text{ or } z =
            \begin{cases}
            1, & \Pi(x) \in [0.67, 1] \\
            2, & \Pi(x) \in [0.34, 0.67) \\
            3, & \text{otherwise.}
            \end{cases}
            \]
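            When the variables of interest have 5 categories, the value assigned to y and
            z is based on the cut-off
            \[
            y \text{ or } z =
            \begin{cases}
            1, & \Pi(x) \in [0.80, 1] \\
            2, & \Pi(x) \in [0.60, 0.80) \\
            3, & \Pi(x) \in [0.40, 0.60) \\
            4, & \Pi(x) \in [0.20, 0.40) \\
            5, & \text{otherwise.}
            \end{cases}
            \]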
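            The generation and cut-off steps described above can be sketched in Python as
            follows. This is only an illustrative sketch under stated assumptions, not the
            code used in the study: the function names (inv_logit, categorize, simulate),
            the coefficient values, and the parameter names are hypothetical. It further
            assumes continuous covariates that share the same mean and standard deviation,
            and separate residual draws for the y and z equations.

```python
import math
import random


def inv_logit(eta):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def categorize(p, k):
    """Assign a category from probability p using the cut-offs in the text.

    k = 2 returns 1 or 0 (threshold 0.5); k = 3 and k = 5 return 1..k,
    with category 1 for the highest probabilities.
    """
    if k == 2:
        return 1 if p >= 0.5 else 0
    lower_bounds = {3: [0.67, 0.34], 5: [0.80, 0.60, 0.40, 0.20]}[k]
    for category, lower in enumerate(lower_bounds, start=1):
        if p >= lower:
            return category
    return k  # the "otherwise" case


def simulate(n=1000, k=3, coef_y=(0.0, 1.0, 1.0), coef_z=(0.0, 1.0, -1.0),
             mu=0.0, sd_x=1.0, sd_eps=1.0, seed=0):
    """Generate n rows (x1, x2, y, z); all default values are arbitrary."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Assumption: both covariates are continuous and normally distributed.
        x1 = rng.gauss(mu, sd_x)
        x2 = rng.gauss(mu, sd_x)
        # Assumption: independent residual draws for the two equations.
        eta_y = coef_y[0] + coef_y[1] * x1 + coef_y[2] * x2 + rng.gauss(0.0, sd_eps)
        eta_z = coef_z[0] + coef_z[1] * x1 + coef_z[2] * x2 + rng.gauss(0.0, sd_eps)
        y = categorize(inv_logit(eta_y), k)
        z = categorize(inv_logit(eta_z), k)
        rows.append((x1, x2, y, z))
    return rows


if __name__ == "__main__":
    data = simulate(n=1000, k=3)
    print(len(data), data[0])
```

            Changing the coefficient tuples changes how strongly y and z depend on the
            shared covariates, which is one way to move between the scenarios considered.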
            The simulation covers cases in which y and z are independent and cases in
            which they are dependent, as judged by the chi-square statistic. Which case
            arises depends on the values set for the intercepts and coefficients of the
            two models and for the remaining simulation parameters. The simulated data serve as
            a benchmark in assessing the performance of the procedures in the test for
                The sample size used in this study is 1000. Source A and source B are
            generated by splitting the simulated data into two parts, deleting variable z
            from one part and variable y from the other. Source A is the data set that
            contains the $x_1$, $x_2$, and y variables, while source B is the data set that
            contains the $x_1$, $x_2$, and z

                                                               358 | I S I   W S C   2 0 1 9