Page 246 - Special Topic Session (STS) - Volume 1

STS426 Asis K.C.
                  significant in the data, application of imputation techniques may result in loss
                  of accuracy, and the severity of the error increases with the proportion
                  of missing observations. To tackle this situation, they proposed an alternative
                  classification rule that includes the proportion of missing observations in its
                  construction.
                      For the redefined loss due to misclassification, the classification rule with
                  prior probabilities $\pi_1$ and $\pi_2$ would be to classify an observation $(x, y)$ to
                  group 1 if

                  $$\log\frac{f_1(x,y)}{f_2(x,y)} > \log\frac{\pi_2\,C(1|2)}{\pi_1\,C(2|1)} = \log\frac{\pi_2\,c_1(1-p_1)}{\pi_1\,c_2(1-p_2)} \qquad \text{(Rule 2)}$$

                  else classify to group 2.
                  For Rule 2 the total cost of misclassification is less than that for Rule 1.
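As an illustration, the decision in Rule 2 can be sketched in a few lines of Python. The function names `rule2_threshold` and `classify`, the symbols `c1, c2` and `p1, p2` for the two pairs of constants on the right-hand side, and the use of the cluster sizes 55 and 72 as priors are assumptions made for this sketch, not taken from the source. The group densities are left abstract and enter only through their log values.

```python
import math


def rule2_threshold(prior1, prior2, c1, c2, p1, p2):
    """Right-hand side of Rule 2: log[prior2*c1*(1-p1) / (prior1*c2*(1-p2))]."""
    return math.log((prior2 * c1 * (1.0 - p1)) / (prior1 * c2 * (1.0 - p2)))


def classify(log_f1, log_f2, threshold):
    """Assign group 1 if the log density ratio exceeds the threshold, else group 2."""
    return 1 if (log_f1 - log_f2) > threshold else 2


# Hypothetical inputs: priors taken from cluster sizes 55 and 72 (an assumption),
# constants chosen to mirror the first setting reported in Table 2.
t = rule2_threshold(prior1=55 / 127, prior2=72 / 127,
                    c1=0.9, c2=0.3, p1=0.06, p2=0.15)

# Hypothetical log-density values for two observations.
print(classify(log_f1=-2.0, log_f2=-4.0, threshold=t))
print(classify(log_f1=-3.0, log_f2=-2.5, threshold=t))
```

With equal priors, equal constants and no missing observations the threshold reduces to zero, so the rule falls back to comparing the two densities directly.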
                      To compare the performance of the rules on the above-mentioned
                  astronomical data, the following exercise was carried out.
                      As before, for the NGC 5128 data set, they first performed k-means
                  clustering with k=2 and formed two clusters (55 observations in cluster 1 and
                  72 observations in cluster 2). The only difference here was that at each step they
                  created a few missing observations in the data.
                  Then they performed the classification in two different ways:
                  1.  Rule 1a: The missing observations were discarded and Rule 1 was used,
                      treating the resulting data set as a whole.
                  2.  Rule 2a: Rule 2 was used, taking the missing proportions into account.
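The data preparation behind these two variants can be sketched as follows. This is a minimal illustration on synthetic two-dimensional data, not the NGC 5128 data: the helper names `kmeans2` and `mark_missing`, the simulated points, and the numbers of observations flagged as missing (3 and 11) are all hypothetical. It shows only how the complete-case data set (input to Rule 1a) and the per-cluster missing proportions (input to Rule 2a) would be obtained.

```python
import random


def kmeans2(points, iters=50, seed=0):
    """Plain Lloyd's algorithm with k=2 on a list of (x, y) tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min((0, 1), key=lambda k: (p[0] - centers[k][0]) ** 2
                + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        # Update step: cluster means (keep the old center if a cluster is empty).
        new_centers = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def mark_missing(labels, n_missing, seed=1):
    """Flag up to n_missing[k] randomly chosen observations of cluster k as missing."""
    rng = random.Random(seed)
    missing = [False] * len(labels)
    for k, n in n_missing.items():
        idx = [i for i, lab in enumerate(labels) if lab == k]
        for i in rng.sample(idx, min(n, len(idx))):
            missing[i] = True
    return missing


# Synthetic stand-in data: two well-separated groups of 55 and 72 points.
rng = random.Random(42)
points = ([(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(55)]
          + [(rng.gauss(6.0, 1.0), rng.gauss(6.0, 1.0)) for _ in range(72)])

labels, centers = kmeans2(points)
missing = mark_missing(labels, {0: 3, 1: 11})  # hypothetical counts

# Rule 1a input: discard the missing observations, keep the rest as one data set.
complete = [(p, lab) for p, lab, m in zip(points, labels, missing) if not m]

# Rule 2a input: proportion of missing observations within each cluster.
sizes = {k: labels.count(k) for k in (0, 1)}
p_missing = {
    k: sum(1 for lab, m in zip(labels, missing) if lab == k and m) / max(sizes[k], 1)
    for k in (0, 1)
}
print(len(complete), sizes, p_missing)
```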
                     As earlier, they considered the three candidate distributions and computed the
                  corresponding TCM for different choices of $(c_1, c_2)$ and $(p_1, p_2)$. Rule 1a
                  and Rule 2a are essentially the modified versions of Rule 1, adjusted for the
                  presence of missing observations. The results are shown in Table 2
                  below.

                                   Table 2: TCM for NGC 5128 data with missing values
                   Rules     Gamma          TCM ($c_1 = 0.9$, $c_2 = 0.3$)    TCM ($c_1 = 0.2$, $c_2 = 0.8$)
                             distribution   ($p_1 = 0.06$, $p_2 = 0.15$)      ($p_1 = 0.06$, $p_2 = 0.15$)
                   Rule 1a   First form     0.170                             0.045
                             Second form    0.050                             0.019
                             Third form     0.029                             0.004
                   Rule 2a   First form     0.097                             0.104
                             Second form    0.039                             0.022
                             Third form     0.013                             0.009
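The entries of Table 2 can be compared mechanically. The short sketch below copies the values from the table and, for each column, lists which rule has the lower TCM for each form and how the forms rank within each rule. The dictionary layout and the helper names `lower_rule` and `rank_forms` are assumptions made for this illustration; the column labels refer to the first pair of constants in each column heading.

```python
# Table 2 entries: {column: {form: (TCM under Rule 1a, TCM under Rule 2a)}}
table2 = {
    "(0.9, 0.3)": {"First form": (0.170, 0.097),
                   "Second form": (0.050, 0.039),
                   "Third form": (0.029, 0.013)},
    "(0.2, 0.8)": {"First form": (0.045, 0.104),
                   "Second form": (0.019, 0.022),
                   "Third form": (0.004, 0.009)},
}


def lower_rule(tcm_1a, tcm_2a):
    """Name the rule with the smaller total cost of misclassification."""
    if tcm_2a < tcm_1a:
        return "Rule 2a"
    if tcm_1a < tcm_2a:
        return "Rule 1a"
    return "tie"


def rank_forms(forms, rule_index):
    """Order the forms from smallest to largest TCM for one rule (0 = 1a, 1 = 2a)."""
    return sorted(forms, key=lambda form: forms[form][rule_index])


summary = {col: {form: lower_rule(*pair) for form, pair in forms.items()}
           for col, forms in table2.items()}
ranking = {col: {"Rule 1a": rank_forms(forms, 0), "Rule 2a": rank_forms(forms, 1)}
           for col, forms in table2.items()}

for col in table2:
    print(col, summary[col], ranking[col])
```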

                     The important findings from the above tables are that the total cost of
                  misclassification for the second and third forms is less than the total cost of


                                                                     235 | I S I   W S C   2 0 1 9