Page 39 - Special Topic Session (STS) - Volume 1

STS346 Abu Sayed M. et al.
            $$\tilde{w}_{ii} = \frac{1}{n} + \frac{|\hat{X}_i - \mathrm{Med}(\hat{X})|}{n\,\mathrm{MAD}(\hat{X})}$$
            It is easy to show that mean($\hat{w}_{ii}$) = median($\tilde{w}_{ii}$) = 2/n.
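
            As a minimal sketch of the two leverage measures, the Python functions below compute a classical leverage and the median-based leverage for a vector of fitted X values. Two assumptions are made that the text above does not confirm: the classical $\hat{w}_{ii}$ is taken to be the usual simple-regression hat value $1/n + (\hat{X}_i - \bar{X})^2 / \sum_j (\hat{X}_j - \bar{X})^2$, and MAD is the raw median absolute deviation with no consistency factor (the choice under which the median of $\tilde{w}_{ii}$ equals 2/n). The function names and the toy data are hypothetical.

```python
import numpy as np

def classical_leverage(x):
    """Assumed classical leverage: 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    return 1.0 / n + d**2 / np.sum(d**2)

def median_leverage(x):
    """Median-based leverage: 1/n + |x_i - Med(x)| / (n * MAD(x)).

    MAD is assumed to be the raw median absolute deviation (unscaled).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    med = np.median(x)
    abs_dev = np.abs(x - med)
    mad = np.median(abs_dev)
    if mad == 0:
        raise ValueError("MAD is zero; the median-based leverage is undefined")
    return 1.0 / n + abs_dev / (n * mad)

# Toy data (illustrative only): nine regular points and one far-out point.
x_hat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
w_hat = classical_leverage(x_hat)
w_tilde = median_leverage(x_hat)

# Both centring properties stated in the text hold for this sketch.
print(np.isclose(w_hat.mean(), 2 / len(x_hat)))
print(np.isclose(np.median(w_tilde), 2 / len(x_hat)))
```

            Because the mean of $\hat{w}_{ii}$ is pulled upward by the extreme point while the median of $\tilde{w}_{ii}$ is not, the same 2/n centring is reached through a location measure that resists high leverage values.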
               We consider several rules for identifying high leverage points: the
            twice-the-mean (2M) rule, the thrice-the-mean (3M) rule, and a new cut-off
            point that we introduce here. Since the theoretical distribution of
            $\tilde{w}_{ii}$ may not be easy to find, and excessively high leverage values
            can distort measures such as the mean and standard deviation, we define a
            confidence-bound type cut-off point
            $$\tilde{w}_{ii} > \mathrm{Median}(\tilde{w}_{ii}) + 3\,\mathrm{MAD}(\tilde{w}_{ii})$$
            which is analogous to forms used by Hadi (1992), Imon (2002,2005) and others.
            In this paper, we consider five identification rules which are listed below:
            Rule 1 (Classical 2M):  $\hat{w}_{ii} > 4/n$
            Rule 2 (Classical 3M):  $\hat{w}_{ii} > 6/n$
            Rule 3 (New 2M based on Median):  $\tilde{w}_{ii} > 4/n$
            Rule 4 (New 3M based on Median):  $\tilde{w}_{ii} > 6/n$
            Rule 5 (New Median based Cut-off point):  $\tilde{w}_{ii} > \mathrm{Median}(\tilde{w}_{ii}) + 3\,\mathrm{MAD}(\tilde{w}_{ii})$
            We  compare  the  performances  of  the  above  rules  in  terms  of  correct
            identification of high leverage points and swamping rate of good leverages.
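
            The five rules can be sketched as a single function that takes precomputed leverage vectors and returns each cut-off together with a boolean flag per observation. This is an illustration under stated assumptions, not the authors' implementation: the function name `flag_high_leverage`, the `mad_scale` parameter, and the synthetic leverage values are hypothetical, and MAD in Rule 5 is assumed to be the raw median absolute deviation unless `mad_scale` is changed.

```python
import numpy as np

def flag_high_leverage(w_hat, w_tilde, mad_scale=1.0):
    """Apply Rules 1-5 to classical (w_hat) and median-based (w_tilde) leverages.

    mad_scale is a hypothetical knob: 1.0 uses the raw MAD (an assumption).
    Returns (cutoffs, flags), two dicts keyed by rule name.
    """
    w_hat = np.asarray(w_hat, dtype=float)
    w_tilde = np.asarray(w_tilde, dtype=float)
    n = w_hat.size

    med = np.median(w_tilde)
    mad = mad_scale * np.median(np.abs(w_tilde - med))

    cutoffs = {
        "rule1": 4.0 / n,          # classical 2M
        "rule2": 6.0 / n,          # classical 3M
        "rule3": 4.0 / n,          # new 2M based on median
        "rule4": 6.0 / n,          # new 3M based on median
        "rule5": med + 3.0 * mad,  # median + 3 MAD cut-off
    }
    flags = {
        "rule1": w_hat > cutoffs["rule1"],
        "rule2": w_hat > cutoffs["rule2"],
        "rule3": w_tilde > cutoffs["rule3"],
        "rule4": w_tilde > cutoffs["rule4"],
        "rule5": w_tilde > cutoffs["rule5"],
    }
    return cutoffs, flags

# Synthetic leverages for n = 50 (illustrative values, not the slag data).
w_hat_demo = np.full(50, 0.04)
w_tilde_demo = np.linspace(0.02, 0.06, 50)
w_tilde_demo[-1] = 0.5  # one artificially large leverage value

cutoffs, flags = flag_high_leverage(w_hat_demo, w_tilde_demo)
for rule in sorted(cutoffs):
    print(rule, round(cutoffs[rule], 4), int(flags[rule].sum()))
```

            With n = 50 the fixed cut-offs are 4/50 = 0.08 and 6/50 = 0.12. The Rule 5 cut-off depends on the data, so the value produced for these synthetic leverages is specific to this example.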

            3.  Results
               We consider a real-world data set to investigate the performance of our
            proposed method. To make the relationship conform to model (2), we assume
            that measurement error can occur in either variable. The data are taken from
            Hand et al. (1994) and consist of 50 measurements of the iron content of
            crushed blast furnace slag obtained by two different techniques, a chemical
            test (Y) and a magnetic test (X). The X values are estimated by the maximum
            likelihood formula (17). We then compute the leverage values for this data
            set. The cut-off point is 0.08 for Rules 1 and 3, 0.12 for Rules 2 and 4, and
            0.0975 for Rule 5. The traditional leverage values $\hat{w}_{ii}$ do not
            identify any high leverage points, but the 2M rule swamps 6 good cases. The
            newly proposed leverage measures $\tilde{w}_{ii}$ likewise do not identify any
            high leverage points, but the 2M rule swamps 1 good case. The 3M rule does
            not flag any high leverage point for either of the two leverage measures. We
            observe exactly the same performance from the rule based on the new cut-off
            point. We then modified the original iron in slag data by inserting a few
            high leverage points. We consider three different


                                                                28 | ISI WSC 2019