Page 34 - Special Topic Session (STS) - Volume 1

STS346 Abu Sayed M. et al.



                                Identification of high leverage points in linear
                                  functional relationship model for Big Data
                             Abu Sayed Md. Al Mamun¹, A.H.M. Rahmatullah Imon²,
                                  Abdul Ghapor Hussin³, Yong Zulina Zubairi⁴
                                          ¹ University of Rajshahi, Bangladesh
                                          ² Ball State University, Muncie, USA
                                       ³ National Defence University of Malaysia
                                           ⁴ University of Malaya, Malaysia

                  Abstract
                  The linear functional relationship model has wide application in statistics
                  because explanatory variables measured with error are prevalent in real-life
                  problems. There is therefore greater scope for unusual errors (outliers) to
                  generate unusual observations in the X-space, called high leverage points.
                  High leverage points often exert too much influence and consequently
                  become responsible for misleading conclusions about the fit of a regression
                  model, multicollinearity problems, and masking and/or swamping of
                  outliers, among other problems. Although a good deal of literature is
                  available on the identification of high leverage points in the linear
                  regression model, this is still an unsolved issue in the linear functional
                  relationship model. In this paper, we suggest a procedure for the
                  identification of high leverage points based on group deletion. The
                  usefulness of the proposed method for the detection of multiple high
                  leverage points is studied using some well-known data sets and Monte Carlo
                  simulations. Since our statistic is based on the median and the median
                  absolute deviation instead of the mean and the standard deviation,
                  respectively, it is computationally less intensive and more suitable for
                  big data.

                  Keywords
                  Leverages; Masking; Swamping
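
                  To illustrate the abstract's remark about using the median and the
                  median absolute deviation (MAD) in place of the mean and the standard
                  deviation, the following is a minimal sketch only. It is not the
                  statistic proposed in this paper: it uses ordinary hat-matrix leverages
                  from a simple linear regression rather than the LFRM, it involves no
                  group deletion, and the function names, the constant c = 3, the MAD
                  scaling factor 0.6745, and the simulated data are all assumptions made
                  for illustration.

```python
import numpy as np


def hat_values(x):
    """Leverages h_ii of a simple linear regression with intercept."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return 1.0 / x.size + centered**2 / np.sum(centered**2)


def robust_cutoff(scores, c=3.0):
    """Cutoff median + c * MAD; c = 3 and the 0.6745 scaling are assumed."""
    scores = np.asarray(scores, dtype=float)
    med = np.median(scores)
    mad = np.median(np.abs(scores - med)) / 0.6745
    return med + c * mad


def classical_cutoff(scores, c=3.0):
    """Cutoff mean + c * SD, shown only for comparison."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean() + c * scores.std(ddof=1)


# Hypothetical data: 27 regular x-values and 3 planted extreme x-values.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(10.0, 1.0, 27), [25.0, 26.0, 27.0]])

h = hat_values(x)
robust_flags = np.flatnonzero(h > robust_cutoff(h))
classical_flags = np.flatnonzero(h > classical_cutoff(h))

print("flagged by median/MAD cutoff:", robust_flags)
print("flagged by mean/SD cutoff:   ", classical_flags)
```

                  The planted points themselves inflate the mean and the standard
                  deviation of the leverages, which raises the classical cutoff, whereas
                  the median and MAD are largely unaffected by a small group of extreme
                  values.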

                  1.  Introduction
                      The linear functional relationship model (LFRM) is an extension of the
                  linear regression model (LRM) which allows for sampling variability in the
                  measurements of both the response and explanatory variables. In regression,
                  a model can be poorly fitted because of the presence of outliers. It has
                  been common practice over the years to use residuals for the identification
                  of outliers. Residuals are in fact estimates of the true errors that occur
                  in the Y-space. We anticipate at this point that fitting the LFRM could be
                  even more complicated because here outliers could occur in the X-space more
                  frequently than in the linear regression model. Outliers in the X-space are
                  called high leverage points in the regression literature since they exert
                  too much weight


                                                                      23 | ISI WSC 2019