Page 34 - Special Topic Session (STS) - Volume 1

STS346 Abu Sayed M. et al.

Identification of high leverage points in linear
functional relationship model for Big Data
Abu Sayed Md. Al Mamun¹, A.H.M. Rahmatullah Imon²,
Abdul Ghapor Hussin³, Yong Zulina Zubairi⁴
1 University of Rajshahi, Bangladesh
2 Ball State University, Muncie, USA
3 National Defence University of Malaysia
4 University of Malaya, Malaysia

Abstract
The linear functional relationship model has wide application in statistics because
explanatory variables measured with error are prevalent in real-life
problems. There is therefore greater scope for unusual errors (outliers) to
generate unusual observations in the X-space, called high leverage points.
High leverage points often exert too much influence and consequently
lead to misleading conclusions about the fit of a regression
model, cause multicollinearity problems, and mask and/or swamp
outliers. Although a large body of literature is available on the
identification of high leverage points in the linear regression model, this is still
an unsolved issue in the linear functional relationship model. In this paper, we
suggest a procedure for the identification of high leverage points based on
group deletion. The usefulness of the proposed method for the detection of
multiple high leverage points is studied using some well-known data sets and
Monte Carlo simulations. Since our statistic is based on the median and the median
absolute deviation instead of the mean and standard deviation, respectively, it is
computationally less intensive and more suitable for big data.
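
As a rough illustration of why a median and median absolute deviation (MAD) cutoff can resist masking, the following sketch compares it with a mean and standard deviation cutoff on a small set of made-up scores. It is not the statistic proposed in this paper, which is not defined in this section. The function names, the multiplier c = 3, the normal-consistency factor 1.4826, and the data values are all assumptions chosen for illustration.

```python
from statistics import mean, median, stdev


def scaled_mad(values):
    """Median absolute deviation, scaled by 1.4826 (assumed normal-consistency factor)."""
    m = median(values)
    return 1.4826 * median(abs(v - m) for v in values)


def flag_robust(values, c=3.0):
    """Indices of values above median + c * MAD (c = 3 is an assumed multiplier)."""
    cutoff = median(values) + c * scaled_mad(values)
    return [i for i, v in enumerate(values) if v > cutoff]


def flag_classical(values, c=3.0):
    """Indices of values above mean + c * SD, shown only for comparison."""
    cutoff = mean(values) + c * stdev(values)
    return [i for i, v in enumerate(values) if v > cutoff]


# Hypothetical leverage-like scores: six ordinary values and two extreme ones.
scores = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 9.5, 10.2]

robust_flags = flag_robust(scores)        # the two extreme scores are flagged
classical_flags = flag_classical(scores)  # nothing is flagged: the extremes inflate the SD
print(robust_flags, classical_flags)
```

In this toy example the two extreme scores inflate the mean and standard deviation enough to hide themselves, whereas the median and MAD are barely affected by them.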

Keywords
Leverages; Masking; Swamping

1. Introduction
The linear functional relationship model (LFRM) is an extension of the linear
regression model (LRM) that allows for sampling variability in the
measurements of both the response and the explanatory variables. In regression,
a model may be poorly fitted because of the presence of outliers. It has long been
common practice to use residuals for the identification of outliers.
Residuals are in fact estimates of the true errors that occur in the Y-space. We
anticipate that fitting the LFRM could be even more
complicated because outliers could occur in the X-space more frequently
than in the linear regression model. Outliers in the X-space are called high
leverage points in the regression literature since they exert too much weight

23 | ISI WSC 2019
