Page 269 - Contributed Paper Session (CPS) - Volume 4
CPS2222 Abdullah M.R. et al.
Nu-Support Vector Regression for the
Identification of Outliers in High-Dimensional
Data
Abdullah Mohammed Rashid 1, Habshah Midi 1, Waleed Dhhan 2&3,
Jayanthi Arasan 1
1 Institute for Mathematical Research, University Putra Malaysia
2 Babylon Municipalities, Ministry of Construction, Housing, Municipalities and Public Works,
Babylon, Iraq.
3 Scientific Research Centre, Nawroz University (NZU), Duhok, Iraq.
Abstract
High-dimensional data (HDD) refer to the situation where the number of
unknown parameters to be estimated is one or several orders of
magnitude larger than the number of samples in the data. As
high-dimensional data occur as a rule rather than an exception in areas such as
information technology, bioinformatics or astronomy, it is imperative to use
efficient techniques for modelling and analyzing such data to avoid misleading
conclusions. Analyzing such data poses many challenges, and outliers turn
out to be the major one. It is important to
detect outliers because they have an adverse effect on the values of the
various estimates, which leads to misleading conclusions. Several parametric
and non-parametric methods have been developed to detect outliers, but
these methods cannot deal with high-dimensional data. The fixed
parameters support vector regression (FP-SVR) was put forward to remedy this
problem. Nonetheless, the FP-SVR, which employs Eps-SVR, is not very
successful in identifying mild outliers or in other contamination
scenarios. To address this shortcoming, we propose to use Nu-SVR to detect
extreme and mild outliers. The merit of our proposed method is confirmed by
well-known examples and a simulation study.
Keywords
Outliers; Robustness; Statistical Learning Theory; Support Vector Regression.
1. Introduction
Outliers occur very frequently in real data sets, and
they often go unnoticed. There are many types of outliers in regression
problems. Any observation that has a large residual is referred to as a residual
outlier. Observations that are extreme in the y-direction are known as
y-outliers or vertical outliers, and they are responsible for model failure. High
leverage points are those observations that are outlying in the X-direction, and they
also affect the regression model. Habshah et al. (2009) highlighted that
it is crucial to detect multiple high leverage points as they are responsible for
258 | ISI WSC 2019