Page 269 - Contributed Paper Session (CPS) - Volume 4

CPS2222 Abdullah M.R. et al.

                              Nu-Support Vector Regression for the
                          identification of outliers in High Dimensional
                                               Data
          Abdullah Mohammed Rashid¹, Habshah Midi¹, Waleed Dhhan²&³, Jayanthi Arasan¹
                         1 Institute for Mathematical Research, University Putra Malaysia
             2 Babylon Municipalities, Ministry of Construction, Housing, Municipalities and Public Works,
                                            Babylon, Iraq.
                        3 Scientific Research Centre, Nawroz University (NZU), Duhok, Iraq.

            Abstract
            High-dimensional data (HDD) refer to the situation where the number of
            unknown parameters to be estimated is one or several orders of magnitude
            larger than the number of samples in the data. As high-dimensional data
            occur as a rule rather than an exception in areas like information
            technology, bioinformatics or astronomy, it is imperative to use efficient
            techniques for modelling and analyzing such data to avoid misleading
            conclusions. Analyzing such data encounters many challenges, and outliers
            turn out to be the major challenge when dealing with such data. It is
            important to detect outliers because they have an adverse effect on the
            values of the various estimates, which leads to misleading conclusions.
            Several parametric and non-parametric methods have been developed to detect
            outliers, but these methods cannot deal with HDD. The fixed parameters
            support vector regression (FP-SVR) was put forward to remedy this problem.
            Nonetheless, the FP-SVR, which employs Eps-SVR, is not very successful in
            the identification of mild outliers and other contamination scenarios. To
            remedy this problem, we propose to use Nu-SVR to detect extreme and mild
            outliers. The merit of our proposed method is confirmed by well-known
            examples and a simulation study.

            Keywords
            Outliers; Robustness; Statistical Learning Theory; Support Vector Regression.

            1.  Introduction
               Outliers occur very frequently in real data sets, and they often go
            unnoticed. There are several types of outliers in regression problems. Any
            observation that has a large residual is referred to as a residual
            outlier. Observations that are extreme in the y-direction are known as
            y-outliers or vertical outliers, and they are responsible for model
            failure. High leverage points are observations that are extreme in the
            X-direction, and they also affect the regression model. Habshah et al.
            (2009) highlighted that it is crucial to detect multiple high leverage
            points as they are responsible for

                                                               258 | ISI WSC 2019