Page 120 - Contributed Paper Session (CPS) - Volume 5

CPS1141 Mahdi Roozbeh



A New Robust Ridge Estimator in High-dimensional Linear Models
                                                Mahdi Roozbeh
                         Faculty of Mathematics, Statistics and Computer Science, Semnan University

                  Abstract
In classical regression analysis, ordinary least-squares estimation is the best estimation method provided that the essential assumptions, such as normality and independence of the error terms as well as little or no multicollinearity among the covariates, are met. However, if any of these assumptions is violated, the results can be misleading; in particular, outliers violate the assumption of normally distributed residuals in least-squares regression. Moreover, many biological, medical, social, and economic studies nowadays carry structures in which the number of covariates may exceed the sample size (high-dimensional, or wide, data); in this situation, the least-squares estimator is not applicable. Robust ridge regression is a modern technique for analyzing data that are contaminated with outliers in the high-dimensional case. When multicollinearity exists in the data set, the prediction performance of the robust ridge regression method is higher than that of the rank regression method. The efficiency of this estimator, however, is highly dependent on the ridge parameter, and it is generally difficult to give a satisfactory answer as to how the ridge parameter should be selected. Because of the good properties of generalized cross-validation (GCV) and its simplicity, we use it to choose the optimal value of the ridge parameter. The proposed GCV function creates a balance between the precision of the estimators and the bias caused by the ridge estimation. It behaves like an improved estimator of risk and can be used when the number of explanatory variables is larger than the sample size in high-dimensional problems. Finally, some numerical illustrations are given to support our findings for the analysis of gene expression and the prediction of riboflavin production in Bacillus subtilis.
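The GCV-based selection of the ridge parameter described above can be sketched as follows. This is a minimal illustration using the classical GCV criterion for an ordinary (non-robust) ridge fit, not the paper's robust estimator; the function name `gcv_ridge` and the candidate grid are illustrative choices, not from the paper. Note that the ridge system remains solvable for any k > 0 even when p > n.

```python
import numpy as np

def gcv_ridge(X, y, ks):
    """Pick the ridge parameter k minimizing the classical GCV score
    GCV(k) = n * RSS(k) / (n - tr(H_k))^2, where H_k is the ridge
    hat matrix X (X'X + k I)^{-1} X'.  Works when p > n for k > 0."""
    n, p = X.shape
    best_k, best_score = None, np.inf
    for k in ks:
        # Ridge hat matrix: X (X'X + k I)^{-1} X'  (p x p solve, n x n result)
        H = X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T)
        resid = y - H @ y                    # residuals of the ridge fit
        df = np.trace(H)                     # effective degrees of freedom
        score = n * np.sum(resid**2) / (n - df) ** 2
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Hypothetical high-dimensional example: p = 50 covariates, n = 20 samples
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
beta = np.zeros(50)
beta[:3] = [2.0, -1.0, 1.5]                  # sparse true coefficient vector
y = X @ beta + 0.1 * rng.normal(size=20)
k_opt, gcv_min = gcv_ridge(X, y, np.logspace(-3, 2, 30))
```

The explicit n x n hat matrix keeps the sketch close to the GCV formula; for large problems one would instead compute the trace and residuals from a singular-value decomposition of X.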

                  Keywords
Generalized cross-validation; High-dimensional data; Multicollinearity; Rank regression; Robust ridge regression; Sparse model

                  1.  Introduction
Many data problems nowadays carry structures in which the number of covariates may exceed the sample size, i.e., p > n. In such a setting, a huge amount of work has been pursued addressing prediction of a new response variable, estimation of an underlying parameter vector, and variable selection,


109 | ISI WSC 2019