



Improved robust rank-based test statistics in high-dimensional regression model

Mahdi Roozbeh
Department of Statistics, Faculty of Mathematics, Statistics and Computer Sciences, Semnan University, Iran

Abstract
In classical regression analysis, ordinary least-squares estimation is the best estimation method provided that the essential assumptions, such as normality and independence of the error terms as well as little or no multicollinearity among the covariates, are met. Moreover, many biological, medical, social, and economic studies nowadays involve data structures in which the number of covariates may exceed the sample size (high-dimensional or wide data); in this situation the least-squares estimator is not applicable. Furthermore, if any of the above assumptions is violated, the results can be misleading; in particular, outliers violate the assumption of normally distributed residuals in least-squares regression. Robust ridge regression is a modern technique for analyzing data that are contaminated with outliers in the high-dimensional case. When multicollinearity exists in the data set, the prediction performance of the robust ridge regression method is higher than that of the rank regression method. The efficiency of this estimator, however, is highly dependent on the ridge parameter, and it is generally difficult to give a satisfactory answer as to how this parameter should be selected. Because of the good properties of generalized cross validation (GCV) and its simplicity, we use it to choose the optimum value of the ridge parameter. The proposed GCV function creates a balance between the precision of the estimators and the bias caused by the ridge estimation. It behaves like an improved estimator of the risk and can be used when the number of explanatory variables is larger than the sample size in high-dimensional problems. Finally, some numerical illustrations are given to support our findings for the analysis of gene expression and the prediction of riboflavin production in Bacillus subtilis.
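
To make the GCV criterion concrete, the following is a minimal sketch of ridge-parameter selection by standard (non-robust) GCV for ordinary ridge regression; it is not the robust GCV function proposed in the paper, and the function name gcv_ridge, the grid search, and the simulated data are illustrative assumptions only.

    import numpy as np

    def gcv_ridge(X, y, ridge_grid):
        # Generalized cross validation for the ridge parameter k:
        #   GCV(k) = (1/n) * ||y - H(k) y||^2 / (1 - tr(H(k))/n)^2,
        # where H(k) = X (X'X + k I)^{-1} X' is the ridge hat matrix.
        # A thin SVD of X keeps the computation stable even when p > n.
        n = X.shape[0]
        U, s, _ = np.linalg.svd(X, full_matrices=False)
        Uty = U.T @ y
        best_k, best_score = None, np.inf
        for k in ridge_grid:
            shrink = s**2 / (s**2 + k)        # eigenvalues of H(k)
            fitted = U @ (shrink * Uty)       # H(k) y
            rss = np.sum((y - fitted) ** 2)   # residual sum of squares
            edf = np.sum(shrink)              # effective degrees of freedom tr(H(k))
            score = (rss / n) / (1.0 - edf / n) ** 2
            if score < best_score:
                best_k, best_score = k, score
        return best_k, best_score

    # Illustrative use on simulated wide data (p > n):
    rng = np.random.default_rng(0)
    n, p = 50, 200
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:5] = 2.0                            # sparse true coefficients
    y = X @ beta + rng.normal(size=n)
    k_opt, _ = gcv_ridge(X, y, np.logspace(-3, 3, 50))

In the setting of the paper, the least-squares fit and the residual sum of squares would be replaced by their robust, rank-based counterparts, which is where the proposed GCV function differs from this plain version.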

Keywords
Generalized cross validation; High-dimensional data; Multicollinearity; Rank regression; Robust ridge regression; Sparse model.

                  1.  Introduction
Consider the setting where the observed data are realizations of {(x_i, y_i)}, i = 1, …, n, with p-dimensional covariates x_i ∈ ℝ^p and univariate continuous response variables y_i ∈ ℝ. A simple high-dimensional regression model has the form
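A minimal sketch of such a model, assuming the standard sparse linear specification suggested by the abstract and keywords (stated here as an assumption rather than taken from the original display):

    y_i = x_i^{\top}\beta + \varepsilon_i, \qquad i = 1, \dots, n, \qquad p \gg n,

where \beta \in \mathbb{R}^p is a sparse coefficient vector and the errors \varepsilon_i may be contaminated by outliers.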


