Page 120 - Contributed Paper Session (CPS) - Volume 5
CPS1141 Mahdi Roozbeh
A New Robust Ridge Estimator in High-Dimensional Linear Models
Mahdi Roozbeh
Faculty of Mathematics, Statistics and Computer Science, Semnan University
Abstract
In classical regression analysis, ordinary least-squares estimation is the best
estimation method when the essential assumptions are met: normality and
independence of the error terms, together with little or no multicollinearity
among the covariates. If any of these assumptions is violated, the results can
be misleading; in particular, outliers violate the assumption of normally
distributed residuals in least-squares regression. Moreover, many biological,
medical, social, and economic studies nowadays carry structures in which the
number of covariates may exceed the sample size (high-dimensional, or wide,
data), and in this situation the least-squares estimator is not applicable at all.
Robust ridge regression is a modern technique for analyzing data that are
contaminated with outliers in the high-dimensional case. When
multicollinearity exists in the data set, the prediction performance of the
robust ridge regression method is higher than that of the rank regression
method. However, the efficiency of this estimator depends strongly on the
ridge parameter, and it is generally difficult to give a satisfactory answer
about how to select it. Because of the good properties of generalized
cross-validation (GCV) and its simplicity, we use it to choose the optimum
value of the ridge parameter. The proposed GCV function balances the
precision of the estimators against the bias introduced by ridge estimation. It
behaves like an improved estimator of risk and can be used when the number
of explanatory variables is larger than the sample size in high-dimensional
problems. Finally, some numerical illustrations are given to support our
findings through the analysis of gene expression data and the prediction of
riboflavin production in Bacillus subtilis.
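The GCV step described above can be sketched in NumPy. This is a minimal illustration of the standard criterion GCV(k) = n‖y − H(k)y‖² / tr(I − H(k))², with H(k) = X(XᵀX + kI)⁻¹Xᵀ, computed through the SVD so it remains valid when p > n. The function name `gcv_ridge` and the grid search are illustrative assumptions, not the paper's implementation, and the robust (outlier-resistant) weighting of the proposed estimator is omitted here.

```python
import numpy as np

def gcv_ridge(X, y, k_grid):
    """Select the ridge parameter k by generalized cross-validation.

    Uses the thin SVD X = U diag(s) V', for which the ridge hat matrix
    reduces to H(k) = U diag(s^2 / (s^2 + k)) U', so each GCV evaluation
    costs O(n * min(n, p)) after one SVD, even when p > n.
    """
    n = X.shape[0]
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    best_k, best_score = None, np.inf
    for k in k_grid:
        d = s**2 / (s**2 + k)        # shrinkage factors per singular direction
        resid = y - U @ (d * Uty)    # y - H(k) y
        edf = d.sum()                # tr(H(k)), the effective degrees of freedom
        score = n * (resid @ resid) / (n - edf) ** 2
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score
```

In practice the grid would span several orders of magnitude (e.g. `np.logspace(-3, 3, 50)`), since the GCV curve is typically flat near its minimum on a log scale.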
Keywords
Generalized cross-validation; High-dimensional data; Multicollinearity; Rank
regression; Robust ridge regression; Sparse model
1. Introduction
Many data problems nowadays carry structures in which the number of
covariates may exceed the sample size, i.e., p > n. In such a setting, a huge
amount of work has been pursued addressing prediction of a new response
variable, estimation of an underlying parameter vector, and variable selection,
109 | ISI WSC 2019