Page 154 - Special Topic Session (STS) - Volume 4
STS577 Mahdi Roozbeh
Improved robust rank-based test statistics in
high-dimensional regression model
Mahdi Roozbeh
Department of Statistics, Faculty of Mathematics, Statistics and Computer Sciences, Semnan
University, Iran
Abstract
In classical regression analysis, ordinary least-squares estimation is the
best estimation method when the essential assumptions, such as normality and
independence of the error terms and little or no multicollinearity among the
covariates, are met. If any of these assumptions is violated, however, the
results can be misleading. In particular, outliers violate the assumption of
normally distributed residuals in least-squares regression. Moreover, many
modern biological, medical, social, and economic studies produce data in which
the number of covariates may exceed the sample size (high-dimensional or wide
data). In this situation, the least-squares estimator is not applicable.
Robust ridge regression is a modern technique for analyzing data that are
contaminated with outliers in the high-dimensional case. When
multicollinearity exists in the data set, the prediction performance of the
robust ridge regression method is higher than that of the rank regression
method. Also, the efficiency of this estimator depends heavily on the ridge
parameter.
Generally, it is difficult to give a satisfactory answer about how to select
the ridge parameter. Because of the good properties of generalized cross
validation (GCV) and its simplicity, we use it to choose the optimum value of
the ridge parameter. The proposed GCV function strikes a balance between the
precision of the estimators and the bias introduced by ridge estimation. It
behaves like an improved estimator of risk and can be used when the number of
explanatory variables is larger than the sample size in high-dimensional
problems. Finally, some numerical illustrations are given to support our
findings for the analysis of gene expression and the prediction of riboflavin
production in Bacillus subtilis.
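To make the role of GCV concrete, the following is a minimal sketch of choosing a ridge parameter by GCV when the number of covariates exceeds the sample size. It uses the standard least-squares ridge estimator, not the robust rank-based estimator proposed in this paper, so it illustrates only the selection principle. The function names (`gcv_score`, `select_ridge_parameter`, `ridge_coefficients`), the simulated data, and the search grid are all assumptions made for illustration.

```python
import numpy as np


def gcv_score(X, y, k):
    """GCV score of ordinary (non-robust) ridge regression for penalty k > 0.

    Uses the n x n Gram matrix X X^T, which stays small when p > n.
    With hat matrix H = X X^T (X X^T + k I)^{-1}, we have
    I - H = k (X X^T + k I)^{-1}.
    """
    n = X.shape[0]
    A = np.linalg.inv(X @ X.T + k * np.eye(n))
    resid = k * (A @ y)                       # (I - H) y
    mean_trace = k * np.trace(A) / n          # tr(I - H) / n
    return float((resid @ resid / n) / mean_trace ** 2)


def select_ridge_parameter(X, y, grid):
    """Return the grid value with the smallest GCV score, plus all scores."""
    scores = [gcv_score(X, y, k) for k in grid]
    return grid[int(np.argmin(scores))], scores


def ridge_coefficients(X, y, k):
    """Ridge coefficients in dual form: X^T (X X^T + k I)^{-1} y."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + k * np.eye(n), y)


# Simulated wide data (assumed for illustration): n = 20 samples, p = 50 covariates.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

grid = np.logspace(-2, 3, 30)                 # hypothetical search grid
best_k, scores = select_ridge_parameter(X, y, grid)
beta_hat = ridge_coefficients(X, y, best_k)
```

The dual form is used because it only requires solving an n x n system, which remains well defined for any k > 0 even though the p x p matrix X^T X is singular when p > n.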
Keywords
Generalized cross validation; High-dimensional data; Multicollinearity; Rank
regression; Robust ridge regression; Sparse model.
1. Introduction
Consider the setting where the observed data are realizations of
$\{(x_i, y_i)\}_{i=1}^{n}$ with p-dimensional covariates $x_i \in \mathbb{R}^p$
and univariate continuous response variables $y_i \in \mathbb{R}$. A simple
high-dimensional regression model has the form
143 | ISI WSC 2019