Page 121 - Contributed Paper Session (CPS) - Volume 5
CPS1141 Mahdi Roozbeh
see Hastie et al. (2009) and Bühlmann and van de Geer (2011) in this respect.
In a nutshell, we consider ridge regression estimation in sparse semiparametric
models in which the condition p > n creates difficulties for classical
analysis.
Let $(y_1, x_1, t_1), \ldots, (y_n, x_n, t_n)$ be observations that follow the semiparametric
regression model (SRM)
$$y_i = x_i^T \beta + f(t_i) + \epsilon_i, \quad i = 1, \ldots, n, \qquad (1.1)$$
where $x_i^T = (x_{i1}, x_{i2}, \ldots, x_{ip})$ is a p-dimensional vector of observed covariates or
explanatory variables, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is a p-dimensional vector
of unknown parameters, the $t_i$'s are known and non-random in some
bounded domain $D \subset \mathbb{R}$, $f(t_i)$ is an unknown smooth function and the $\epsilon_i$'s are
independent and identically distributed random errors with mean 0 and variance
$\sigma^2$, which are independent of $(x_i, t_i)$. The theory of linear models is well
established for the traditional setting p < n. With modern technologies, however,
in many biological, medical, social, and economic studies, p is equal to or
greater than n, and making valid statistical inference is a great challenge. In the
case of p < n, there is a rich literature on model estimation.
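To make the p > n setting concrete, the following minimal sketch simulates data from a model of the form (1.1) and checks the rank of the design matrix. Everything specific in it is an illustrative assumption rather than something taken from the paper: the Gaussian design, the choice f(t) = sin(2πt), the sparsity level s, the noise level, and the helper name `simulate_srm`. Since rank(XᵀX) = rank(X) ≤ min(n, p), the p × p matrix XᵀX cannot be inverted when p > n.

```python
import numpy as np


def simulate_srm(n=30, p=100, s=5, sigma=0.5, seed=0):
    """Draw (y, X, t, beta) from y_i = x_i' beta + f(t_i) + eps_i.

    Hypothetical settings for illustration only: standard normal
    covariates, f(t) = sin(2*pi*t), and a sparse beta whose first
    s entries equal 1 while the remaining p - s entries are 0.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))             # n x p design with p > n
    beta = np.zeros(p)
    beta[:s] = 1.0                              # only s non-zero coefficients
    t = np.sort(rng.uniform(0.0, 1.0, size=n))  # points in the bounded domain [0, 1]
    f = np.sin(2.0 * np.pi * t)                 # assumed smooth function
    eps = sigma * rng.standard_normal(n)        # i.i.d. errors, mean 0, variance sigma^2
    y = X @ beta + f + eps
    return y, X, t, beta


y, X, t, beta = simulate_srm()
n, p = X.shape

# rank(X'X) equals rank(X), which is at most min(n, p) = n here,
# so the p x p matrix X'X is singular and least squares is not unique.
rank_X = np.linalg.matrix_rank(X)
print(f"n = {n}, p = {p}, rank(X) = {rank_X}")
```

Any vector in the null space of X can be added to a least-squares solution without changing the fitted values, which is the identifiability problem in this setting.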
However, classical statistical methods cannot be used for estimating
the parameters of model (1.1) when p > n, because they would overfit the data
and suffer from severe identifiability issues. A way out of the ill-posedness of
estimation in model (1.1) is to assume a sparse structure, typically
meaning that only a few of the components of $\beta$ are non-zero. Estimation of the full
parametric regression model in the case of p > n, and the statistical inference
that follows, started about a decade ago. See, for example, Fan and Lv (2010),
Shao and Deng (2012), Bühlmann (2013) and Bühlmann et al. (2014), to mention a
few. Now, consider a semiparametric regression model in the presence of
multicollinearity. Multicollinearity may lead to wide
confidence intervals for the individual parameters or linear combinations of the
parameters, and may produce estimates with wrong signs. For our purpose, we
employ only the ridge regression concept due to Hoerl and Kennard (1970)
to combat multicollinearity. Many works adopt the ridge regression
methodology to overcome the multicollinearity problem. For
recent studies in fully parametric and semiparametric regression models,
see Roozbeh and Arashi (2013), Amini and Roozbeh (2015) and Roozbeh (2015).
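As a rough illustration of why ridge regression helps under multicollinearity, the sketch below applies the standard ridge formula (XᵀX + kI)⁻¹Xᵀy to a purely parametric toy problem with two nearly collinear columns. It is not the semiparametric estimator developed in this paper: the nonparametric component f(t) is left out, and the data, the ridge parameter k = 1 and the function name `ridge_estimator` are assumptions chosen only for the example.

```python
import numpy as np


def ridge_estimator(X, y, k):
    """Standard ridge estimate (X'X + k I)^{-1} X'y for a ridge parameter k > 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


# Hypothetical toy data: the first two columns are almost identical,
# which makes X'X badly conditioned.
rng = np.random.default_rng(1)
n, p = 50, 3
z = rng.standard_normal(n)
X = np.column_stack([
    z,
    z + 1e-3 * rng.standard_normal(n),  # nearly collinear with the first column
    rng.standard_normal(n),
])
beta = np.array([1.0, 1.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(n)

k = 1.0  # illustrative ridge parameter, not a tuned value
cond_ols = np.linalg.cond(X.T @ X)
cond_ridge = np.linalg.cond(X.T @ X + k * np.eye(p))

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_ridge = ridge_estimator(X, y, k)

print("condition number of X'X      :", cond_ols)
print("condition number of X'X + kI :", cond_ridge)
print("least-squares estimate       :", beta_ols)
print("ridge estimate               :", beta_ridge)
```

Adding kI shifts every eigenvalue of XᵀX up by k, so the matrix being inverted is positive definite for any k > 0. The same formula therefore remains well defined when p > n, at the price of shrinking the estimate toward zero.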
2. Classical Estimators Under Restriction
Consider the following semiparametric regression model
$$y = X\beta + f(t) + \epsilon, \qquad (2.1)$$
where $y = (y_1, \ldots, y_n)^T$, $X = (x_1, \ldots, x_n)^T$ is an $n \times p$ matrix, $f(t) = (f(t_1), \ldots, f(t_n))^T$
and $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$. We assume that, in general, $\epsilon$ is a vector of disturbances
distributed as a multivariate normal, $N_n(0, \sigma^2 V)$, where $V$ is a
110 | I S I W S C 2 0 1 9