Page 159 - Special Topic Session (STS) - Volume 4
STS577 Mahdi Roozbeh
Theorem 2. For the GCV function in (3.4),
$$\lim_{n\to\infty} E\big(\mathrm{GCV}(k,d)\big) = \sigma^2 + \lim_{n\to\infty} L(k,d). \tag{3.5}$$
4. Real Data Study
To illustrate the usefulness of the suggested strategies for high-dimensional
data in the regression model, we consider the data set on riboflavin
(vitamin B2) production in Bacillus subtilis, which can be found in the R
package “hdi”. There is a single real-valued response variable, which is the
logarithm of the riboflavin production rate. Furthermore, there are p = 4088
explanatory variables measuring the logarithm of the expression level of 4088
genes. There is one rather homogeneous data set from n = 71 samples that
were hybridized repeatedly during a fed-batch fermentation process where
different engineered strains and strains grown under different fermentation
conditions were analyzed. Table 1 shows a summary of the results. In this table,
RSS and $R^2$ are, respectively, the residual sum of squares and the coefficient of
determination of the model. The 3D diagram of GCV versus k and d is plotted
in Figure 1 for the real data set. The minimum of GCV occurred approximately at
$k_{opt} = 0.468759$ and $d_{opt} = 0.002332$.
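Locating a minimum such as the reported pair of tuning parameters can be sketched as a grid search over k and d. The snippet below is only an illustration under stated assumptions: `argmin_on_grid` and `toy_criterion` are hypothetical names, and `toy_criterion` is a placeholder bowl-shaped surface, not the GCV function in (3.4). In practice the placeholder would be swapped for the GCV criterion evaluated on the data.

```python
import numpy as np


def argmin_on_grid(criterion, k_values, d_values):
    """Evaluate criterion(k, d) on a rectangular grid and return the minimizer.

    Returns the grid point (k, d) with the smallest criterion value,
    together with that value.
    """
    surface = np.array(
        [[criterion(k, d) for d in d_values] for k in k_values]
    )
    # Index of the smallest entry of the 2-D surface.
    i, j = np.unravel_index(np.argmin(surface), surface.shape)
    return k_values[i], d_values[j], surface[i, j]


def toy_criterion(k, d):
    """Placeholder surface (NOT the paper's GCV): minimum 1.0 at (0.5, 0.25)."""
    return (k - 0.5) ** 2 + (d - 0.25) ** 2 + 1.0


k_grid = np.linspace(0.0, 1.0, 101)
d_grid = np.linspace(0.0, 1.0, 101)
k_opt, d_opt, value = argmin_on_grid(toy_criterion, k_grid, d_grid)
print(k_opt, d_opt, value)
```

A grid search only resolves the minimizer up to the grid spacing, so a finer grid (or a local optimizer started from the grid minimum) would be needed to report values to six decimal places.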
Estimator   $\hat{\beta}(k)$   $\hat{\beta}^{(s)}(k)$   $\hat{\beta}(k,d)$
RSS         11.440721          1.216623                 0.068050
$R^2$       0.807080           0.979485                 0.998853
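As a quick consistency check on the table, the RSS and coefficient of determination in each column should imply the same total sum of squares, since all three estimators are fitted to the same response. The sketch below assumes the usual definition R^2 = 1 - RSS/TSS, which is not stated explicitly in the text; the variable names are illustrative.

```python
# Values copied from the table, in column order.
rss = [11.440721, 1.216623, 0.068050]
r2 = [0.807080, 0.979485, 0.998853]

# Assumption: R^2 = 1 - RSS / TSS, hence TSS = RSS / (1 - R^2).
implied_tss = [r / (1.0 - q) for r, q in zip(rss, r2)]

for t in implied_tss:
    print(round(t, 2))
```

Under that assumption all three columns give an implied total sum of squares close to 59.3, so the reported RSS and coefficient of determination values are mutually consistent up to rounding.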
148 | I S I W S C 2 0 1 9