The penalized likelihood for the DM lasso becomes:
$$\ell^*(\boldsymbol{\beta}^*, \tilde{\lambda}) = -\ell(\boldsymbol{\beta}^*) + \tilde{\lambda}\sum_{k=1}^{K} |\beta_k|, \qquad (2)$$
∗
where $\sum_{k=1}^{K} |\beta_k|$ is the L1-norm. Note that although we estimate the dispersion parameter $\theta$, we do not penalize this parameter (it is estimated in the $M'$th element of $\boldsymbol{\beta}^*$).
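As a minimal sketch of objective (2) in R, assuming a hypothetical function dm_negloglik() for the DM negative log-likelihood (not defined in this paper) and taking the dispersion to be the last element of the parameter vector:

dm_lasso_objective <- function(beta_star, lambda, dm_negloglik) {
  K <- length(beta_star) - 1                       # regression coefficients only
  penalty <- lambda * sum(abs(beta_star[1:K]))     # L1-norm; dispersion excluded
  dm_negloglik(beta_star) + penalty                # objective (2)
}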
We also implemented the adaptive lasso (Zou, 2006), which scales the L1-norm term by an adaptive, data-driven weight vector, $\hat{w}_k = |\hat{\beta}_k|^{-\tilde{\gamma}}$, where $\tilde{\gamma} > 0$ is a tuning parameter that adjusts the weights. The adaptive lasso penalizes irrelevant predictors more than relevant predictors, thereby leading to consistent model selection and optimal prediction. Minimization of (2) finds the penalized MLEs, with certain parameters shrunk to exactly zero, thus achieving variable selection and parameter estimation simultaneously. For minimization, we implemented FISTA (Beck and Teboulle, 2009) with the ADADELTA (Zeiler, 2013) learning rate (i.e., step size) method, as sketched below.
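The following is a simplified FISTA sketch with a fixed step size rather than the ADADELTA-adapted steps used in the paper; grad_negloglik() is a hypothetical gradient of the DM negative log-likelihood, and the dispersion element is again left unpenalized:

soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)  # prox of t*|.|

fista_dm_lasso <- function(beta0, lambda, grad_negloglik,
                           step = 1e-3, n_iter = 500) {
  K <- length(beta0) - 1
  x_old <- beta0; y <- beta0; t_old <- 1
  for (i in seq_len(n_iter)) {
    z <- y - step * grad_negloglik(y)                # gradient step on -loglik
    x <- z
    x[1:K] <- soft_threshold(z[1:K], step * lambda)  # L1 prox on coefficients
    t_new <- (1 + sqrt(1 + 4 * t_old^2)) / 2
    y <- x + ((t_old - 1) / t_new) * (x - x_old)     # Nesterov extrapolation
    x_old <- x; t_old <- t_new
  }
  x_old
}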
The optimal value of $\tilde{\lambda}$ was tuned via BIC over a grid of 100 values equally spaced on the log scale, running from $\tilde{\lambda} = \max_k |g_k|$ down to $\tilde{\lambda} = 0.01$, where $g$ is the gradient vector of $\ell$ evaluated at $\hat{\beta}_k = 0$, $k \neq M'$. For the adaptive lasso, we let $\tilde{\gamma} = 1, 2, 3$ before running FISTA to find the optimal $\tilde{\lambda}$. R code to fit these models is available on GitHub (Crea, 2016).
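A sketch of this tuning loop, reusing the hypothetical helpers above (exp/log spacing yields a grid equally spaced on the log scale; BIC is computed as 2 times the negative log-likelihood plus log(n) times the number of nonzero coefficients):

tune_lambda <- function(beta0, grad_negloglik, dm_negloglik, n_obs,
                        lambda_max, n_grid = 100) {
  grid <- exp(seq(log(lambda_max), log(0.01), length.out = n_grid))
  bic <- sapply(grid, function(lam) {
    fit <- fista_dm_lasso(beta0, lam, grad_negloglik)
    df  <- sum(fit[-length(fit)] != 0)               # nonzero coefficients
    2 * dm_negloglik(fit) + log(n_obs) * df          # BIC
  })
  grid[which.min(bic)]                               # lambda minimizing BIC
}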
3. Simulation Study
Here we consider two DM regression dispersion structures: none ($\theta = 0$) and constant ($\theta = 6$). Networks were generated in R, per Crea et al. (2016), based on a gamma-Poisson parameterization of the DM model. We considered three network sizes: small (25×10), medium (50×20) and large (100×30). Entries of the covariate array X were generated based on complementarity linkage rules following Santamaría and Rodríguez-Gironés (2007). See Crea et al. (2016) for more details. We generated K = 20 covariates in total, of which only the first 4 were relevant to pollination. We set $\boldsymbol{\beta}^* = (-0.5, 1, -1, 2, 0, \ldots, 0)$ and evaluated the performance of the regularized group DM model in the fixed parameter dimension. Results were averaged over 100 replicates.
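The exact covariate construction and parameterization follow Crea et al. (2016); the sketch below only illustrates the gamma-Poisson mechanism, with a mean-one gamma frailty injecting overdispersion into Poisson counts (all names and settings here are illustrative):

simulate_network <- function(X, beta, theta) {
  # X: n_plants x n_pollinators x K covariate array; beta: length-K coefficients
  dims <- dim(X)[1:2]
  eta  <- apply(X, c(1, 2), function(x) sum(x * beta))  # linear predictor
  mu   <- exp(eta)
  if (theta > 0) {
    frailty <- matrix(rgamma(prod(dims), shape = 1 / theta, rate = 1 / theta),
                      dims[1], dims[2])                 # mean-1 gamma frailty
    mu <- mu * frailty                                  # adds overdispersion
  }
  matrix(rpois(prod(dims), mu), dims[1], dims[2])       # count network
}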
To evaluate variable selection, we calculated the percentage of models among the 100 replicates that selected the true model, an underfit model and an overfit model, respectively. We also calculated the average number of nonzero coefficients correctly estimated to be nonzero (i.e., true positives) and the average number of nonzero coefficients incorrectly estimated to be zero (i.e., false negatives) over the 100 replicates. Finally, we calculated the average mean squared error to assess parameter estimation.
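A sketch of these metrics, where beta_true is the generating coefficient vector and beta_hats is a list of fitted vectors, one per replicate (variable names are illustrative):

selection_metrics <- function(beta_hats, beta_true) {
  truth <- beta_true != 0
  tp    <- sapply(beta_hats, function(b) sum(b != 0 & truth))    # true positives
  fn    <- sapply(beta_hats, function(b) sum(b == 0 & truth))    # false negatives
  mse   <- sapply(beta_hats, function(b) mean((b - beta_true)^2))
  exact <- sapply(beta_hats, function(b) all((b != 0) == truth)) # true model
  c(pct_true_model = 100 * mean(exact), avg_true_pos = mean(tp),
    avg_false_neg = mean(fn), avg_mse = mean(mse))
}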
Table 1 shows the results from fitting a regularized regression model to
the simulated data. When data were generated with no dispersion, our