   The penalized likelihood for the DM lasso becomes:

$$\ell^{*}(\boldsymbol{\beta}^{*}, \tilde{\lambda}) = -\ell(\boldsymbol{\beta}^{*}) + \tilde{\lambda} \sum_{m=1}^{M} |\beta^{*}_{m}|, \qquad (2)$$

where $\tilde{\lambda} \sum_{m=1}^{M} |\beta^{*}_{m}|$ is the L1-norm. Note that although we estimate the dispersion $\phi$, we do not penalize for this parameter (which is estimated in the $M'$th element of $\boldsymbol{\beta}^{*}$).
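   For concreteness, a minimal R sketch of the objective in (2) is given below. Here `neg_loglik_dm` is a hypothetical stand-in for the DM negative log-likelihood (not a function from the authors' code), and the last element of `beta` is taken to be the unpenalized dispersion parameter.

```r
# Penalized objective of equation (2): negative log-likelihood plus an
# L1 penalty on the regression coefficients only. The last element of
# `beta` (the dispersion) is left unpenalized. `neg_loglik_dm` is a
# hypothetical stand-in for the DM negative log-likelihood.
penalized_objective <- function(beta, lambda, neg_loglik_dm) {
  M <- length(beta) - 1
  neg_loglik_dm(beta) + lambda * sum(abs(beta[seq_len(M)]))
}
```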
   We also implemented the adaptive lasso (Zou, 2006), which scales the L1-norm term by an adaptive, data-driven weight vector, $\hat{w}_{m} = |\hat{\beta}_{m}|^{-\tilde{\gamma}}$, where $\tilde{\gamma} > 0$ is a tuning parameter that adjusts the weights. The adaptive lasso penalizes irrelevant predictors more than relevant predictors, thereby leading to consistent model selection and optimal prediction. Minimization of (2) finds the penalized MLEs with certain parameters shrunk to exactly zero, thus achieving variable selection and parameter estimation simultaneously. For minimization, we implemented FISTA (Beck and Teboulle, 2009) with the ADADELTA (Zeiler, 2013) learning rate (i.e., step size) method. The optimal value of $\tilde{\lambda}$ was tuned via BIC over an equally-spaced log grid of 100 $\tilde{\lambda}$ values, starting from $\tilde{\lambda}_{\max} = \max_{m} |g_{m}|$ down to $\tilde{\lambda}_{\min} = 0.01$, where $\boldsymbol{g}$ is the gradient vector evaluated at $\boldsymbol{\beta}^{*} = \boldsymbol{0}$, $m \neq M'$. For the adaptive lasso, we let $\tilde{\gamma} = 1, 2, 3$ before running FISTA to find the optimal $\tilde{\lambda}$. R code to fit these models is available on GitHub (Crea, 2016); a minimal sketch of the fitting procedure is given below.
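   The following R sketch illustrates the fitting procedure, not the code in the GitHub repository. `grad_fn` (gradient of the negative DM log-likelihood), `bic_dm` (BIC of a fitted coefficient vector), and `M_prime` (the index of the dispersion entry in $\boldsymbol{\beta}^{*}$) are hypothetical names, and a fixed step size stands in for the ADADELTA update for brevity.

```r
# Soft-threshold (proximal) operator for a weighted L1 penalty
soft_threshold <- function(z, thr) sign(z) * pmax(abs(z) - thr, 0)

# FISTA for the penalized likelihood in (2). `w` holds per-coordinate
# penalty weights, with w = 0 on the dispersion entry so it is never
# shrunk. A fixed step size replaces ADADELTA in this sketch.
fista <- function(beta0, grad_fn, lambda, w, step = 1e-3, iters = 500) {
  beta <- beta0; y <- beta0; t <- 1
  for (k in seq_len(iters)) {
    beta_new <- soft_threshold(y - step * grad_fn(y), step * lambda * w)
    t_new <- (1 + sqrt(1 + 4 * t^2)) / 2
    y <- beta_new + ((t - 1) / t_new) * (beta_new - beta)
    beta <- beta_new
    t <- t_new
  }
  beta
}

# Lambda grid: 100 equally log-spaced values from max |gradient at 0|
# (dispersion entry excluded) down to 0.01, each fit scored by BIC.
g          <- grad_fn(rep(0, M_prime))
lambda_max <- max(abs(g[-M_prime]))
lambdas    <- exp(seq(log(lambda_max), log(0.01), length.out = 100))
w_lasso    <- c(rep(1, M_prime - 1), 0)   # plain-lasso weights
# Adaptive lasso: w <- c(abs(beta_mle[-M_prime])^(-gamma), 0), gamma in 1:3
fits <- lapply(lambdas, function(l) fista(rep(0, M_prime), grad_fn, l, w_lasso))
best <- fits[[which.min(sapply(fits, bic_dm))]]
```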

                  3.  Simulation Study
   Here we consider two DM regression dispersion structures: none ($\phi = 0$) and constant ($\phi = 6$). Networks were generated in R, per Crea et al. (2016), based on a gamma-Poisson parameterization of the DM model. We considered three network sizes: small (25×10), medium (50×20) and large (100×30). Entries of the covariate array X were generated based on complementarity linkage rules following Santamaría and Rodríguez-Gironés (2007); see Crea et al. (2016) for more details. We generated K = 20 covariates in total, of which only the first 4 were relevant to pollination. We set $\boldsymbol{\beta}^{*} = (-0.5, 1, -1, 2, 0, \ldots, 0)$ and evaluated the performance of the regularized group DM model in the fixed parameter dimension. Results were averaged over 100 replicates; a sketch of the data-generating step follows.
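   As a hedged sketch of the gamma-Poisson construction: Poisson counts with gamma-distributed rates on a common scale are DM-distributed conditional on their total. The inputs `alpha` (shapes built from the linear predictor) and `theta` (a scale through which the dispersion enters) are assumptions here; the trait-based construction of X follows Crea et al. (2016).

```r
# One row of interaction counts under the gamma-Poisson parameterization
# of the DM model: gamma rates with shared scale `theta`, then Poisson
# counts; conditional on the row total the counts are DM-distributed.
# `alpha` and `theta` are assumed inputs, not the authors' exact setup.
simulate_dm_row <- function(alpha, theta) {
  rates <- rgamma(length(alpha), shape = alpha, scale = theta)
  rpois(length(alpha), lambda = rates)
}

# e.g., a 10-species row: simulate_dm_row(alpha = rep(1, 10), theta = 2)
```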
   To evaluate variable selection, we calculated the percent of models among the 100 replicates that selected the true model, an underfit model and an overfit model, respectively. We also calculated the average number of nonzero coefficients correctly estimated to be nonzero (i.e., true positives) and the average number of nonzero coefficients incorrectly estimated to be zero (i.e., false negatives) over the 100 replicates. Finally, we calculated the average mean squared error to assess parameter estimation; these metrics might be computed as sketched below.
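   In the following sketch, `est` is a hypothetical 100 × 20 matrix of fitted coefficients (one row per replicate) and `beta_true` is the true vector given above.

```r
# Variable-selection and estimation metrics over 100 replicates.
# `est` is a hypothetical replicates-by-K matrix of fitted coefficients.
beta_true <- c(-0.5, 1, -1, 2, rep(0, 16))
relevant  <- beta_true != 0
tp  <- rowSums(est[, relevant] != 0)    # relevant predictors retained
fn  <- rowSums(est[, relevant] == 0)    # relevant predictors zeroed out
mse <- rowMeans(sweep(est, 2, beta_true)^2)
pct_true <- 100 * mean(apply(est != 0, 1, function(s) all(s == relevant)))
c(avg_tp = mean(tp), avg_fn = mean(fn), avg_mse = mean(mse),
  pct_true_model = pct_true)
```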
                     Table 1 shows the results from fitting a regularized regression model to
                  the  simulated  data.  When  data  were  generated  with  no  dispersion,  our

