Page 362 - Special Topic Session (STS)

Page 362 - Special Topic Session (STS) - Volume 3

STS550 Matteo Mogliani
similar to that pointed out by Zou and Hastie (2005), and it is mostly related
to the lack of strict convexity in the Lasso penalty. To address this issue, we
propose a solution based on the Adaptive Group Lasso (AGL) estimator (Wang
and Leng, 2008). This approach penalizes groups of regressors, rather than
individual regressors, which may lead (if the group structure is carefully
set by the researcher) to a finite-sample improvement over the AL. In the present
framework, it seems reasonable to define a group as each of the vectors of
lag polynomials in the model. This grouping structure is motivated by the fact
that if one high-frequency predictor is irrelevant, it should be expected that
zero-coefficients occur in all the parameters of its lag polynomial. This strategy
should overcome, at least in part, the limitation of the Lasso in the presence of
strong correlation in the design matrix arising from the correlation among lags
of the transformed high-frequency predictors.
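The grouping idea described above can be illustrated with a minimal sketch. It uses the frequentist group soft-thresholding operator rather than the Bayesian estimator developed in this paper, and the function names, lag length, coefficient values and penalty levels are all hypothetical choices made for illustration only. Each group stands for the lag polynomial of one high-frequency predictor, with its own penalty as in the adaptive variant.

```python
import numpy as np

def adaptive_group_penalty(beta, groups, lambdas):
    """Sum over groups of lambda_g * ||beta_g||_2 (one penalty per group)."""
    return sum(lam * np.linalg.norm(beta[idx]) for idx, lam in zip(groups, lambdas))

def group_soft_threshold(beta, groups, lambdas, step=1.0):
    """Proximal operator of the group penalty: shrink each group as a block.

    A group whose Euclidean norm is below step * lambda_g is set to zero
    entirely, so all coefficients of that lag polynomial vanish together.
    """
    out = np.zeros_like(beta, dtype=float)
    for idx, lam in zip(groups, lambdas):
        norm = np.linalg.norm(beta[idx])
        if norm > step * lam:
            out[idx] = (1.0 - step * lam / norm) * beta[idx]
    return out

# Two hypothetical predictors, each with a 3-term lag polynomial.
groups = [np.arange(0, 3), np.arange(3, 6)]
beta = np.array([0.9, 0.6, 0.3, 0.05, -0.04, 0.02])
lambdas = [0.2, 0.2]  # illustrative per-group penalties

penalty = adaptive_group_penalty(beta, groups, lambdas)
shrunk = group_soft_threshold(beta, groups, lambdas)
print(shrunk)
```

In this toy example the second group has a small norm and is removed as a whole, while the first group is shrunk but kept, which is the behaviour expected when an entire high-frequency predictor is irrelevant.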
Several approaches have been proposed in the literature to estimate
penalized regressions. In this paper, we consider a Bayesian hierarchical
approach. We then introduce the Bayesian MIDAS Adaptive Group Lasso
model (BMIDAS-AGL), based on the Bayesian Group Lasso prior of Kyung et
al. (2010), where the conditional prior of the regression coefficients can be expressed as a scale mixture
of Normals with Gamma hyper-priors. However, an expected feature of this
model is that a sparse solution cannot be perfectly achieved, as the Bayesian
approach provides a shrinkage of the coefficients towards zero, but usually
not exactly to zero. Recent literature has increasingly focused on combining
the potential advantages of spike-and-slab methods and the penalized
likelihood approach (Ročková and George, 2018). In the present study, we
follow Xu and Ghosh (2015) and we introduce the Bayesian MIDAS Adaptive
Group Lasso with spike-and-slab priors (BMIDAS-AGL-SS). This prior provides
two shrinkage effects: the point mass at 0 (the spike part of the prior), which
leads to exact zero coefficients, and the Group Lasso prior on the slab part.
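The two shrinkage effects can be made concrete by drawing from a group spike-and-slab prior. The sketch below is an assumption-laden illustration, not the specification used in this paper: it takes the slab to be a Normal scale mixture with a Gamma mixing distribution of shape (m + 1)/2 and rate λ²/2 for a group of size m, in the spirit of Kyung et al. (2010) and Xu and Ghosh (2015), and the names `pi0`, `lam` and `sigma2` as well as all numerical values are hypothetical.

```python
import numpy as np

def draw_group_spike_slab(group_sizes, pi0, lam, sigma2, rng):
    """Draw one coefficient vector per group from a spike-and-slab prior.

    With probability pi0 the whole group is exactly zero (the spike).
    Otherwise the group is drawn from a Normal scale mixture (the slab):
    tau2 ~ Gamma(shape=(m + 1) / 2, rate=lam**2 / 2), assumed here,
    beta_g | tau2 ~ N(0, sigma2 * tau2 * I_m).
    """
    draws = []
    for m in group_sizes:
        if rng.random() < pi0:
            draws.append(np.zeros(m))  # spike: exact zeros for the group
        else:
            # NumPy parameterizes the Gamma by scale = 1 / rate.
            tau2 = rng.gamma(shape=(m + 1) / 2.0, scale=2.0 / lam**2)
            draws.append(rng.normal(0.0, np.sqrt(sigma2 * tau2), size=m))
    return draws

rng = np.random.default_rng(0)
sample = draw_group_spike_slab([3, 3, 3, 3], pi0=0.5, lam=1.0, sigma2=1.0, rng=rng)
for g, beta_g in enumerate(sample):
    print(g, beta_g)
```

Each drawn group is either identically zero or entirely non-zero, so sparsity operates at the level of the lag polynomial, while the non-zero groups are still shrunk through the mixing variance.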
Our hierarchical models treat the penalty parameters λ as hyper-
parameters, i.e. random variables with gamma prior distributions and gamma
posterior distributions. However, the main drawback of this approach is that
these posterior distributions can be sensitive to the choice of the prior. An
alternative approach resorts to an Empirical Bayes estimation of the hyper-
parameters, i.e. using the data to propose an estimate of λ, which can be
obtained through marginal maximum likelihood. For this purpose, a usual
choice is the Monte Carlo EM algorithm (MCEM), which complements the
Gibbs sampler and provides marginal maximum likelihood estimates of the
hyper-parameters. From a computational point of view, the MCEM algorithm
may be extremely expensive, as each Monte Carlo iteration n requires a fully
converged run of the Gibbs sampler on the posterior distribution of the parameters.
In the present framework, careful attention must be paid to this point, because
the computational burden implied by the Group Lasso increases dramatically

351 | ISI WSC 2019
