Page 296 - Special Topic Session (STS) - Volume 3
STS544 Paolo F. et al.
adopted in the exercise does not allow a thorough discussion. Secondly, these
techniques have already been used in previous econometric and statistical
studies, so a detailed description would be superfluous. Instead, we give the
basic intuition behind the main classes of models used and refer interested
readers to the works in which these models were originally developed, or to
previous applications that adopt them.
Among the most important models in the nowcasting literature is the
dynamic factor model in the form of Stock and Watson (2002). The basic idea is
that a handful of constructed variables, the factors, can summarize the
information contained in a large dataset. Stock and Watson (2002) show that
the factors can be estimated using principal components.² Factor models are
especially important in our application because, in addition to the baseline
specifications that use raw firm-level data and traffic data as predictors, we
estimate specifications that use latent factors (estimated via principal
components) as predictors. The aim is to check whether reducing the noise in
our input data improves the performance of the models.
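To make the principal-components step concrete, the following Python sketch extracts r factors from a standardized T x N panel through a singular value decomposition and then regresses a target on those factors. It is an illustration only, not the code used in our exercise (which relies on R). The function names `pc_factors` and `ols_fit_predict`, the column standardization, and the plain OLS regression on the factors are assumptions made for the example.

```python
import numpy as np

def pc_factors(X, r):
    """Principal-components estimate of r factors from a T x N panel X.

    Each column is standardized first (assumes no constant columns).
    Returns factors F (T x r) and loadings L (N x r).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Thin SVD Z = U S Vt; the rows of Vt are eigenvectors of Z'Z,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    L = Vt[:r].T      # loadings: leading r eigenvectors
    F = Z @ L         # factors: projections of the data on the loadings
    return F, L

def ols_fit_predict(F_train, y_train, F_new):
    """OLS of the target on a constant and the factors; predict at F_new."""
    A = np.column_stack([np.ones(len(F_train)), F_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(F_new)), F_new]) @ beta
```

Because the loadings are the leading right singular vectors, the estimated factors are mutually orthogonal. They are identified only up to sign and rotation, so it is the space they span, not the individual columns, that should be compared with the true factors.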
Another important class of models we use is shrinkage regression, in
particular ridge regression, the Lasso (Tibshirani, 1996) and the elastic net
(Zou and Hastie, 2005). The main intuition behind these models is to penalize
the size of the predictors' coefficients, shrinking them toward zero in order
to reduce the variance of the predictions at the cost of some bias. Hastie,
Tibshirani, and Friedman (2009) provide an in-depth review of these models,
while De Mol, Giannone, and Reichlin (2008) offer an economic forecasting
application of shrinkage regressions, with an interesting comparison with
principal components.
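The following sketch illustrates how the penalty shrinks coefficients. It minimizes the elastic-net objective in the parameterization used by the glmnet package, (1/(2n))||y - Xb||^2 + lam * [alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2], by cyclic coordinate descent with soft-thresholding; alpha = 1 gives the Lasso and alpha = 0 gives ridge regression, whose closed form is included as a check. The function names, the assumption that X and y are centered (no intercept) and the stopping rule are illustrative choices, not the settings used in our exercise.

```python
import numpy as np

def soft_threshold(z, g):
    """S(z, g) = sign(z) * max(|z| - g, 0): the Lasso's shrink-or-zero step."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def elastic_net(X, y, lam, alpha, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for
    (1/(2n))||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    Assumes centered X and y (no intercept) and no all-zero columns.
    alpha=1: Lasso; alpha=0: ridge.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()          # residual y - Xb, with b = 0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            # correlation of x_j with the partial residual (x_j's term added back)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            step = new - b[j]
            if step != 0.0:
                r -= X[:, j] * step     # keep the residual in sync with b
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b

def ridge(X, y, lam):
    """Closed-form minimizer of the same objective with alpha = 0."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)
```

With alpha = 1, any lam above max_j |x_j'y|/n sets every coefficient exactly to zero, which is how the Lasso performs variable selection; ridge, by contrast, shrinks all coefficients toward zero without setting them exactly to zero.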
Our nowcasts are also based on a large number of machine learning
techniques, which are covered extensively in Hastie et al. (2009): boosting,
regression trees and random forests, regression splines, support and relevance
vector machines, neural networks and k-nearest neighbors.
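As an example of the simplest of these learners, the sketch below implements k-nearest-neighbor regression: the prediction for a new observation is the average target of the k training observations closest to it in Euclidean distance. The function name and the default k = 5 are illustrative assumptions; in practice k is a tuning parameter (for instance chosen by cross-validation), and predictors should first be put on comparable scales, since the distance is scale-dependent.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=5):
    """k-nearest-neighbor regression with Euclidean distance.

    Each prediction is the mean target of the k closest training rows.
    """
    # pairwise squared distances, shape (n_new, n_train)
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest rows
    return y_train[nearest].mean(axis=1)
```

For instance, with training inputs 0, 1, ..., 9 and targets equal to twice the input, a query at 4.1 with k = 3 averages the targets of inputs 4, 5 and 3, giving 8.0.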
All the models used in our nowcasting exercise are estimated with the
caret package for R. Counting the specifications with different input
variables (raw data vs. sets of principal components extracted from the data),
we estimate a total of 130 models. As a benchmark model, we use an automated
ARIMA procedure. We also include in our model set an automated ARIMA that
uses principal components as external predictors.
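The benchmark logic can be sketched as follows. This is a deliberately simplified stand-in for an automated ARIMA procedure, not the procedure used in our exercise: it selects only the autoregressive order p by AIC over a common estimation sample and fits by OLS, with no differencing and no moving-average terms. The optional `exog` matrix plays the role of external predictors such as principal components; we assume it enters contemporaneously, as in a nowcast where the current-period predictors are already observed. The function names, `max_p = 4` and the AIC form are assumptions for illustration.

```python
import numpy as np

def fit_arx_aic(y, exog=None, max_p=4):
    """Choose the AR order p in 0..max_p by AIC and fit by OLS.

    Simplified stand-in for an automated ARIMA: no differencing, no MA terms.
    exog (T x k), if given, enters contemporaneously as external predictors.
    Coefficient order in the result: [constant, lag 1..p, exog columns].
    """
    y = np.asarray(y, dtype=float)
    rows = np.arange(max_p, len(y))  # common sample so AICs are comparable
    n = len(rows)
    best = None
    for p in range(max_p + 1):
        cols = [np.ones(n)] + [y[rows - i] for i in range(1, p + 1)]
        if exog is not None:
            cols.append(np.asarray(exog, dtype=float)[rows])
        A = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        rss = float(np.sum((y[rows] - A @ beta) ** 2))
        aic = n * np.log(rss / n) + 2 * A.shape[1]  # Gaussian AIC up to a constant
        if best is None or aic < best["aic"]:
            best = {"p": p, "beta": beta, "aic": aic}
    return best

def forecast_next(y, model, exog_next=None):
    """One-step-ahead forecast (nowcast) from a model returned by fit_arx_aic."""
    p, beta = model["p"], model["beta"]
    x = [1.0] + [y[-i] for i in range(1, p + 1)]
    if exog_next is not None:
        x += list(np.atleast_1d(exog_next))
    return float(np.dot(beta, x))
```

Fitting every candidate order on the same sample (starting at observation `max_p`) keeps the AIC values comparable across orders; a full automated ARIMA would additionally search over differencing and moving-average orders.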
2 An alternative factor estimator can be found in Doz, Giannone, and Reichlin (2011).
285 | ISI WSC 2019