Page 182 - Contributed Paper Session (CPS) - Volume 2
CPS1496 Tim Christopher D.L et al.
the spatial cross-validation schemes, relative to the random cross-validation
scheme, highlights that better spatial coverage of data would improve
predictions more than the improved model we have suggested.
Because typical aggregated incidence datasets have low statistical power,
previous analyses using disaggregation regression have used only a small
number of covariates (Sturrock et al., 2014). However, as models such as
Random Forest and elastic net can robustly handle high-dimensional data,
future work could include many more covariates, potentially increasing
predictive performance.
While the approach presented here is related to stacking, it differs in that
we have neither constrained the regression parameters to be positive nor
included a sum-to-one constraint; that is, the result is not simply a weighted
average of the level-zero model predictions. We did not include these
constraints because the base models and the meta-model are trained on response
data on different scales. However, future work could examine whether imposing
a positivity constraint on the regression parameters improves performance.
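The difference between an unconstrained and a positively constrained meta-level regression can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, a plain linear least-squares fit stands in for the disaggregation regression, and the non-negative fit uses a simple projected gradient descent rather than any method from the paper. The function names are hypothetical.

```python
import numpy as np

def design(Z):
    """Prepend an intercept column to the level-zero predictions Z (n x k)."""
    return np.column_stack([np.ones(Z.shape[0]), Z])

def fit_unconstrained(Z, y):
    """Ordinary least squares: weights may take either sign."""
    coef, *_ = np.linalg.lstsq(design(Z), y, rcond=None)
    return coef  # [intercept, w_1, ..., w_k]

def fit_nonnegative(Z, y, n_iter=20000):
    """Least squares with w_j >= 0 (intercept left free), by projected gradient descent."""
    A = design(Z)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / largest squared singular value
    coef = np.zeros(A.shape[1])
    for _ in range(n_iter):
        coef -= step * (A.T @ (A @ coef - y))  # gradient step on 0.5 * ||A c - y||^2
        coef[1:] = np.maximum(coef[1:], 0.0)   # project the weights onto w >= 0
    return coef

def sse(Z, y, coef):
    """Sum of squared errors of the meta-level fit."""
    return float(np.sum((design(Z) @ coef - y) ** 2))

rng = np.random.default_rng(0)
signal = rng.normal(size=200)
# Three synthetic level-zero predictions of the same latent signal.
Z = np.column_stack([signal + rng.normal(scale=s, size=200) for s in (0.3, 0.5, 1.0)])
# Meta-level response on a different scale from the base predictions.
y = 3.0 * signal + 0.5 + rng.normal(scale=0.2, size=200)

coef_free = fit_unconstrained(Z, y)
coef_pos = fit_nonnegative(Z, y)
print("unconstrained:", np.round(coef_free, 3), "SSE", round(sse(Z, y, coef_free), 2))
print("non-negative: ", np.round(coef_pos, 3), "SSE", round(sse(Z, y, coef_pos), 2))
```

Because the synthetic response is on a different scale from the base predictions, a sum-to-one constraint would be inappropriate in this setting, whereas a positivity constraint remains meaningful. In-sample, the constrained fit can never have a lower squared error than the unconstrained one; any benefit would have to appear in out-of-sample performance.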
Another area of potential improvement is varying the data used to train
the base-level learners. Here we only used data from the region of interest.
However, the global dataset is much larger than these subsets. Training some
base-level models on local data and some on the global dataset, and then
combining predictions from all these models, has the potential to further
improve model performance.
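A minimal sketch of this idea follows, assuming synthetic data and using ordinary least squares as a stand-in for both the base-level learners and the meta-level regression. The helper names (`fit_linear`, `simulate`) and the simulated regional slopes are hypothetical and do not come from the analysis described here.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def simulate(n, slope):
    """Hypothetical data: two covariates, region-specific slope on the first."""
    X = rng.normal(size=(n, 2))
    y = slope * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)
    return X, y

X_other, y_other = simulate(2000, slope=1.0)  # data from outside the region
X_local, y_local = simulate(60, slope=1.6)    # small dataset from the region of interest
X_global = np.vstack([X_other, X_local])      # global dataset includes the local data
y_global = np.concatenate([y_other, y_local])

# One base-level learner per training set.
base_models = {
    "local": fit_linear(X_local, y_local),
    "global": fit_linear(X_global, y_global),
}

# Separate meta-level data from the region of interest.
X_meta, y_meta = simulate(100, slope=1.6)
# Level-zero predictions become the covariates of the meta-level regression.
Z_meta = np.column_stack([model(X_meta) for model in base_models.values()])
combine = fit_linear(Z_meta, y_meta)
stacked = combine(Z_meta)

for name, column in zip(base_models, Z_meta.T):
    print(name, "SSE:", round(float(np.sum((column - y_meta) ** 2)), 2))
print("stacked SSE:", round(float(np.sum((stacked - y_meta) ** 2)), 2))
```

The errors printed here are in-sample at the meta level, so the stacked fit cannot do worse than either base model by construction. Whether combining local and global learners helps in practice would need to be assessed on held-out data.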
Figure 2: Left: Observed data for Colombia (grey for zero incidence). Right:
Out-of-sample predictions for the random cross-validation, machine learning
only model. For each cross-validation fold, predictions are made for the
held-out data, which are then combined to make a single surface.
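The way held-out predictions from each fold are assembled into one set of out-of-sample predictions can be sketched as follows. This is an illustrative assumption, not the code used to produce the figure: the data are synthetic, the model is a least-squares stand-in, and `out_of_sample_predictions` is a hypothetical helper.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Stand-in model: least squares with intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def out_of_sample_predictions(X, y, n_folds=5, seed=0):
    """Predict every observation from a model that never saw it."""
    rng = np.random.default_rng(seed)
    fold_id = rng.permutation(len(y)) % n_folds  # random, balanced fold labels
    preds = np.full(len(y), np.nan)
    for k in range(n_folds):
        held_out = fold_id == k
        # Train on the other folds, predict the held-out fold, and store the
        # predictions in the positions of the held-out observations.
        preds[held_out] = fit_predict(X[~held_out], y[~held_out], X[held_out])
    return preds, fold_id

rng = np.random.default_rng(42)
X = rng.normal(size=(103, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=103)

preds, fold_id = out_of_sample_predictions(X, y)
rmse = float(np.sqrt(np.mean((preds - y) ** 2)))
print("out-of-sample RMSE:", round(rmse, 3))
```

A spatial cross-validation scheme could reuse the same loop by replacing the random `fold_id` with labels derived from geographic groupings, so that whole regions are held out together.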
171 | ISI WSC 2019