Page 182 - Contributed Paper Session (CPS) - Volume 2

CPS1496 Tim Christopher D.L et al.
                  the spatial cross-validation schemes, relative to the random cross-validation
                  scheme, highlights that better spatial coverage of data would improve
                  predictions more than the improved model we have suggested.
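The difference between the two kinds of scheme can be sketched as follows. This is a minimal illustration, not the fold construction used in this paper: the longitudes are simulated, and the use of contiguous longitude bands as spatial folds is an assumption made only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 5

# Hypothetical centroid longitudes for n polygons (simulated, not real data).
lon = rng.uniform(-79.0, -67.0, size=n)

# Random scheme: every fold is scattered over the whole study area.
random_fold = rng.permutation(n) % k

# Spatial scheme (assumed here): k contiguous longitude bands, so each
# held-out fold is a region with no training data nearby.
edges = np.quantile(lon, np.linspace(0.0, 1.0, k + 1))
spatial_fold = np.clip(np.searchsorted(edges, lon, side="right") - 1, 0, k - 1)

for f in range(k):
    rand_width = np.ptp(lon[random_fold == f])
    spat_width = np.ptp(lon[spatial_fold == f])
    print(f"fold {f}: random width {rand_width:.1f}, spatial width {spat_width:.1f}")
```

Under the spatial scheme each held-out fold is far from most training data, so the comparison with the random scheme indicates how much performance depends on nearby observations.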
                      Due to the low power of typical aggregated incidence datasets, previous
                  analyses using disaggregation regression used only a small number of covariates
                  (Sturrock et al., 2014). However, because models such as random forest and
                  elastic net can robustly handle high-dimensional data, future work could include
                  many more covariates, potentially increasing predictive performance.
                      While the approach presented here is related to stacking, it differs in that
                  we have neither constrained the regression parameters to be positive nor included
                  a sum-to-one constraint; that is, the result is not simply a weighted average of
                  the level-zero model predictions. We did not include these constraints because the
                  base models and the meta-model are trained on response data on different
                  scales. However, future work could examine whether a positivity
                  constraint on the regression parameters improves performance.
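The effect of these constraints can be illustrated with a small sketch. It assumes simulated level-zero predictions and an ordinary linear meta-regression on a common scale, which is a simplification: the meta-model in this paper is a disaggregation regression, not the least-squares fit below. The rescaling to sum to one is a simple normalization rather than a jointly constrained fit.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n = 200

# Simulated response and three hypothetical level-zero model predictions.
truth = rng.normal(size=n)
base_preds = np.column_stack([
    truth + rng.normal(scale=0.3, size=n),
    truth + rng.normal(scale=0.5, size=n),
    -0.5 * truth + rng.normal(scale=0.5, size=n),  # negatively related model
])

# Unconstrained meta-regression: weights may be negative and need not
# sum to one, so the result is not a weighted average.
w_free, *_ = np.linalg.lstsq(base_preds, truth, rcond=None)

# Positivity-constrained alternative via non-negative least squares.
w_pos, _ = nnls(base_preds, truth)

# Rescaling the non-negative weights gives a weighted average of the
# level-zero predictions (positive and sum-to-one).
w_avg = w_pos / w_pos.sum()

print("unconstrained:", np.round(w_free, 3))
print("non-negative: ", np.round(w_pos, 3))
print("weighted avg.:", np.round(w_avg, 3))
```

The unconstrained fit can exploit a negatively related base model, whereas the constrained fits can at best ignore it; which behaviour predicts better out of sample is the open question raised above.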
                      Another area of potential improvement is varying the data used to train
                  the base-level learners. Here we used only data from the region of interest.
                  However, the global dataset is much larger than these subsets. Training some
                  base-level models on local data and some on the global dataset, and then
                  combining predictions from all these models, has the potential to further
                  improve model performance.
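A minimal sketch of this idea follows. Everything in it is an assumption made for illustration: the data are simulated, the helper names `fit_linear` and `simulate` are hypothetical, and linear least-squares fits stand in for both the machine learning base learners and the meta-model.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def simulate(n, beta):
    """Simulated covariates and response (stand-in for real survey data)."""
    X = rng.normal(size=(n, 4))
    y = X @ beta + rng.normal(scale=0.5, size=n)
    return X, y

# A large global dataset and a small regional one with a similar,
# but not identical, covariate-response relationship.
X_global, y_global = simulate(2000, np.array([1.0, 0.5, 0.0, -0.5]))
X_local, y_local = simulate(300, np.array([1.2, 0.2, 0.3, -0.5]))

# Hold back half of the local data so that the meta-model is trained on
# predictions the local base learner has not seen.
X_base, y_base = X_local[:150], y_local[:150]
X_meta, y_meta = X_local[150:], y_local[150:]

predict_local = fit_linear(X_base, y_base)        # base learner, local data
predict_global = fit_linear(X_global, y_global)   # base learner, global data

# Level-zero predictions from both learners become meta-model covariates.
Z = np.column_stack([predict_local(X_meta), predict_global(X_meta)])
predict_stack = fit_linear(Z, y_meta)
stacked = predict_stack(Z)

def mse(pred):
    return float(np.mean((pred - y_meta) ** 2))

print("local only :", round(mse(Z[:, 0]), 4))
print("global only:", round(mse(Z[:, 1]), 4))
print("stacked    :", round(mse(stacked), 4))
```

The errors printed here are in-sample for the meta-model, so they can only favour the stacked fit; an honest comparison would need a further held-out set or cross-validation.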





















                  Figure 2: Left: Observed data for Colombia (grey for zero incidence). Right:
                  Out-of-sample predictions for the random cross-validation, machine learning
                  only model. For each cross-validation fold, predictions are made for the
                  held-out data; these are then combined to make a single surface.
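The way held-out predictions from each fold are combined into a single set of out-of-sample predictions can be sketched as below. This is an assumed, simplified setup: the data are simulated and a linear least-squares fit stands in for the model used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 120, 5

# Simulated covariates and response for n units.
X = rng.normal(size=(n, 3))
y = X @ np.array([0.8, -0.4, 0.2]) + rng.normal(scale=0.3, size=n)

# Random fold assignment with equal fold sizes.
fold = rng.permutation(n) % k

# One slot per unit; each slot is filled exactly once, by the model that
# did not see that unit during training.
oos_pred = np.full(n, np.nan)
for f in range(k):
    held_out = fold == f
    A_train = np.column_stack([np.ones((~held_out).sum()), X[~held_out]])
    coef, *_ = np.linalg.lstsq(A_train, y[~held_out], rcond=None)
    A_test = np.column_stack([np.ones(held_out.sum()), X[held_out]])
    oos_pred[held_out] = A_test @ coef

print("out-of-sample correlation:", round(float(np.corrcoef(oos_pred, y)[0, 1]), 3))
```

Because every unit is held out exactly once, the combined vector covers the whole study area while containing only out-of-sample predictions.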







                                                                     171 | ISI WSC 2019