Page 180 - Contributed Paper Session (CPS) - Volume 2

CPS1496 Tim Christopher D.L et al.
3. Results
Figure 1 shows model performance under random and spatial cross-validation for both Madagascar and Colombia. The poor performance in Colombia under spatial cross-validation indicates that the covariates alone cannot explain malaria incidence in this area. Excluding the spatial cross-validation results for Colombia, the models that use the machine learning predictions as covariates achieved correlations between observed and predicted values of 0.54 to 0.76 (Table 1). Input data and mapped out-of-sample predictions of the best performing model for Colombia are shown in Figure 2.
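To make the evaluation concrete, the sketch below contrasts random and spatial cross-validation and computes the out-of-sample Pearson correlation reported in Table 1. It is a minimal illustration under stated assumptions, not the authors' implementation. Several choices are assumptions rather than details this section describes: the longitude-strip spatial folds, the one-covariate least-squares stand-in model (`fit_line`), and pooling held-out predictions across folds before correlating. All function names are hypothetical.

```python
import math
import random


def pearson(obs, pred):
    """Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sop = sum((a - mo) * (b - mp) for a, b in zip(obs, pred))
    soo = sum((a - mo) ** 2 for a in obs)
    spp = sum((b - mp) ** 2 for b in pred)
    return sop / math.sqrt(soo * spp)


def random_folds(n, k, seed=0):
    """Random CV: shuffle polygon indices, then deal them into k folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for pos, i in enumerate(order):
        folds[i] = pos % k
    return folds


def spatial_folds(coords, k):
    """Hypothetical spatial CV: sort polygon centroids by longitude and cut
    them into k contiguous strips, so neighbours share a held-out fold."""
    order = sorted(range(len(coords)), key=lambda i: coords[i][0])
    folds = [0] * len(coords)
    for pos, i in enumerate(order):
        folds[i] = pos * k // len(coords)
    return folds


def fit_line(xs, ys):
    """Stand-in model (assumption): one-covariate least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (a - mx) ** 2 for a in xs
    )
    return slope, my - slope * mx


def out_of_sample_correlation(x, y, folds):
    """Fit on all folds but one, predict the held-out fold, then correlate
    the pooled held-out predictions with the observations (assumption)."""
    preds = [0.0] * len(y)
    for f in sorted(set(folds)):
        train = [i for i in range(len(y)) if folds[i] != f]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in range(len(y)):
            if folds[i] == f:
                preds[i] = intercept + slope * x[i]
    return pearson(y, preds)


if __name__ == "__main__":
    # Synthetic, illustrative data only (not from the paper).
    rng = random.Random(1)
    coords = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
    x = [lon + rng.gauss(0, 1) for lon, _ in coords]
    y = [2 * xi + rng.gauss(0, 3) for xi in x]
    print("random :", round(out_of_sample_correlation(x, y, random_folds(len(y), 5)), 2))
    print("spatial:", round(out_of_sample_correlation(x, y, spatial_folds(coords, 5)), 2))
```

Under spatial cross-validation each held-out fold is a contiguous region. The model must therefore predict areas with no nearby training data, which is why spatial scores are generally lower than random ones.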

Table 1: Pearson correlations between observed and predicted values (ML: machine learning predictions)

Cross-validation scheme   Country      Covariates only   ML only   Covariates + ML
Random                    Colombia     0.45              0.55      0.54
Random                    Madagascar   0.70              0.76      0.75
Spatial                   Colombia     0.05              0.18      0.10
Spatial                   Madagascar   0.22              0.63      0.61
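The comparisons drawn from Table 1 in the paragraph that follows can be checked mechanically. The sketch below transcribes the table's correlations into a dictionary and recomputes four things: the best model per row, the random versus spatial ordering, the gain of the ML-only model over the covariate-only model, and the 0.54 to 0.76 range quoted for models using machine learning predictions. The names `TABLE_1`, `best_model`, `random_beats_spatial`, `ml_gain` and `ml_range` are hypothetical.

```python
# Pearson correlations transcribed from Table 1: (scheme, country) -> {model: r}
TABLE_1 = {
    ("random", "Colombia"): {"covariates": 0.45, "ml": 0.55, "covs_ml": 0.54},
    ("random", "Madagascar"): {"covariates": 0.70, "ml": 0.76, "covs_ml": 0.75},
    ("spatial", "Colombia"): {"covariates": 0.05, "ml": 0.18, "covs_ml": 0.10},
    ("spatial", "Madagascar"): {"covariates": 0.22, "ml": 0.63, "covs_ml": 0.61},
}
COUNTRIES = ("Colombia", "Madagascar")


def best_model(scheme, country):
    """Name of the model with the highest correlation in one table row."""
    row = TABLE_1[(scheme, country)]
    return max(row, key=row.get)


def random_beats_spatial(country):
    """True if every model scores higher under random than spatial CV."""
    r, s = TABLE_1[("random", country)], TABLE_1[("spatial", country)]
    return all(r[m] > s[m] for m in r)


def ml_gain(scheme, country):
    """Improvement of the ML-only model over the covariate-only model."""
    row = TABLE_1[(scheme, country)]
    return round(row["ml"] - row["covariates"], 2)


def ml_range(exclude=("spatial", "Colombia")):
    """Min and max correlation of models using ML predictions, skipping one row."""
    vals = [
        row[m]
        for key, row in TABLE_1.items()
        if key != exclude
        for m in ("ml", "covs_ml")
    ]
    return min(vals), max(vals)


if __name__ == "__main__":
    for scheme, country in TABLE_1:
        print(scheme, country, best_model(scheme, country), ml_gain(scheme, country))
    print("range excluding spatial Colombia:", ml_range())
```

The gains it reports are 0.10 and 0.06 under random cross-validation and 0.13 and 0.41 under spatial cross-validation (Colombia and Madagascar respectively). The advantage of the machine learning predictions is therefore larger under spatial cross-validation in both countries.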


The model using only machine learning predictions as covariates was the best performing model in both countries and under both cross-validation schemes (Table 1). As expected, all models performed better under random cross-validation than under spatial cross-validation. The difference between the covariate-only model and the machine-learning-predictions-only model was greater under spatial cross-validation than under random cross-validation. The improvement in performance between the worst and best models was always smaller than the difference between the random and spatial cross-validation schemes. Predictive performance of the machine learning models was similar: random forest performed best in Madagascar, while neural networks, random forest and elastic net performed equally well in Colombia (Table 2). Table 2 also gives the means (across folds) of the regression coefficients (i.e. the weights of the machine learning models in the level zero model) from the polygon-level models that used only predictions from machine learning models as covariates. The estimated regression parameters are similar between the random and spatial cross-validation schemes. However, the best performing machine learning models do not have the largest estimated regression coefficients, as would be expected if prevalence and incidence were perfectly correlated. Also of note is that some models were estimated to have a negative relationship with incidence



169 | ISI WSC 2019