Page 176 - Contributed Paper Session (CPS) - Volume 2

CPS1496 Tim Christopher D.L et al.
                  Model ensembles with different response variables for base and meta models:
                  Malaria disaggregation regression combining prevalence and incidence data
                   Tim Christopher David Lucas, Chantal Hendriks, Andre Python, Anita Nandi,
                   Penelope Hancock, Michele Nguyen, Peter Gething, Susan Rumisha, Daniel
                            Weiss, Katherine Battle, Ewan Cameron, Rosalind Howes

                           Malaria Atlas Project, Big Data Institute, University of Oxford, Oxford, UK

                  Abstract
                  Maps of infection risk are a vital tool for the elimination of malaria. Routine
                  surveillance data of malaria case counts, often aggregated over administrative
                  regions, are becoming more widely available and can measure low malaria risk
                  better than prevalence surveys. However, aggregation of case counts over
                  large, heterogeneous areas means that these data are often underpowered for
                  learning relationships between the environment and malaria risk. A model that
                  combines point surveys and aggregated surveillance data could have the
                  benefits of both, but it must account for the fact that the two data types are
                  measured in different malariometric units. Here, we train multiple machine
                  learning models on point surveys and then combine their predictions using a
                  geostatistical disaggregation model fitted to routine surveillance data. In
                  tests using data from Colombia and Madagascar, we find that using a
                  disaggregation regression model to combine predictions from machine learning
                  models trained on point surveys improves model accuracy relative to using the
                  environmental covariates directly.

                  Keywords
                  Spatial statistics; Ensemble; Stacking; Epidemiology

                  1.  Introduction
                      High-resolution maps of malaria risk are vital for elimination, but mapping
                  malaria in low burden countries presents new challenges. Traditional mapping
                  of prevalence from cluster-level surveys (Gething et al., 2011; Bhatt et al.,
                  2017; Gething et al., 2012; Bhatt et al., 2015) is often not effective, firstly
                  because so few individuals are infected that most surveys will detect zero
                  cases, and secondly because nationally representative prevalence surveys are
                  lacking in low burden countries (Sturrock et al., 2016, 2014). Routine
                  surveillance data of malaria case counts, often aggregated over administrative
                  regions defined by geographic polygons, are becoming more reliable and more
                  widely available (Sturrock et al., 2016), and recent work has focussed on
                  methods for estimating high-resolution malaria risk from these data
                  (Sturrock et al., 2014; Wilson and Wakefield, 2017; Law et al., 2018; Taylor

                                                                     165 | ISI WSC 2019