Page 176 - Contributed Paper Session (CPS) - Volume 2
CPS1496 Tim Christopher D.L et al.
Model ensembles with different response
variables for base and meta models: Malaria
disaggregation regression combining prevalence
and incidence data
Tim Christopher David Lucas, Chantal Hendriks, Andre Python, Anita Nandi,
Penelope Hancock, Michele Nguyen, Peter Gething, Susan Rumisha, Daniel
Weiss, Katherine Battle, Ewan Cameron, Rosalind Howes
Malaria Atlas Project, Big Data Institute, University of Oxford, Oxford, UK
Abstract
Maps of infection risk are a vital tool for the elimination of malaria. Routine
surveillance data of malaria case counts, often aggregated over administrative
regions, is becoming more widely available and can better measure low
malaria risk than prevalence surveys. However, aggregation of case counts
over large, heterogeneous areas means that these data are often
underpowered for learning relationships between the environment and
malaria risk. A model that combines point surveys and aggregated surveillance
data could have the benefits of both, but it must account for the fact that the
two data types are measured in different malariometric units (prevalence and
incidence). Here, we train
multiple machine learning models on point surveys and then combine the
predictions from these with a geostatistical disaggregation model that uses
routine surveillance data. We find that, in tests using data from Colombia and
Madagascar, using a disaggregation regression model to combine predictions
from machine learning models trained on point surveys improves model
accuracy relative to using the environmental covariates directly.
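The aggregation step at the heart of a disaggregation regression can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a log-linear meta model, a plain Poisson likelihood, and no spatial random field, and the names `polygon_expected_cases`, `poisson_log_likelihood`, `base_preds`, `weights`, and `polygon_id` are hypothetical. The idea shown is that predictions from base models, made at pixel level, are combined into a pixel-level incidence rate, multiplied by population, and summed within each administrative polygon so they can be compared with the observed aggregated case counts.

```python
import numpy as np

def polygon_expected_cases(base_preds, weights, intercept,
                           population, polygon_id, n_polygons):
    """Aggregate pixel-level incidence to polygon-level expected cases.

    base_preds : (n_pixels, n_models) base-model predictions (hypothetical input)
    weights    : (n_models,) meta-model coefficients (assumed log-linear form)
    population : (n_pixels,) population per pixel
    polygon_id : (n_pixels,) integer index of the polygon containing each pixel
    """
    # Assumed log-linear meta model: incidence rate per person in each pixel
    log_rate = intercept + base_preds @ weights
    pixel_cases = population * np.exp(log_rate)
    # Sum the pixel-level expected cases within each polygon
    return np.bincount(polygon_id, weights=pixel_cases, minlength=n_polygons)

def poisson_log_likelihood(observed, expected):
    # Poisson log-likelihood up to an additive constant
    # (the log-factorial term does not depend on the parameters)
    return float(np.sum(observed * np.log(expected) - expected))

# Toy example: four pixels in two polygons, one base model with zero weight,
# so every pixel has the same incidence rate of 0.01 cases per person.
base_preds = np.array([[0.1], [0.2], [0.3], [0.4]])
population = np.array([100.0, 200.0, 300.0, 400.0])
polygon_id = np.array([0, 0, 1, 1])
expected = polygon_expected_cases(base_preds, np.array([0.0]), np.log(0.01),
                                  population, polygon_id, n_polygons=2)
print(expected)  # [3. 7.]
```

In a fitted model the intercept and weights would be estimated by maximising this likelihood (or its Bayesian analogue) over all polygons. The sketch only evaluates the likelihood for given parameter values.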
Keywords
Spatial statistics; Ensemble; Stacking; Epidemiology
1. Introduction
High-resolution maps of malaria risk are vital for elimination, but mapping
malaria in low burden countries presents new challenges. Traditional
mapping of prevalence from cluster-level surveys (Gething et al., 2011; Bhatt
et al., 2017; Gething et al., 2012; Bhatt et al., 2015) is often not effective,
firstly because so few individuals are infected that most surveys will detect
zero cases, and secondly because nationally representative prevalence
surveys are lacking in low burden countries (Sturrock et al., 2016, 2014).
Routine surveillance data of malaria case counts, often aggregated over
administrative regions defined by geographic polygons, is becoming more
reliable and more widely available (Sturrock et al., 2016) and recent work has
focussed on methods for estimating high-resolution malaria risk from these
data (Sturrock et al., 2014; Wilson and Wakefield, 2017; Law et al., 2018; Taylor
165 | ISI WSC 2019