Page 177 - Contributed Paper Session (CPS) - Volume 2
CPS1496 Tim Christopher D.L et al.
et al., 2017; Li et al., 2012). However, the aggregation of cases over space
means that the data may be relatively uninformative, especially if the case
counts are aggregated over large or heterogeneous areas, because it is unclear
where within the polygon, and in which environments, the cases occurred. These
data are therefore often under-powered for fitting the flexible, non-linear models
required for accurate malaria maps (Bhatt et al., 2017, 2015). A model that
combines point surveys and aggregated surveillance data, and therefore
leverages the strength of both, has great potential. One approach for
combining these data is to use prevalence point-surveys to train a suite of
machine learning models, and then use predictions from these models as
covariates in a model trained on polygon-level incidence data. This process of
stacking models has proven effective in many realms; however, typical stacking
uses a single dataset on a consistent scale (Sill et al., 2009; Bhatt et al., 2017).
Here we propose training the level-zero machine learning models on
point-level, binomial prevalence data and stacking these models with a
polygon-level, Poisson incidence model.
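The structure of the proposed stack can be illustrated with a minimal sketch. It assumes that the level-zero models have already been fitted to the prevalence points and that their predictions are available for every pixel inside each polygon. The log link, the population-weighted sum over pixels, and all names and numbers below are illustrative assumptions, not details taken from the text.

```python
import math


def polygon_expected_cases(pixel_predictions, pixel_population, intercept, weights):
    """Expected case count for one polygon (assumed log-linear form).

    pixel_predictions: one list per pixel, holding the level-zero model
        predictions at that pixel (used here as covariates).
    pixel_population: population of each pixel.
    """
    expected = 0.0
    for preds, pop in zip(pixel_predictions, pixel_population):
        # Assumed log link: pixel incidence rate = exp(intercept + sum_k w_k * z_k)
        log_rate = intercept + sum(w * p for w, p in zip(weights, preds))
        # Pixel-level expected cases are summed up to the polygon
        expected += pop * math.exp(log_rate)
    return expected


def poisson_log_likelihood(observed_cases, expected_cases):
    """Poisson log-likelihood of observed polygon counts given expected counts."""
    return sum(
        y * math.log(mu) - mu - math.lgamma(y + 1)
        for y, mu in zip(observed_cases, expected_cases)
    )


# Hypothetical toy data: two polygons, two pixels each,
# two level-zero predictions per pixel.
polygons = [
    {"preds": [[0.10, 0.12], [0.30, 0.28]], "pop": [500, 1500], "cases": 40},
    {"preds": [[0.05, 0.04], [0.08, 0.07]], "pop": [2000, 1000], "cases": 12},
]

# Hypothetical parameter values; in practice these would be estimated
# by maximising the likelihood (or by Bayesian inference).
intercept, weights = -4.0, [1.5, 1.5]

expected = [
    polygon_expected_cases(p["preds"], p["pop"], intercept, weights)
    for p in polygons
]
log_lik = poisson_log_likelihood([p["cases"] for p in polygons], expected)
```

The point of the sketch is that the two data types enter at different levels: the point-level prevalence data inform only the level-zero predictions, while the polygon-level counts inform the intercept and weights through the Poisson likelihood.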
2. Methodology
We used two data sources that reflect Plasmodium falciparum malaria
transmission: point-prevalence surveys and polygon-level, aggregated
incidence data. We selected Colombia and Madagascar as case examples as
they both have fairly complete, publicly available surveillance data at a finer
geographical resolution than admin 1. The prevalence survey data were
extracted from the Malaria Atlas Project prevalence survey database using only
data from 1990 onwards (Bhatt et al., 2015; Guerra et al., 2007). For Colombia
we used all points from South America (n = 522) while for Madagascar we
used only Malagasy data (n = 1505). We chose these geographic regions
based on a trade-off between wanting a large sample size and wanting data
from geographically similar areas. The prevalence points were then
standardised to an age range of 2–10 years using the model from Smith et al. (2007).
The polygon incidence data were collected from government reports and
standardised using methods defined in Cibulskis et al. (2011). This
standardisation step accounts for missed cases due to lack of treatment
seeking, missing case reports, and cases that sought medical attention outside
the public health systems (Battle et al., 2016). For reports where cases were
not reported at the species level, national estimates of the ratio between
P. falciparum and P. vivax cases were used to calculate
P. falciparum-only cases. To minimise temporal effects, we selected, for each
country, one year of surveillance data. We used annual surveillance data from
2015 for Colombia (952 municipalities) and data from 2013 for Madagascar
(110 districts) as these years had the most data in each case.
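As a worked illustration of the species adjustment described above, the following sketch splits an undifferentiated case count using a national P. falciparum to P. vivax ratio. It assumes that the ratio is expressed as P. falciparum cases per P. vivax case and that these are the only two species present. The function name and the numbers are hypothetical.

```python
def falciparum_only_cases(total_cases, pf_to_pv_ratio):
    """Estimate P. falciparum cases from a count not reported by species.

    pf_to_pv_ratio: assumed national ratio of P. falciparum to P. vivax
        cases (e.g. 3.0 means three P. falciparum cases per P. vivax case).
    """
    if total_cases < 0 or pf_to_pv_ratio < 0:
        raise ValueError("total_cases and pf_to_pv_ratio must be non-negative")
    # A ratio r of Pf:Pv corresponds to a Pf fraction of r / (1 + r)
    pf_fraction = pf_to_pv_ratio / (1.0 + pf_to_pv_ratio)
    return total_cases * pf_fraction


# Hypothetical report: 1000 cases, national Pf:Pv ratio of 3:1
pf_cases = falciparum_only_cases(1000, 3.0)  # 750.0
```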
166 | I S I W S C 2 0 1 9