Page 178 - Contributed Paper Session (CPS) - Volume 2
CPS1496 Tim Christopher D.L et al.
We considered an initial suite of environmental and anthropological
covariates at a resolution of approximately 5 × 5 kilometres. These included the
annual mean and log standard deviation of land surface temperature,
enhanced vegetation index, malaria parasite temperature suitability index,
elevation, tasseled cap brightness, tasseled cap wetness, log accessibility to
cities, log night lights and proportion of urban land cover (Weiss et al., 2015).
Tasseled cap brightness and urban land cover were subsequently removed as
they were highly correlated with other variables. The covariates were
standardised to have a mean of zero and a standard deviation of one. These
covariates were used for both the machine learning models and the polygon-level
models. Raster surfaces of population for the years 2005, 2010 and 2015
were created using data from WorldPop (Tatem, 2017) and from GPWv4
(NASA, 2018) where WorldPop did not have values. Population rasters for the
remaining years were created by linear interpolation.
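
The two preprocessing steps above, standardising covariates and filling in population for the years between the known surfaces, can be sketched in a few lines. This is only a minimal illustration using the Python standard library, not the code used in the study. The function names, the flat lists standing in for rasters, the pixel values and the choice of the population form of the standard deviation are all assumptions made for the example.

```python
from statistics import fmean, pstdev


def standardise(values):
    """Rescale one covariate's pixel values to mean 0 and standard deviation 1."""
    mean = fmean(values)
    # Assumption: population SD; the text does not say which form was used.
    sd = pstdev(values, mu=mean)
    return [(v - mean) / sd for v in values]


def interpolate_population(known, year):
    """Linearly interpolate per-pixel population for `year`.

    `known` maps a year to a list of pixel values; all lists have equal length.
    """
    if year in known:
        return list(known[year])
    earlier = [y for y in known if y < year]
    later = [y for y in known if y > year]
    if not earlier or not later:
        # Assumption: no extrapolation beyond the known surfaces.
        raise ValueError("year lies outside the range of known rasters")
    lo, hi = max(earlier), min(later)
    w = (year - lo) / (hi - lo)  # weight given to the later raster
    return [(1 - w) * a + w * b for a, b in zip(known[lo], known[hi])]


# Made-up pixel values standing in for the 2005, 2010 and 2015 surfaces.
pop = {
    2005: [100.0, 200.0, 0.0, 50.0],
    2010: [150.0, 200.0, 10.0, 40.0],
    2015: [250.0, 220.0, 20.0, 40.0],
}
pop_2007 = interpolate_population(pop, 2007)

# Made-up temperature values for four pixels.
temperature_z = standardise([21.0, 24.0, 27.0, 30.0])
```

Each interpolated pixel is a weighted average of the two nearest known years, so a year two-fifths of the way from 2005 to 2010 takes 60% of its value from the 2005 surface and 40% from the 2010 surface.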
For each country we fitted five models via caret (Kuhn et al., 2017): elastic
net (Zou and Hastie, 2012), Random Forest (Wright and Ziegler, 2015),
projection pursuit regression (Friedman and Stuetzle, 1981), neural networks
(Venables and Ripley, 2002) and boosted regression trees (Ridgeway et al.,
2017). Our response variable was prevalence and we weighted the data by
sample size (i.e. the number of people tested for malaria in each survey). For
each model we ran five-fold cross-validation to select hyperparameters, using
random search for Random Forest and boosted regression trees and grid
search for the other models. Predictions from these models were then made
across Colombia and Madagascar respectively. These predictions were finally
logit transformed so that they are on the linear predictor scale of the
top-level model. The top-level model was a disaggregation regression model
(Sturrock et al., 2014; Wilson and Wakefield, 2017; Law et al., 2018; Taylor et
al., 2017; Li et al., 2012). This model is defined by a likelihood at the level of the
polygon, with covariates and a spatial random field at the pixel level. Values at
the polygon level are given the subscript $a$, while pixel-level values are indexed
with $b$. The polygon case count data are given a Poisson likelihood
$y_a \sim \mathrm{Pois}(i_a \, \mathrm{pop}_a)$
where $i_a$ is the estimated polygon incidence rate and $\mathrm{pop}_a$ is the observed
polygon population-at-risk. This polygon-level likelihood is linked to the pixel
level prevalence
$i_a = \frac{\sum (i_b \, \mathrm{pop}_b)}{\sum \mathrm{pop}_b}$
$i_b = \mathrm{p2i}(p_b)$
where p2i is from a model that was published previously (Cameron et al., 2015)
which defines a function
167 | I S I W S C 2 0 1 9