Page 178 - Contributed Paper Session (CPS) - Volume 2

CPS1496 Tim Christopher D.L et al.
                      We considered an initial suite of environmental and anthropological
                  covariates, at a resolution of approximately 5 × 5 kilometres, that included the
                  annual  mean  and  log  standard  deviation  of  land  surface  temperature,
                  enhanced  vegetation  index,  malaria  parasite  temperature  suitability  index,
                  elevation, tasseled cap brightness, tasseled cap wetness, log accessibility to
                  cities, log night lights and proportion of urban land cover (Weiss et al., 2015).
                  Tasseled cap brightness and urban land cover were subsequently removed as
                  they  were  highly  correlated  with  other  variables.  The  covariates  were
                  standardised to have a mean of zero and a standard deviation of one. These
                  covariates were used for both the machine learning models and the
                  polygon-level models. Raster surfaces of population for the years 2005, 2010
                  and 2015 were created using data from WorldPop (Tatem, 2017) and, where
                  WorldPop did not have values, from GPWv4 (NASA, 2018). Population rasters
                  for the remaining years were created by linear interpolation.
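
The two preprocessing steps described above (standardising covariates and linearly interpolating population between the years with data) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the flat-list representation of a raster, and the use of the population standard deviation are all assumptions, and the sketch only interpolates between known years because the text does not say how years outside 2005 to 2015 were handled.

```python
from statistics import fmean, pstdev


def standardise(values):
    """Rescale a covariate to mean zero and standard deviation one."""
    mean = fmean(values)
    # Assumption: population SD; the text does not say which estimator was used.
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardise a constant covariate")
    return [(v - mean) / sd for v in values]


def interpolate_population(known, year):
    """Linearly interpolate per-pixel population for `year`.

    `known` maps a year to a list of pixel populations (a flattened raster);
    all lists are assumed to share the same pixel ordering.
    """
    if year in known:
        return list(known[year])
    years = sorted(known)
    if not years[0] < year < years[-1]:
        raise ValueError("year lies outside the range of known rasters")
    lower = max(y for y in years if y < year)
    upper = min(y for y in years if y > year)
    weight = (year - lower) / (upper - lower)
    return [lo + weight * (hi - lo)
            for lo, hi in zip(known[lower], known[upper])]


# Hypothetical two-pixel rasters for the three years with data.
known = {2005: [100.0, 0.0], 2010: [200.0, 50.0], 2015: [200.0, 150.0]}
population_2007 = interpolate_population(known, 2007)
```

Each pixel is interpolated independently, so a pixel whose value came from GPWv4 in one year and WorldPop in another is treated the same way as any other.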
                      For each country we fitted five models via caret (Kuhn et al., 2017): elastic
                  net  (Zou  and  Hastie,  2012),  Random  Forest  (Wright  and  Ziegler,  2015),
                  projection pursuit regression (Friedman and Stuetzle, 1981), neural networks
                  (Venables and Ripley, 2002) and boosted regression trees (Ridgeway et al.,
                  2017). Our response variable was prevalence and we weighted the data by
                  sample size (i.e. the number of people tested for malaria in each survey). For
                  each model we ran five-fold cross-validation to select hyperparameters using
                  random  search  for  Random  Forest  and  boosted  regression  trees  and  grid
                  search for the other models. Predictions from these models were then made
                  across Colombia and Madagascar respectively. These predictions were finally
                  logit transformed so that they are on the linear predictor scale of the
                  top-level model. The top-level model was a disaggregation regression model
                  (Sturrock et al., 2014; Wilson and Wakefield, 2017; Law et al., 2018; Taylor et
                  al., 2017; Li et al., 2012). This model is defined by a likelihood at the level of
                  the polygon, with covariates and a spatial random field at the pixel level. Values
                  at the polygon level are indexed with the subscript $a$, while pixel-level values
                  are indexed with $b$. The polygon case count data, $y_a$, is given a Poisson
                  likelihood
                                    $$y_a \sim \mathrm{Pois}(i_a \, \mathrm{pop}_a)$$
                  where $i_a$ is the estimated polygon incidence rate and $\mathrm{pop}_a$ is the
                  observed polygon population-at-risk. This polygon-level likelihood is linked to
                  the pixel-level prevalence, $p_b$, by
                                    $$i_a = \frac{\sum (i_b \, \mathrm{pop}_b)}{\sum \mathrm{pop}_b}$$
                                    $$i_b = \mathrm{p2i}(p_b)$$

                  where p2i is from a model that was published previously (Cameron et al., 2015)
                  which defines a function
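
To make the link between the pixel level and the polygon level concrete, the sketch below computes the population-weighted polygon incidence rate and the Poisson log-likelihood of an observed case count. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the prevalence-to-incidence function is passed in as an argument, and the `toy_p2i` used in the example is a made-up stand-in that is not the function from Cameron et al. (2015).

```python
import math


def polygon_incidence_rate(pixel_incidence, pixel_population):
    """Population-weighted mean of pixel incidence rates within one polygon."""
    total_population = sum(pixel_population)
    if total_population <= 0:
        raise ValueError("polygon has no population")
    cases = sum(i * p for i, p in zip(pixel_incidence, pixel_population))
    return cases / total_population


def poisson_log_likelihood(cases, rate, population):
    """Log of the Poisson probability of `cases` with mean rate * population."""
    mean = rate * population
    if mean <= 0:
        raise ValueError("Poisson mean must be positive")
    return cases * math.log(mean) - mean - math.lgamma(cases + 1)


def polygon_log_likelihood(cases, pixel_prevalence, pixel_population,
                           polygon_population, p2i):
    """Log-likelihood of one polygon's case count given pixel prevalence.

    `p2i` is any callable mapping prevalence to incidence; it is an argument
    here because its actual form is not reproduced in this sketch.
    """
    pixel_incidence = [p2i(p) for p in pixel_prevalence]
    rate = polygon_incidence_rate(pixel_incidence, pixel_population)
    return poisson_log_likelihood(cases, rate, polygon_population)


def toy_p2i(prevalence):
    # Made-up placeholder, NOT the published prevalence-to-incidence model.
    return 0.5 * prevalence


log_lik = polygon_log_likelihood(
    cases=3,
    pixel_prevalence=[0.2, 0.4],
    pixel_population=[100.0, 300.0],
    polygon_population=400.0,
    p2i=toy_p2i,
)
```

Keeping the polygon population as a separate argument reflects that it is described as an observed quantity, so it need not equal the sum of the pixel populations.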



                                                                     167 | I S I   W S C   2 0 1 9