Page 177 - Contributed Paper Session (CPS) - Volume 2

CPS1496 Tim Christopher D.L et al.
            et al., 2017; Li et al., 2012).  However, the aggregation of cases over space
            means that the data may be relatively uninformative, especially if the case
            counts are aggregated over large or heterogeneous areas, because it is unclear
            where within the polygon, and in which environments, the cases occurred. These
            data are therefore often under-powered for fitting the flexible, non-linear models
            required for accurate malaria maps (Bhatt et al., 2017, 2015). A model that
            combines point surveys and aggregated surveillance data, and therefore
            leverages the strengths of both, has great potential. One approach for
            combining these data is to use prevalence point-surveys to train a suite of
            machine learning models, and then use predictions from these models as
            covariates in a model trained on polygon-level incidence data. This process of
            stacking models has proven effective in many realms; however, typical stacking
            uses a single dataset on a consistent scale (Sill et al., 2009; Bhatt et al., 2017).
            Here we propose training the level zero machine learning models on point-
            level, binomial prevalence data and stacking these models with a polygon-
            level, Poisson incidence model.
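
            The two-stage idea described above can be illustrated with a minimal sketch on
            synthetic data. Everything in it is an assumption made for illustration: a single
            logistic regression stands in for the suite of level zero machine learning
            models, the level zero predictions are aggregated to polygons with a
            population-weighted mean, and all function names, sample sizes and parameter
            values are hypothetical. It is not the model fitted in this paper.

```python
import numpy as np


def fit_binomial_logistic(x, positives, examined, n_iter=25):
    """Level zero stand-in: logistic regression on binomial counts (Newton steps)."""
    X1 = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
        grad = X1.T @ (positives - examined * p)
        hess = X1.T @ (X1 * (examined * p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_poisson_with_offset(z, cases, population, n_iter=25):
    """Level one: Poisson regression, log link, log(population) as offset."""
    Z1 = np.column_stack([np.ones(len(z)), z])
    # Start from the intercept-only fit (overall incidence rate).
    gamma = np.array([np.log(cases.sum() / population.sum()), 0.0])
    for _ in range(n_iter):
        mu = population * np.exp(Z1 @ gamma)
        grad = Z1.T @ (cases - mu)
        hess = Z1.T @ (Z1 * mu[:, None])
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma


def demo(seed=0):
    rng = np.random.default_rng(seed)

    # Synthetic point-level prevalence surveys (binomial data).
    x_pts = rng.normal(size=300)
    examined = np.full(300, 100)
    prevalence = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * x_pts)))
    positives = rng.binomial(examined, prevalence)
    beta = fit_binomial_logistic(x_pts, positives, examined)

    # Synthetic polygons, each made of pixels with a covariate and a population.
    n_poly, n_pix = 40, 25
    x_pix = rng.uniform(-1, 1, size=(n_poly, 1)) + rng.normal(
        scale=0.3, size=(n_poly, n_pix)
    )
    pop = rng.uniform(100, 1000, size=(n_poly, n_pix))
    rate = np.exp(-4.0 + 1.0 * (-1.0 + 0.8 * x_pix))
    cases = rng.poisson((pop * rate).sum(axis=1))

    # Level zero predictions at pixel level (logit scale), then aggregated
    # to one covariate per polygon by a population-weighted mean (assumption).
    logit_pred = beta[0] + beta[1] * x_pix
    z = (pop * logit_pred).sum(axis=1) / pop.sum(axis=1)

    # Level one model trained on polygon-level incidence counts.
    gamma = fit_poisson_with_offset(z, cases, pop.sum(axis=1))
    return beta, gamma


beta, gamma = demo()
print("level zero coefficients:", beta)
print("level one coefficients:", gamma)
```

            In this sketch the level one slope links the prevalence scale of the point
            surveys to the incidence scale of the polygon counts, which is the role the
            stacked covariates play in the approach proposed here.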

            2.  Methodology
                We used two data sources that reflect Plasmodium falciparum malaria
            transmission: point-prevalence surveys and polygon-level, aggregated
            incidence data. We selected Colombia and Madagascar as case examples as
            they both have fairly complete, publicly available surveillance data at a finer
            geographical resolution than admin 1. The prevalence survey data were
            extracted from the Malaria Atlas Project prevalence survey database using only
            data from 1990 onwards (Bhatt et al., 2015; Guerra et al., 2007). For Colombia
            we used all points from South America (n = 522), while for Madagascar we
            used only Malagasy data (n = 1505). We chose these geographic regions
            based on a trade-off between wanting a large sample size and wanting data
            from geographically similar areas. The prevalence points were then
            standardised to an age range of 2–10 years using the model of Smith et al. (2007).
            The polygon incidence data were collected from government reports and
            standardised using methods defined in Cibulskis et al. (2011). This
            standardisation step accounts for missed cases due to lack of treatment
            seeking, missing case reports, and cases that sought medical attention outside
            the public health systems (Battle et al., 2016). For reports where cases were
            not reported at the species level, national estimates of the ratio between P.
            falciparum and P. vivax cases were used to calculate the number of
            P. falciparum-only cases. To minimise temporal effects we selected, for each
            country, one year of surveillance data. We used annual surveillance data from
            2015 for Colombia (952 municipalities) and data from 2013 for Madagascar
            (110 districts) as these years had the most data in each case.
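
            The species adjustment described above amounts to simple arithmetic, sketched
            below. The function name and the example numbers are hypothetical, and the
            sketch assumes that every case is either P. falciparum or P. vivax, which the
            text does not state.

```python
def falciparum_only_cases(total_cases, pf_to_pv_ratio):
    """Estimate P. falciparum cases from species-unspecified case counts.

    pf_to_pv_ratio is a national estimate of (P. falciparum cases) / (P. vivax cases).
    Assumes, for illustration only, that no other species contribute cases.
    """
    if total_cases < 0 or pf_to_pv_ratio < 0:
        raise ValueError("total_cases and pf_to_pv_ratio must be non-negative")
    pf_fraction = pf_to_pv_ratio / (1.0 + pf_to_pv_ratio)
    return total_cases * pf_fraction


# A hypothetical report of 1200 cases with a national Pf:Pv ratio of 3:1.
print(falciparum_only_cases(1200, 3.0))  # 900.0
```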



                                                               166 | ISI WSC 2019