Page 62 - Special Topic Session (STS) - Volume 3

STS515 Jim R. et al.
                  2.  Conflicting epistemologies
                      Breiman (2001) described two approaches to analysing data. He argued
                   that most statisticians typically apply transparent models in which a small
                   collection of well-defined inputs is used to predict outputs, so models are
                   used primarily to explain as well as to predict (he argues that this leads to
                   irrelevant theory and questionable conclusions). In contrast, a small proportion
                   use algorithmic modelling: techniques such as neural nets and random forests
                   are used to map inputs to outputs. The focus is primarily on prediction, with
                   little attempt to explain. This can be viewed as a ‘data science’ stance.
                      Ridgway et al. (2018) map out some challenges for algorithmic models,
                   notably that what you get out is determined by what you put in. Algorithmic
                   models are therefore strong on ‘what is’ but weak on ‘what ought to be’, and
                   can have undesirable consequences when used for (for example) job selection
                   or predictive policing. Perez (2019) provides further examples. These problems
                   are exacerbated when the data set itself does not represent the population as
                   a whole, for example when conclusions are drawn from (conventional) medical
                   research based almost exclusively on Caucasians. This is a particularly
                   problematic challenge for data science, where decisions about analysis are
                   often based on pragmatism: a variety of models are applied to a data set, and
                   the final choice of model is based on fit and on the ability of the model to
                   predict future events.
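                      The pragmatic workflow just described, fitting several candidate models
                   and keeping whichever predicts held-out data best, can be sketched as follows.
                   This is a minimal illustration in standard-library Python; the data set and the
                   two candidate models (a mean-only baseline and a least-squares line) are
                   invented for the example.

```python
# Sketch of pragmatic model choice: fit candidate models to training
# data, then keep whichever predicts a held-out test set best.
# The data and the candidate models are invented for illustration.

def fit_mean(xs, ys):
    """Baseline model: predict the training mean regardless of x."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Invented data with a roughly linear trend, split into train / test.
train_x, train_y = [0, 1, 2, 3, 4, 5], [0.1, 1.2, 1.9, 3.2, 3.8, 5.1]
test_x, test_y = [6, 7, 8], [6.0, 7.1, 7.9]

candidates = {"mean": fit_mean, "linear": fit_linear}
fitted = {name: fit(train_x, train_y) for name, fit in candidates.items()}
scores = {name: mse(m, test_x, test_y) for name, m in fitted.items()}

# The "pragmatic" choice: whichever model has the lowest
# out-of-sample prediction error wins.
best = min(scores, key=scores.get)
print(best)  # → linear
```

                   Note that this procedure says nothing about why the chosen model works,
                   which is precisely the "prediction without explanation" stance discussed above.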
                     Statistics has been characterised by engagement with real-world problems;
                  what of data science? Consider these examples of computer uses, software
                  and devices:
                      •  Google, Amazon, Facebook, Skype;
                       •  recognition of individuals via face, fingerprint, voice, gait, and patterns
                          of key presses;
                      •  tracking  (via  fitness  trackers,  credit  card  use,  data  from  transport
                         networks);
                      •  speech recognition and language translation;
                      •  medical diagnosis;
                       •  detection of disease outbreaks via analysis of Google search data;
                      •  the Internet of Things – smart refrigerators, TVs, cars, and domestic
                         robots;
                      •  ‘deep fake’ videos;
                      •  predicting crime and recommending custodial sentences;
                      •  satnav; autonomous vehicles and weapons systems;
                      •  mapping dwellings from aerial images, in remote settings;
                      •  emotion detectors for classrooms and cars.
                     A  striking  feature  of  data  science  has  been  the  variety  of  problems
                  addressed, the kinds of data analysed and used, the range of novel models
                  developed, and its direct effects (intended and unintended) on people’s lives.
