2. Conflicting epistemologies
Breiman (2001) described two approaches to analysing data. He argued that most statisticians apply transparent models, in which a small collection of well-defined inputs is used to predict outputs, so models are used primarily to explain as well as to predict (a practice he argued leads to irrelevant theory and questionable conclusions). In contrast, a small proportion use algorithmic modelling: techniques such as neural nets and random forests are used to map inputs to outputs. The focus is primarily on prediction, with little attempt to explain. This can be viewed as a ‘data science’ stance.
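To make the contrast concrete, here is a minimal sketch (our illustration, not Breiman's) that fits both kinds of model to the same synthetic data using scikit-learn: a linear regression whose coefficients invite explanation, and a random forest judged purely on predictive fit. The data-generating process and all parameter values are assumptions chosen for illustration.

```python
# A sketch of Breiman's two cultures (illustration only, assumptions noted above).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Assumed data-generating process: nonlinear in the first input.
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Data-modelling culture: a transparent model whose coefficients 'explain'.
lm = LinearRegression().fit(X_train, y_train)
print("linear coefficients:", lm.coef_)
print("linear R^2 on held-out data:", lm.score(X_test, y_test))

# Algorithmic-modelling culture: an opaque model judged on prediction alone.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("forest R^2 on held-out data:", rf.score(X_test, y_test))
```

On data with a nonlinear signal such as this, the forest typically predicts better while offering no comparable account of why.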
Ridgway et al. (2018) map out some challenges for algorithmic models, notably that what you get out is determined by what you put in. Algorithmic models are therefore strong on ‘what is’ but weak on ‘what ought to be’, and can have undesirable consequences when used for (for example) job selection or predictive policing. Perez (2019) provides further examples. These problems are exacerbated when the data set itself does not represent the population as a whole, for example when conclusions are drawn from (conventional) medical research based almost exclusively on Caucasians. This is a particularly problematic challenge for data science, where decisions about analysis are often based on pragmatism: a variety of models are applied to a data set, and the final choice of model is based on fit and the ability of the model to predict future events.
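This pragmatic loop can be sketched in a few lines: several candidate models are applied to the same data set, and the one with the best cross-validated predictive fit is kept. The sketch below (our own, with an assumed synthetic data set and an arbitrary choice of candidates) also illustrates the limitation noted above: a purely predictive criterion cannot reveal that the data themselves under-represent parts of the population.

```python
# Pragmatic model choice (illustration only): keep whichever candidate
# predicts held-out data best. Data set and candidates are assumed.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)

candidates = {
    "ridge regression": Ridge(),
    "random forest": RandomForestRegressor(random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
print(scores)

# The 'final' model is chosen purely on predictive fit; nothing in this loop
# asks whether the data represent the population the model will be used on.
best = max(scores, key=scores.get)
print("chosen model:", best)
```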
Statistics has been characterised by engagement with real-world problems; what of data science? Consider these examples of computer applications, software and devices:
• Google, Amazon, Facebook, Skype;
• recognition of individuals via face, fingerprint, voice, gait, patterns of
key presses;
• tracking (via fitness trackers, credit card use, data from transport
networks);
• speech recognition and language translation;
• medical diagnosis;
• detection of disease outbreaks via analysis of Google search data;
• the Internet of Things – smart refrigerators, TVs, cars, and domestic
robots;
• ‘deep fake’ videos;
• predicting crime and recommending custodial sentences;
• satnav, autonomous vehicles and weapons systems;
• mapping dwellings from aerial images, in remote settings;
• emotion detectors for classrooms and cars.
A striking feature of data science has been the variety of problems
addressed, the kinds of data analysed and used, the range of novel models
developed, and its direct effects (intended and unintended) on people’s lives.