Page 58 - Special Topic Session (STS) - Volume 3
STS515 Jim R. et al.
Looking back – looking forward; statistics and
the data science tsunami
Jim Ridgway, James Nicholson, Rosie Ridgway
School of Education, University of Durham, UK
Abstract
The discipline of statistics arose from pressing needs to address a variety of
social and scientific problems. The founders of the Royal Statistical Society in
the UK and of the American Statistical Association were very diverse in their
backgrounds and interests, but shared a common purpose – namely, to
address difficult and interesting challenges. They also acted in similar ways, by
working across disciplines and inventing mathematics and models suited to
new problems. Computer scientists have also addressed real-world problems,
have pioneered interesting and exciting approaches to handling new sorts of
data (e.g. from sensors and social media) and have developed new analytic
tools (notably, tools based on machine learning); their work is having dramatic
(and sometimes unexpected) impacts on society. Early encounters between
statisticians and computer scientists often resembled ‘turf wars’ – with claims
that statistics was fast becoming redundant, and that computer scientists’
ignorance of core statistical concepts such as sample bias would prove fatal
to their entire enterprise. The problems that beset the start of the twentieth
century have not gone away; modern societies face a wide range of existential
threats such as global warming and nuclear war. As before, collaboration
across disciplines and the creation of new modelling tools are needed to
address these problems. Here we begin by drawing lessons from the
development of computer science in its earliest days, focussing on Babbage’s
Analytical Engine. We then highlight key epistemological differences between
traditional statistics and traditional computer science, such as the role of
theory and the use of ‘black-box’ models. We argue the case for the
development of the Epistemological Engine – a tool for analysing and
improving the processes of knowledge creation and utilisation, one that will require
the skills of both statisticians and data scientists. We conclude by identifying
competences and dispositions relevant to students of statistics and data
science, drawing on both contemporary developments and the earliest days
of computing.
Keywords
Modelling; Turf wars; Epistemology; Black-box; Engineering
ISI WSC 2019