Page 61 - Special Topic Session (STS) - Volume 3
STS515 Jim R. et al.
world view from the world’s first professional association of statisticians – that
the primary function of statistics is to gather and organise resources that
others will transform into something useful. Pullinger (2013) paints a very
different picture. The motto was dropped after a year. He points to the
diversity of the founders of the RSS (which included CB – mathematician,
mechanical engineer, astronomer, and philosopher) and to their commitment
to study practical problems and to find (and implement) solutions with direct
social benefit – inventing new mathematics when needed. This tension
between gatherers and analysts, and between theoreticians and practitioners,
articulated by Lovelace in the introductory paragraph, mirrored in both
mathematics and statistics, is alive and well.
It is captured in some critiques of statistics curricula. Cobb (2015) and
Ridgway (2015) argue that introductory courses over-value tractable statistical
models, resist algorithmic thinking, and devote far too little time to realistic
problems. This critique raises two questions: ‘whose realistic problems?’ and ‘what
models are missing?’ In the early days of the RSS, the answer to the question
about ‘whose problems’ might well have been ‘everyone’s’ – illustrated via
pioneering work in meteorology, health, genetics, agriculture and economics,
and often associated with the invention of new mathematics. The extent to
which this tradition of conducting pioneering work with practical applications,
and inventing appropriate supporting mathematical structures, has continued
can be judged by inspecting the list of past RSS presidents (see RSS, 2019).
The question of ‘missing models’ raises bigger issues. All models are
simplifications of some reality, and the choice and applicability of any model
depends on the phenomenon to be modelled, and the purpose to which the
model will be put. “All models are wrong, but some are useful” (Box and
Draper, 1987, p. 424). A problem with introductory statistics courses has been
an over-emphasis on standard models (e.g. using the Normal distribution)
developed to solve problems in a pre-computer age, and a focus on
generalising from samples to populations. Such an approach is appropriate
where data are expensive to collect, where small samples can represent
populations (often the case in agriculture and medical trials, but not in
situations where disaggregated data show different patterns), where phenomena
are stable over time (again, agriculture and some medical trials, but not
social phenomena), and where there is little computational power. Even in
favourable circumstances, models can be applied badly – see Ioannidis (2005)
on why most published research findings are false and the Open Science
Collaboration (2015) on failures to replicate ‘well-known’ results in psychology.
These failures seriously threaten the creation of new and useful knowledge,
and progress in a number of academic disciplines. The failures themselves can
be traced to poor practices of data collection, analysis and interpretation,
which can be recognised and remedied.
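
The caveat above, that generalising from a sample is unsafe where disaggregated data show different patterns, can be made concrete with a short sketch. The counts below are invented purely for illustration (they are not taken from any study cited here), and the treatment labels "A" and "B" and the subgroup labels "small" and "large" are hypothetical.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts, invented for illustration only.
# Keys are (treatment, subgroup).
counts = {
    ("A", "small"): (18, 20),
    ("A", "large"): (48, 80),
    ("B", "small"): (64, 80),
    ("B", "large"): (10, 20),
}


def subgroup_rates(table):
    """Success rate within each (treatment, subgroup) cell."""
    return {key: Fraction(s, n) for key, (s, n) in table.items()}


def aggregate_rates(table):
    """Success rate per treatment after pooling all subgroups."""
    totals = {}
    for (treatment, _subgroup), (s, n) in table.items():
        s0, n0 = totals.get(treatment, (0, 0))
        totals[treatment] = (s0 + s, n0 + n)
    return {t: Fraction(s, n) for t, (s, n) in totals.items()}


by_group = subgroup_rates(counts)
overall = aggregate_rates(counts)

# Print the disaggregated rates, then the pooled rates.
for (treatment, subgroup), r in sorted(by_group.items()):
    print(f"{treatment} / {subgroup}: {float(r):.0%}")
for treatment, r in sorted(overall.items()):
    print(f"{treatment} / pooled: {float(r):.0%}")
```

With these invented counts, A outperforms B within each subgroup (90% against 80%, and 60% against 50%), yet B outperforms A once the subgroups are pooled (74% against 66%). The reversal arises only because the two treatments are spread unevenly across the subgroups, which is the kind of pattern that a single aggregate summary hides.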
50 | ISI WSC 2019