system associated with the creation and use of knowledge – in short,
designing an Epistemological Engine (EE) has become a priority. The prime
candidates for creating and building the EE are statisticians and data scientists.
Early encounters between statisticians and data scientists were often
acrimonious: data scientists proclaimed that ‘statistics’ would be a casualty
of ‘the death of theory’, while statisticians predicted that data scientists’
ignorance of core statistical concepts such as sampling bias and
overfitting would prove fatal to their entire enterprise. The EE should be
founded on techniques and skills used in both data science and statistics. Data
scientists create open data repositories (e.g. https://registry.opendata.aws/),
and have adopted a culture of sharing code – especially workflows (e.g.
https://github.com/) to facilitate a comparison of different analytical
techniques and modelling assumptions. They use Common Task Frameworks
wherein success is judged in terms of actual performance in analysis, not
theoretical niceties. Statisticians bring sophistication about data acquisition
(including synthesising and triangulating data sources), preparation, and
exploration. They can contribute to analyses, data representation and
communication, and can comment on issues such as the likely generalisability
of findings. They bring considerable sophistication about modelling.
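To make the Common Task Framework idea concrete, a minimal sketch follows, assuming Python with scikit-learn; the dataset, the entrant models and the accuracy metric are illustrative placeholders rather than anything specified in this paper. The point is only that every entrant is scored on the identical held-out test set.

    # A minimal sketch of a Common Task Framework evaluation. The
    # dataset, entrants and metric are illustrative assumptions.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    entrants = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(random_state=0),
    }

    # One common task, one common metric: only measured performance
    # enters the leaderboard, not theoretical niceties.
    leaderboard = {}
    for name, model in entrants.items():
        model.fit(X_train, y_train)
        leaderboard[name] = accuracy_score(y_test, model.predict(X_test))

    for name, score in sorted(leaderboard.items(), key=lambda kv: -kv[1]):
        print(f"{name}: accuracy {score:.3f}")

Because every entrant faces the same data split and the same metric, the leaderboard makes the comparison of analytical techniques and modelling assumptions explicit.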
Identifying the style of modelling being used by different researchers
(explicitly or implicitly) should be automated in the EE. Ridgway (1998)
classifies styles of modelling, and describes analytic models (such as those
found in school physics), systems models (such as those found in school
biology) and macrosystemic models – these are systems models where the
system itself undergoes change. Macrosystemic models can be divided into
two groups – models where the changes in the system are relatively
predictable (e.g. ecological restoration; the life cycle of the butterfly) or
unpredictable (Brexit; climate change and global political stability in the Trump
era).
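One way such automated identification might begin is sketched below, in Python; the keyword lists, the scoring rule and the function name classify_modelling_style are illustrative assumptions, not a published method. It labels a document with the best-matching of Ridgway's three styles.

    # A minimal sketch of automated modelling-style identification
    # for the EE, using Ridgway's (1998) three styles. The keyword
    # lists and scoring rule are illustrative assumptions.
    STYLE_KEYWORDS = {
        "analytic": ["closed-form", "exact solution", "equation of motion"],
        "systems": ["feedback loop", "compartment", "stocks and flows"],
        "macrosystemic": ["regime shift", "structural change",
                          "evolving system"],
    }

    def classify_modelling_style(text: str) -> str:
        """Return the style whose keywords best match the text, or
        'unclassified' if no keyword occurs at all."""
        text = text.lower()
        scores = {style: sum(kw in text for kw in kws)
                  for style, kws in STYLE_KEYWORDS.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else "unclassified"

    print(classify_modelling_style(
        "We model regime shifts in an evolving system of land use."))
    # prints: macrosystemic

In practice the EE would need a far richer classifier, for instance one trained on labelled methods sections, but the interface, text in and style label out, would be the same.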
The EE should comprise a large tool collection. Sample tools include:
• Critical evaluation of specific studies, using criteria for evaluation such
as those identified by Ioannidis (2005) and the Open Science
Collaboration (2015), e.g. identifying weak effects using small samples,
and testing multiple hypotheses until a ‘significant’ result is found (a
sketch of such a check appears after this list);
• Identification of academic areas where there is insufficient sharing of
data, code and workflows;
• Identification of academic areas that are paradigm-bound (i.e.
characterised by analyses of rather few classes of data, and by the use
of a small set of analytic tools);
• Tools for automated testing of code and workflows;
• Identification of results that are important for some theoretical claims,
where the evidence base is weak (e.g. where there has been little
replication across relevant populations);
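To make the first of these tools concrete, a minimal sketch follows in Python; the Study record, the sample-size threshold and the Bonferroni-style adjustment are illustrative assumptions, not criteria fixed by Ioannidis (2005) or by this paper. It flags a headline ‘significant’ result drawn from a small sample or from many tested hypotheses.

    # A minimal sketch of one EE evaluation tool: flagging studies
    # that pair small samples with many tested hypotheses. The
    # thresholds and adjustment rule are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class Study:
        name: str
        n: int              # sample size
        n_hypotheses: int   # number of hypotheses tested in the study
        p_value: float      # reported p-value of the headline result

    def flags(study: Study, min_n: int = 100, alpha: float = 0.05) -> list:
        """Return warnings about a study's headline 'significant' result."""
        warnings = []
        if study.n < min_n:
            warnings.append(f"small sample (n={study.n})")
        # If many hypotheses were tested, the reported p-value should
        # survive a multiplicity adjustment (Bonferroni, for simplicity).
        if study.p_value * study.n_hypotheses > alpha:
            warnings.append(
                f"p={study.p_value} does not survive adjustment for "
                f"{study.n_hypotheses} tested hypotheses")
        return warnings

    print(flags(Study("candidate effect", n=40, n_hypotheses=12,
                      p_value=0.03)))
    # prints both warnings: the sample is small, and significance is
    # lost after adjusting for multiple testing (0.03 * 12 = 0.36 > 0.05)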