Page 259 - Special Topic Session (STS) - Volume 2
STS490 Riaan d.J.
Professional data scientists: Who are they and how do we train them?
Riaan de Jongh
North-West University, South Africa
Abstract
Much has been written on the divide between industry and academia,
especially in the field of statistics. This talk will propose some guidelines on
how the gap between academia and industry may be bridged, in both teaching
and research. The guidelines will be illustrated by a case study of a
successful professional university programme.
Keywords
Training; Professional; Industry-University collaboration
1. What is Data Science?
Data science is a multidisciplinary field that draws on a number of
disciplines (e.g. applied mathematics, statistics, machine learning, operations
research and artificial intelligence). It is used to solve problems in a wide
range of application areas, for example the health sciences, astrophysics,
agriculture, telecommunications and finance. Its primary aim is to extract
insight from data in various forms, both structured and unstructured. At its
core is so-called "big data": data that are stored in various ways and exhibit
complex many-to-many relationships. Working with these data is made more
challenging by the ever-increasing requirement to process them in real time to
support "instantaneous" decision-making. The rise of data science can largely
be attributed to advances in computer technology and processing speed, low-cost
data storage, and the vast amount of data available from the Internet and
other sources. Access to big data and advances in computer technology have
made possible the renewed application of machine learning and statistical
techniques to many problems, with great success reported in a wide range of
applications. One of the subfields of data science is statistics, a branch of
mathematics dealing with the collection, analysis and interpretation of data.
Statistics has established itself firmly as an academic discipline and has
existed since the eighteenth century. Because of the big data explosion, a
number of classical statistical approaches that perform reasonably well on
small datasets fail when applied to huge datasets. Despite this, many more
recently developed statistical techniques, such as logistic regression, cluster
analysis and decision trees, are used successfully in a big data context.
Machine learning and artificial intelligence are relatively new subfields of data science
248 | I S I W S C 2 0 1 9