Page 260 - Special Topic Session (STS) - Volume 2
STS490 Riaan d.J.
and concentrate on brute-force computing power and complex optimisation
algorithms to solve real-time prediction problems. Examples are the successful
applications of neural networks and deep learning in the area of speech
recognition and language processing (e.g. Siri and Google Assistant). It should
be noted that machine learning is frequently concerned with prediction tasks
and models in this context (e.g. recommender systems and factorisation
machines). Unlike statistics, machine learning is not concerned with traditional
aspects of statistical inference (e.g. about the significance of the estimates of
model parameters). Although statistics and machine learning are different
disciplines, there is some overlap; for example, a technique like random forests
is frequently cited in both fields. A reader of the literature in both fields
will quickly realise a difference in the terminology used for similar concepts. It
is interesting to note that in a recent paper, a prominent researcher at Harvard
University (Meng, 2018) warned about the big data paradox, i.e. he
emphasised that data quality plays an enormous role and that, when quality is
ignored, having more data can fool us when making population inferences in a
big data context.
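The effect described above can be illustrated with a small simulation. The sketch below is not Meng's derivation; it is a minimal illustration under assumed, hypothetical settings: a binary attribute with a true rate near 0.5, a large self-selected data set in which units with the attribute are recorded with probability 0.6 and the others with probability 0.4, and a simple random sample of 1,000 units. All names and parameter values are chosen for illustration only.

```python
import random


def simulate_paradox(pop_size=100_000, srs_size=1_000, seed=0):
    """Compare a large self-selected data set with a small random sample."""
    rng = random.Random(seed)

    # Hypothetical population: binary attribute with a true rate near 0.5.
    population = [1 if rng.random() < 0.5 else 0 for _ in range(pop_size)]
    true_mean = sum(population) / pop_size

    # "Big data": units enter the data set with a probability that depends
    # on the attribute itself (assumed 0.6 if y == 1, otherwise 0.4).
    big = [y for y in population if rng.random() < (0.6 if y == 1 else 0.4)]
    big_mean = sum(big) / len(big)

    # Small simple random sample drawn without replacement.
    srs = rng.sample(population, srs_size)
    srs_mean = sum(srs) / srs_size

    return {
        "true_mean": true_mean,
        "big_n": len(big),
        "big_error": big_mean - true_mean,
        "srs_n": srs_size,
        "srs_error": srs_mean - true_mean,
    }


if __name__ == "__main__":
    for key, value in simulate_paradox().items():
        print(f"{key}: {value}")
```

Under these assumptions the self-selected data set contains roughly half of the population, yet its estimate sits close to 0.6 instead of 0.5, whereas the much smaller random sample typically lands within a few percentage points of the true rate. Adding more records of the same quality does not remove the selection bias.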
Another subfield of data science is operations research (OR), which became
popular in the early eighties. Spurred by the advent and wider availability of
personal computers, OR, like data science now, was all about using the
mathematical and computing sciences to solve real-world problems in a multi-
and interdisciplinary way.
2. What is a data scientist?
Because data science is so wide in scope, many professionals may claim
that they are data scientists, e.g. statisticians, operations researchers,
engineers, computer scientists, actuaries, physicists and machine learners.
From my own practical experience, it is clear that when solving data science
problems, you need a range of people, some of whom can work in depth on
theory while others attend to application. It is a way to attempt to cover the
whole spectrum. When solving complex problems in data science, one person
cannot handle all aspects, but it could possibly be achieved with a group of
people. Currently the main focus of data scientists is to use innovative
techniques emanating from the subfields to solve problems in a particular
application area of interest. It should be noted that the application areas and
therefore the type of problems encountered are very different, frequently
necessitating a deep knowledge of the particular subject matter. For example,
consider astrophysics and the Square Kilometre Array (SKA). Apparently these
telescopes will receive data at one terabyte per second and researchers are
typically interested in detecting tiny signals engulfed in white noise. On the
other hand, in finance, amongst others, researchers exploit large databases to
learn more about the credit behaviour of customers.
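To give a feel for the astrophysics example above, the following sketch shows one textbook way of detecting a weak signal in white noise: correlating the data with a known template (a matched filter). It is an illustration only and makes no claim about how the telescope's processing actually works. The sinusoidal template, the amplitude of 0.3, the unit noise variance and the function names are all assumptions made for this example.

```python
import math
import random


def matched_filter_statistic(series, template):
    """Correlate the series with a known template.

    The result is scaled so that pure unit-variance white noise gives a
    statistic that is approximately standard normal.
    """
    norm = math.sqrt(sum(t * t for t in template))
    return sum(x * t for x, t in zip(series, template)) / norm


def demo(n=4000, amplitude=0.3, cycles=50, seed=1):
    """Return the statistic for a noise-only series and a signal-plus-noise series."""
    rng = random.Random(seed)

    # Assumed signal shape: a sinusoid completing `cycles` periods over n samples.
    template = [math.sin(2 * math.pi * cycles * k / n) for k in range(n)]

    # Gaussian white noise with unit variance, with and without the weak signal.
    noise_only = [rng.gauss(0.0, 1.0) for _ in range(n)]
    with_signal = [amplitude * t + rng.gauss(0.0, 1.0) for t in template]

    return (
        matched_filter_statistic(noise_only, template),
        matched_filter_statistic(with_signal, template),
    )


if __name__ == "__main__":
    noise_stat, signal_stat = demo()
    print("noise only :", noise_stat)
    print("with signal:", signal_stat)
```

Although the signal amplitude is well below the noise level at every individual sample, accumulating evidence over many samples makes the statistic for the series containing the signal stand out clearly from the noise-only value. This is one reason why domain knowledge matters: the approach relies on knowing what the signal is expected to look like.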
249 | ISI WSC 2019