Page 260 - Special Topic Session (STS) - Volume 2

STS490 Riaan d.J.
                  and concentrate on brute force computer power and complex optimisation
                  algorithms to solve real-time prediction problems. Examples are the successful
                  applications of neural networks and deep learning in the area of speech
                  recognition and language processing (e.g. Siri and Google Assistant). It should
                  be noted that machine learning is frequently concerned with prediction tasks
                  and models in this context (e.g. recommender systems and factorisation
                  machines). Unlike statistics, machine learning is not concerned with traditional
                  aspects of statistical inference (e.g. about the significance of the estimates of
                  model parameters). Although statistics and machine learning are different
                  disciplines, there is some overlap; for example, techniques such as random
                  forests are frequently cited in both fields. A reader of the literature in both
                  fields will quickly notice that different terminology is used for similar
                  concepts. In a recent paper, a prominent researcher at Harvard University
                  (Meng, 2018) warned about the big data paradox: data quality plays an
                  enormous role, and having more data can fool us when we make population
                  inferences in a big data context.
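                  A small simulation may help to make the big data paradox attributed to
                  Meng (2018) concrete. The sketch below is illustrative only and is not
                  taken from that paper: the population, the selection probabilities (0.6
                  and 0.4) and the function name simulate_big_data_paradox are all
                  assumptions chosen for the example. It compares a very large self-selected
                  sample with a much smaller simple random sample from the same population.

```python
import random


def simulate_big_data_paradox(pop_size=200_000, srs_size=1_000, seed=0):
    """Compare a large self-selected sample with a small random sample.

    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Hypothetical population: a binary attribute held by about half the units.
    population = [1 if rng.random() < 0.5 else 0 for _ in range(pop_size)]
    true_mean = sum(population) / pop_size

    # "Big data" sample: units with the attribute are assumed to be more
    # likely to end up in the data set (0.6 versus 0.4), so data quality is poor.
    big_sample = [y for y in population
                  if rng.random() < (0.6 if y == 1 else 0.4)]

    # Small simple random sample: every unit is equally likely to be selected.
    srs = rng.sample(population, srs_size)

    return {
        "true_mean": true_mean,
        "big_n": len(big_sample),
        "big_error": abs(sum(big_sample) / len(big_sample) - true_mean),
        "srs_n": srs_size,
        "srs_error": abs(sum(srs) / srs_size - true_mean),
    }


if __name__ == "__main__":
    for name, value in simulate_big_data_paradox().items():
        print(f"{name}: {value}")
```

                  Under these assumptions the self-selected sample covers roughly half the
                  population, yet its estimate of the population proportion is further from
                  the truth than that of the small random sample, because its selection bias
                  does not shrink as the sample grows.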
                     Another subfield of data science is operations research (OR), which became
                  popular in the early eighties. Spurred by the advent and wider availability of
                  personal computers, OR, like data science now, was all about using the
                  mathematical and computing sciences to solve real-world problems in a multi-
                  and interdisciplinary way.

                  2.  What is a data scientist?
                     Because data science is so wide in scope, many professionals may claim
                  that they are data scientists, e.g. statisticians, operations researchers,
                  engineers, computer scientists, actuaries, physicists and machine learners.
                  From my own practical experience, it is clear that solving data science
                  problems requires a range of people, some of whom can work in depth on
                  theory while others attend to application. This is a way of attempting to
                  cover the whole spectrum: one person cannot handle all aspects of a complex
                  data science problem, but a group of people possibly can. Currently the main
                  focus of data scientists is to use innovative techniques emanating from the
                  subfields to solve problems in a particular application area of interest. It
                  should be noted that the application areas, and therefore the types of
                  problems encountered, are very different, frequently necessitating a deep
                  knowledge of the particular subject matter. For example, consider
                  astrophysics and the Square Kilometre Array. Apparently these telescopes
                  will receive data at one terabyte per second, and researchers are typically
                  interested in detecting tiny signals engulfed in white noise. In finance,
                  amongst other fields, researchers exploit large databases to learn more
                  about the credit behaviour of customers.


                                                                     249 | I S I   W S C   2 0 1 9