Page 259 - Special Topic Session (STS) - Volume 2

STS490 Riaan d.J.



                          Professional data scientists: Who are they and
                                      how do we train them?
                                          Riaan de Jongh
                                    North-West University, South Africa

            Abstract
            Much has been written on the divide between industry and academia,
            especially in the field of Statistics. This talk will propose some guidelines on
            how the gap between academia and industry may be bridged, both in teaching
            and research aspects. The guidelines will be illustrated by using a case study
            of a successful professional university programme.

            Keywords
            Training; Professional; Industry-University collaboration

            1.  What is Data Science?
                Data science is a multi-disciplinary field that draws on a number of
            disciplines (e.g. applied mathematics, statistics, machine learning, operations
            research, artificial intelligence). It is used to solve problems in various
            application areas, for example health sciences, astrophysics, agriculture,
            telecommunications and finance. Its primary aim is to extract insight from data
            in various forms, both structured and unstructured. At the core is so-called
            “big data”: data that are stored in various ways and exhibit complex many-to-many
            relationships. Handling such data is made more challenging by the ever-increasing
            requirement to process them in real time to support “instantaneous” decision-making.
            The rise of data science can largely be attributed to advances in
            computer technology and processing speed, low cost storage of data, and the
            massive availability of data from the Internet and other sources. Access to
            big data and the advances in computer technology have made possible the
            renewed application of machine learning and statistical techniques to
            problems, with huge successes reported in a wide range of applications. One of
            the subfields of data science is statistics, a branch of mathematics dealing with
            the collection, analysis and interpretation of data. Statistics has established
            itself firmly as an academic discipline and has been in existence since the
            eighteenth century. Because of the big data explosion, a number of the
            classical statistical approaches that perform reasonably well for small datasets
            fail when dealing with huge datasets. Despite this, many more recently
            developed statistical techniques are used successfully in a big data context.
            Examples are logistic regression, cluster analysis, and decision trees. Machine
            learning and artificial intelligence are relatively new subfields of data science


                                                               248 | I S I   W S C   2 0 1 9