Page 82 - Special Topic Session (STS) - Volume 3

STS515 Jeremiah D. D. et al.
                  requisites on mathematics and statistics by looking at some DS course content
                  at a higher level within the undergraduate programme.
                     As explained in Section 3, Machine Learning is a core subject within a DS
                  programme and serves as a good example course for examining which
                  particular prerequisites and core components are necessary for its successful
                  delivery.
                     A Machine Learning course can be taught anywhere in the latter half of a
                  DS programme. In the New Zealand or Australian context, it can be a second-year
                  or third-year “paper”. Regardless, we look at what mathematical and statistical
                  prerequisites may be necessary to enable effective teaching and learning of key
                  Machine Learning concepts and algorithms.
                     Using the famous “top-10 data mining algorithms” (Wu et al., 2008) as a
                  starting point, we examine the preliminary subject studies required
                  by each of the algorithms. In 2006, ACM KDD Innovation Award and IEEE ICDM
                  Research Contributions Award winners were asked to nominate key algorithms
                  across all fields of data mining and machine learning, and the nominations
                  were then voted on by hundreds of the ICDM’06/SDM’06/KDD’06 Technical
                  Programme Committee members, resulting in the top-10 algorithms listed in
                  Table 3. Here, for each algorithm, the relevant background knowledge in
                  mathematics or statistics is estimated and matched to four generic
                  courses, with codes and content outlined as follows:
                  - MATH100: Entry-level algebra and calculus
                  - MATH200: Linear algebra, discrete mathematics, optimization
                  - STAT100: Basic statistics such as probabilities and tests
                  - STAT200: Statistical inference
                     The ticks in Table 3 are a rough estimate based on a normal delivery of
                  each algorithm. For instance, for the k-nearest neighbour (k-NN) algorithm, its
                  algorithmic operation is introduced and its relevance to density estimation is
                  hinted at, but the connection to EM and Bayesian inference is not necessarily
                  discussed (which would then require STAT200). Also, if a “200” option is ticked
                  for an algorithm, the “100” option is omitted.
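                     To illustrate what a “normal delivery” of the algorithmic operation of
                  k-NN might involve, the following is a minimal sketch in Python, not taken
                  from the original paper. It assumes Euclidean distance and a simple majority
                  vote; the function name knn_predict and the toy data are hypothetical and
                  serve only to show that the basic operation needs little beyond entry-level
                  mathematics.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point (assumed metric)
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours, sorting on distance only
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data invented purely for illustration
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]
```

                     Calling knn_predict(points, labels, (0.5, 0.5)) assigns the query to the
                  cluster near the origin. Linking this procedure to density estimation, EM or
                  Bayesian inference would go beyond such a sketch.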
                     As seen from Table 3, it seems that the top-10 algorithms can be delivered
                  effectively without too many mathematical or statistical requisites.
                  On the other hand, DS is a discipline that undergoes rapid advances. To reflect
                  the landscape of R&D a decade later than the top-10 list, we may have a new list
                  of key algorithms as given in Table 4. Clearly the algorithms have become
                  more advanced, bearing complexities that require deeper mathematical or
                  statistical  understanding.  Hence  we  come  to  the  conclusion  that  both
                  MATH200 and STAT200 are indispensable cores of a proper DS programme
                  (that teaches Machine Learning effectively).
                     Arguably, we can adopt a similar approach to map out the core requisites
                  of DS programmes by looking at other computing and statistics courses. This

                                                                      71 | ISI WSC 2019