Page 120 - Contributed Paper Session (CPS) - Volume 1

CPS1196 Song X. et al.
                  economic growth path, the differences among these cities are arguably no
                  smaller than the differences between Dubai, UAE and San Jose, CA. Examining
                  the variations in Chinese cities' economic development paths would shed
                  additional light on how to induce and facilitate the urbanization process in
                  developing countries. Using the weights of secondary and tertiary industries
                  as input variables, the time-series clustering has grouped the 35 major Chinese
                  cities into five categories. Cities in each of  the five categories have shown
                  interesting similarities in economic growth paths, in spite of some seemingly
                  significant disparities. On the other hand, cities in different categories have
                  distinct  growth  patterns.  This  research  shows  the  potential  of  applying
                  unsupervised  machine  learning  techniques  in  the  field  of  development
                  economics.

                  2.  Methodology
                      Clustering  is  a  family  of  machine  learning  algorithms  that  seek  to
                  categorize unlabeled data objects into a number of groups, in such a way that
                  objects  in  the  same  group  are  similar  and  objects  in  different  groups  are
                  distinct (Jain, Murty and Flynn, 1999). In contrast to classification algorithms,
                  which assign data objects to predefined groups and hence necessitate labeled
                  training data, clustering algorithms do not require ex ante knowledge of the
                  groups.
                      Selecting a distance measure to evaluate the similarity of data objects is
                  critical to a clustering algorithm and its results. The Euclidean distance is
                  commonly used in clustering as the similarity measure. However, it is not
                  suitable for clustering time-series data. The extant literature has proposed a
                  number of time-series similarity measures, each
                  appropriate for different applications (Aghabozorgi, Shirkhorshidi and Wah,
                  2015).
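
                      One commonly cited reason is that the Euclidean distance compares two
                  series point by point, so a small shift in time can outweigh a genuine
                  difference in shape. The following minimal Python sketch illustrates this
                  on made-up data; the series and the helper name `euclidean` are
                  hypothetical and are not taken from the study.

```python
import math


def euclidean(a, b):
    """Pointwise Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: one bump, the same bump one step later, and a flat series.
bump = [0, 0, 1, 0, 0, 0]
shifted_bump = [0, 0, 0, 1, 0, 0]
flat = [0, 0, 0, 0, 0, 0]

d_shifted = euclidean(bump, shifted_bump)  # same shape, shifted in time
d_flat = euclidean(bump, flat)             # different shape

# The pointwise measure ranks the flat series as closer to the bump
# than the shifted copy of the bump itself.
print(d_shifted > d_flat)
```

                      In this toy case the measure treats the flat series as more similar to
                  the bump than a time-shifted copy of the same bump, which is the kind of
                  behavior a shape-based measure is meant to avoid.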
                      In this research, we use the dynamic time warping (DTW; Jeong, Jeong and
                  Omitaomu, 2011) distance for clustering the time-series economic data of 35
                  Chinese cities. DTW measures the distance between two data series based on
                  their shapes. The method was first applied in speech processing problems by
                  Berndt and Clifford (1994), and soon became popular as a time-series distance
                  measure (Izakian, Pedrycz, and Jamal, 2015).
                      DTW computes the distance between two time series T1 and T2 of lengths
                  m and n, respectively, using dynamic programming as follows:

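                  1.  $D(T_1, T_2) = 0$, if $m = n = 0$
                  2.  $D(T_1, T_2) = \infty$, if $m = 0$ or $n = 0$, but not both
                  3.  $D(T_1, T_2) = d(t_{11}, t_{21}) + \min\{D(\mathrm{Rest}(T_1), \mathrm{Rest}(T_2)),$
                      $D(\mathrm{Rest}(T_1), T_2),\ D(T_1, \mathrm{Rest}(T_2))\}$, otherwise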
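
                      The recursion above can be sketched in Python as a bottom-up dynamic
                  program. This is a minimal illustration under stated assumptions, not the
                  implementation used in the study: the function name `dtw_distance` is
                  hypothetical, and the local cost d is assumed to be the absolute
                  difference between two values. Each table entry holds the distance
                  between the remaining parts of the two series, so every step adds the
                  local cost of the two leading elements to the smallest distance obtained
                  after dropping the leading element of either series or of both.

```python
import math


def dtw_distance(t1, t2, d=lambda a, b: abs(a - b)):
    """DTW distance between two numeric sequences.

    D[i][j] is the DTW distance between the suffixes t1[i:] and t2[j:].
    The local cost d defaults to the absolute difference (an assumption).
    """
    m, n = len(t1), len(t2)
    # Rule 2: exactly one suffix empty -> infinite distance.
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    # Rule 1: both suffixes empty -> zero distance.
    D[m][n] = 0.0
    # Rule 3: cost of the leading elements plus the cheapest continuation.
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            D[i][j] = d(t1[i], t2[j]) + min(
                D[i + 1][j + 1],  # drop the first element of both series
                D[i + 1][j],      # drop the first element of t1 only
                D[i][j + 1],      # drop the first element of t2 only
            )
    return D[0][0]


# Hypothetical data: the same bump shifted by one time step.
bump = [0, 0, 1, 0, 0, 0]
shifted_bump = [0, 0, 0, 1, 0, 0]
same_shape = dtw_distance(bump, shifted_bump)

# Series of different lengths can be compared directly.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
print(same_shape, stretched)
```

                      The table has (m + 1) x (n + 1) entries, so the sketch runs in O(mn)
                  time and memory, and it accepts series of unequal length.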


                                                                     109 | I S I   W S C   2 0 1 9