Page 120 - Contributed Paper Session (CPS) - Volume 1
CPS1196 Song X. et al.
economic growth path, the differences among these cities are arguably no
smaller than the differences between Dubai, UAE and San Jose, CA. Examining
the variations in the economic development paths of China's cities would shed
additional light on how to induce and facilitate the urbanization process in
developing countries. Using the weights of secondary and tertiary industries
as input variables, the time-series clustering has grouped the 35 major Chinese
cities into five categories. Cities in each of the five categories have shown
interesting similarities in economic growth paths, in spite of some seemingly
significant disparities. On the other hand, cities in different categories have
distinct growth patterns. This research shows the potential of applying
unsupervised machine learning techniques in the field of development
economics.
2. Methodology
Clustering is a family of machine learning algorithms that seek to
categorize unlabeled data objects into a number of groups, in such a way that
objects in the same group are similar and objects in different groups are
distinct (Jain, Murty and Flynn, 1999). In contrast to classification algorithms,
which assign data objects to predefined groups and hence necessitate labeled
training data, clustering algorithms do not require ex ante knowledge of the
groups.
Selecting a distance measure to evaluate the similarity of data objects is
critical to a clustering algorithm and its results. The Euclidean distance is
commonly used in clustering as the similarity measure. However, it is poorly
suited to clustering time-series data, because it compares observations point
by point at identical time indices and therefore treats two series with the
same shape as far apart when one is shifted or stretched in time. The extant
literature has proposed a number of time-series similarity measures, each
appropriate for different applications (Aghabozorgi, Shirkhorshidi and Wah,
2015).
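The limitation of the Euclidean distance on time-series data can be illustrated with a small sketch. The series below are hypothetical toy values, not the city data used in this research: `late` is the same single peak as `early`, delayed by one step, while `flat` has no peak at all.

```python
import math


def euclidean(a, b):
    """Point-by-point Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy series (assumed values, for illustration only).
early = [0, 1, 2, 1, 0, 0]               # a peak at the third observation
late = [0, 0, 1, 2, 1, 0]                # the same peak, one step later
flat = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]    # no peak at all

print("early vs late:", euclidean(early, late))
print("early vs flat:", euclidean(early, flat))
```

In this toy case the flat series is closer to `early` than the time-shifted copy is, even though the shifted copy has exactly the same shape. A shape-based measure is meant to avoid this kind of ranking.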
In this research, we use the dynamic time warping (DTW) distance (Jeong,
Jeong, and Omitaomu, 2011) for clustering the time-series economic data of 35
Chinese cities. DTW measures the distance between two data series based on
their shapes. The method was first applied in speech processing problems by
Berndt and Clifford (1994), and soon became popular as a time-series distance
measure (Izakian, Pedrycz, and Jamal, 2015).
DTW computes the distance between two time series T1 and T2 of lengths
m and n, respectively, using dynamic programming as follows:
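1. $D(T_1, T_2) = 0$, if $m = n = 0$
2. $D(T_1, T_2) = \infty$, if $m = 0$ or $n = 0$ (but not both)
3. $D(T_1, T_2) = d(t_{11}, t_{21}) + \min\{D(\mathrm{Rest}(T_1), \mathrm{Rest}(T_2)),\ D(\mathrm{Rest}(T_1), T_2),\ D(T_1, \mathrm{Rest}(T_2))\}$, otherwise
where $t_{11}$ and $t_{21}$ are the first elements of $T_1$ and $T_2$, $d$ is the distance between two elements, and $\mathrm{Rest}(T)$ is $T$ with its first element removed.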
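The dynamic program can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this research. It fills a table bottom-up over prefixes of the two series, which gives the same value as the recursive definition. The function name `dtw_distance` and the default element distance (absolute difference) are assumptions, since the text does not fix either.

```python
import math


def dtw_distance(t1, t2, dist=lambda a, b: abs(a - b)):
    """DTW distance between two sequences of numbers.

    `dist` is the element-wise distance; the absolute difference used as
    the default here is an assumption, not a choice stated in the paper.
    """
    m, n = len(t1), len(t2)
    # table[i][j] is the DTW distance between the first i elements of t1
    # and the first j elements of t2.
    table = [[math.inf] * (n + 1) for _ in range(m + 1)]
    table[0][0] = 0.0  # two empty series are at distance zero
    # Cells with exactly one empty series keep the value infinity.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = dist(t1[i - 1], t2[j - 1])
            table[i][j] = cost + min(
                table[i - 1][j - 1],  # advance in both series
                table[i - 1][j],      # advance in t1 only
                table[i][j - 1],      # advance in t2 only
            )
    return table[m][n]


# Hypothetical toy series: the same peak, one step apart in time.
early = [0, 1, 2, 1, 0, 0]
late = [0, 0, 1, 2, 1, 0]
print(dtw_distance(early, late))
```

For the two toy series the warping path can align the peaks exactly, so the distance is zero even though the series differ at four of the six time indices. The table needs on the order of m × n evaluations of `dist`, which is acceptable for short annual series but grows quickly for long ones.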
109 | ISI WSC 2019