Page 279 - Contributed Paper Session (CPS) - Volume 7
CPS2099 Takatsugu Yoshioka et al.
Reduced k-Means with Nonlinear Principal
Component Analysis
Takatsugu Yoshioka¹, Masahiro Kuroda², Yuichi Mori²
1 Graduate School of Informatics, Okayama University of Science, Japan
2 Department of Management, Okayama University of Science, Japan
Abstract
Reduced k-means analysis (RKM) is a useful method for clustering objects in
a low-dimensional subspace by conducting k-means clustering and dimension
reduction simultaneously. Here RKM for categorical data and mixed
measurement level data is considered. Although several RKM-based methods
exist for categorical data, we propose an RKM method that combines
nonlinear principal component analysis (NLPCA) for dimension reduction with
k-means clustering. The proposed method handles not only nominal variables
but also ordinal variables and mixtures of categorical and numerical
variables, and it provides information useful for interpreting the variables
as well as the relationships between objects and clusters in the reduced
subspace. A couple of numerical experiments demonstrate the performance of
the proposed method.
Keywords
Dimension reduction; Clustering; Alternating least squares optimal scaling;
Categorical data; Simultaneous estimation
1. Introduction
Reduced k-means analysis (RKM) is a method for clustering objects in a
low-dimensional subspace, that is, a simultaneous analysis in which k-means
clustering and dimension reduction are conducted at the same time to obtain
a clustering of the objects and a low-dimensional subspace that reflects the
cluster structure (De Soete and Carroll, 1994).
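To make the idea of a simultaneous analysis concrete, the following is a minimal sketch of RKM for continuous data, written as an alternating least squares loop. It is an illustration under stated assumptions, not the authors' implementation: the notation (membership matrix U, centroid matrix F, orthonormal loading matrix A, loss ||X - U F A'||^2) and all names such as reduced_kmeans, n_clusters, and n_dims are chosen here for illustration. The sketch uses a single random start; in practice several random starts are usually advisable because the loss has local minima.

```python
import numpy as np


def reduced_kmeans(X, n_clusters, n_dims, n_iter=100, seed=0):
    """Alternating least squares sketch of reduced k-means (illustrative).

    Minimizes ||X - U F A'||^2 over the membership matrix U (n x k),
    the centroid matrix F (k x q), and the loading matrix A (p x q)
    with orthonormal columns. Names and notation are assumptions.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                        # column-center the data
    labels = rng.integers(n_clusters, size=X.shape[0])  # random start
    history = []
    for _ in range(n_iter):
        U = np.eye(n_clusters)[labels]            # n x k indicator matrix
        # Guard against empty clusters: their mean is set to the origin.
        counts = np.maximum(U.sum(axis=0), 1)
        means = (U.T @ X) / counts[:, None]       # cluster means, k x p
        Xbar = U @ means                          # each row replaced by its cluster mean
        # For fixed U, the optimal A holds the leading eigenvectors of
        # Xbar' Xbar (eigh returns eigenvalues in ascending order).
        _, vecs = np.linalg.eigh(Xbar.T @ Xbar)
        A = vecs[:, ::-1][:, :n_dims]
        F = means @ A                             # centroids in the subspace
        # For fixed A and F, assign each object to the nearest centroid
        # in the reduced space.
        Y = X @ A
        dist = ((Y[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        history.append(float(((X - F[new_labels] @ A.T) ** 2).sum()))
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, A, F, history
```

Calling reduced_kmeans(X, n_clusters=3, n_dims=2) on an n x p data matrix returns the cluster labels, the loadings A, the low-dimensional centroids F, and the loss recorded after each reassignment, which should not increase from one iteration to the next.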
The original RKM was developed for continuous data, and there are several
approaches and extensions based on RKM, e.g., Vichi and Kiers (2001),
Timmerman et al. (2010), Vidal (2011), and Yamamoto and Hwang (2014), which
combine dimension reduction such as principal component analysis (PCA) with
k-means clustering. When categorical data are analyzed by RKM, the
categorical variables must be quantified appropriately. In such cases,
multiple correspondence analysis (MCA) is often used to quantify categorical
variables. A number of studies combine MCA with k-means to cluster objects
described by categorical variables in a low-dimensional subspace with
category quantifications, e.g., MCA k-means (Hwang et al.,
2006), iFCB (Iodice D'Enza and Palumbo, 2013), Reduced k-means clustering
266 | ISI WSC 2019