Page 279 - Contributed Paper Session (CPS) - Volume 7

CPS2099 Takatsugu Yoshioka et al.


                               Reduced k-Means with Nonlinear Principal Component Analysis

                            Takatsugu Yoshioka¹, Masahiro Kuroda², Yuichi Mori²

                        1  Graduate School of Informatics, Okayama University of Science, Japan
                        2  Department of Management, Okayama University of Science, Japan

               Abstract
               Reduced k-means analysis (RKM) is a useful method for clustering objects in
               a low-dimensional subspace by conducting k-means clustering and dimension
               reduction simultaneously. Here, we consider RKM for categorical data and for
               data of mixed measurement levels. Although several RKM-based methods for
               categorical data already exist, we propose an RKM method that combines
               nonlinear principal component analysis (NLPCA), as the dimension reduction
               step, with k-means clustering. The method handles not only nominal variables
               but also ordinal variables and combinations of categorical and numerical
               variables, and it provides effective information for interpreting the
               variables as well as the relationships between objects/clusters in the
               reduced subspace. A few numerical experiments demonstrate the performance
               of the proposed method.

               Keywords
               Dimension reduction; Clustering; Alternating least squares optimal scaling;
               Categorical data; Simultaneous estimation

               1.  Introduction
                   Reduced k-means analysis (RKM) is a method for clustering objects in a
               low-dimensional subspace, that is, a simultaneous analysis in which k-means
               clustering and dimension reduction are carried out together to obtain both a
               clustering of the objects and a low-dimensional subspace reflecting the
               cluster structure (De Soete and Carroll, 1994).
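               As a rough illustration of this simultaneous estimation, the Python sketch
               below fits the RKM criterion of De Soete and Carroll (1994), commonly written
               as minimizing ||X - Z Y Aᵀ||² over a binary membership matrix Z (n × K),
               centroids Y (K × q) in the subspace, and loadings A (p × q) with orthonormal
               columns. It alternates two steps: for a fixed partition, A is taken as the q
               leading eigenvectors of Xᵀ Z (ZᵀZ)⁻¹ Zᵀ X; for fixed A, objects are reassigned
               to the nearest centroid of the projected scores X A. It is a minimal sketch
               assuming continuous, column-centered data, a single random start, and numpy
               only; the names `reduced_kmeans` and `_update_loadings` are hypothetical, and
               this is the original continuous-data RKM, not the NLPCA-based method proposed
               in this paper.

```python
import numpy as np


def _update_loadings(X, labels, n_clusters, n_dims):
    """Given a partition, return loadings A (p x q) and subspace centroids Y (K x q)."""
    Z = np.eye(n_clusters)[labels]             # n x K binary membership matrix
    counts = np.maximum(Z.sum(axis=0), 1)      # guard against empty clusters
    M = (Z.T @ X) / counts[:, None]            # K x p cluster means in the full space
    B = M.T @ (counts[:, None] * M)            # equals X^T Z (Z^T Z)^{-1} Z^T X
    _, eigvec = np.linalg.eigh(B)              # eigenvalues returned in ascending order
    A = eigvec[:, ::-1][:, :n_dims]            # q leading eigenvectors, orthonormal columns
    Y = M @ A                                  # (Z^T Z)^{-1} Z^T X A: centroids in the subspace
    return A, Y


def reduced_kmeans(X, n_clusters, n_dims, n_iter=100, seed=0):
    """Hypothetical RKM sketch: minimize ||X - Z Y A^T||_F^2 by alternating updates."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                     # column-center the data
    labels = rng.integers(n_clusters, size=X.shape[0])  # random initial partition
    for _ in range(n_iter):
        # Step 1: with the partition fixed, update A and Y in closed form.
        A, Y = _update_loadings(X, labels, n_clusters, n_dims)
        # Step 2: with A fixed, reassign objects to the nearest subspace centroid.
        F = X @ A                              # n x q object scores
        dist = ((F[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # partition is stable
        labels = new_labels
    # Make A and Y consistent with the final partition before computing the loss.
    A, Y = _update_loadings(X, labels, n_clusters, n_dims)
    Z = np.eye(n_clusters)[labels]
    loss = np.sum((X - Z @ Y @ A.T) ** 2)
    return labels, A, Y, loss
```

               Because neither step can increase the criterion, the loss is non-increasing
               across iterations, but the solution depends on the starting partition; in
               practice one would compare several random starts and keep the run with the
               smallest `loss`.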
                   The original RKM was developed for continuous data, and there are several
               approaches and extensions based on RKM, e.g., Vichi and Kiers (2001),
               Timmerman et al. (2010), Vidal (2011), and Yamamoto and Hwang (2014), which
               combine a dimension reduction method such as principal component analysis
               (PCA) with k-means clustering. When categorical data are analyzed by RKM, the
               categorical variables must be appropriately quantified. In such cases,
               multiple correspondence analysis (MCA) is often used to quantify them. A
               number of studies combine MCA with k-means to cluster objects described by
               categorical variables in a low-dimensional subspace together with category
               quantifications, e.g., MCA k-means (Hwang et al., 2006), iFCB (Iodice D'Enza
               and Palumbo, 2013), Reduced k-means clustering

                                                                  266 | ISI WSC 2019