Page 280 - Contributed Paper Session (CPS) - Volume 7

CPS2099 Takatsugu Yoshioka et al.
               with  MCA  (Mitsuhiro  and  Yadohisa,  2015),  and  Cluster  Correspondence
               Analysis (van de Velden et al., 2017).
               From the standpoint of categorical data analysis, it is better to deal
               not only with single nominal variables but also with multiple nominal and
               ordinal variables, and with combinations of measurement levels including
               numeric variables. To do this, optimal scaling by Alternating Least Squares
               (ALS) can be utilized. So-called Nonlinear PCA (NLPCA), carried out by the
               algorithms PRINCIPALS (Young et al., 1978) or PRINCALS (Gifi, 1990), is one
               possibility for effective interpretation of the variables as well as of the
               relationships between objects/clusters in the reduced subspace (GROUPALS by
               Van Buuren and Heiser (1989) is based on the same concept of quantification).
               Since RKM includes PCA as its dimension reduction procedure and provides
               information for interpreting the configuration of the principal component
               scores of objects and the loading of each variable, NLPCA can be used as
               the dimension reduction method in RKM for data including categorical
               variables.
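The ALS quantification step used by algorithms such as PRINCIPALS can be illustrated with a short sketch (an assumption-laden illustration, not the published algorithm; the function name `quantify_nominal` and the choice of passing the current low-rank prediction as `target` are ours): for a nominal variable, the least-squares quantification of each category is the mean of the current model prediction over the objects falling in that category, after which the quantified column is standardized.

```python
import numpy as np

def quantify_nominal(codes, target):
    """Optimal-scaling step for one nominal variable (illustrative sketch).

    codes  : integer category codes for the n objects.
    target : the current model prediction for this column (e.g. Y @ a_j
             in a PRINCIPALS-style loss ||X* - Y A^T||^2).

    Each category receives the mean of `target` over its objects, then the
    quantified column is centered and scaled to unit variance, as optimal
    scaling requires.
    """
    codes = np.asarray(codes)
    target = np.asarray(target, dtype=float)
    # least-squares quantification: conditional mean of target per category
    quant = {c: target[codes == c].mean() for c in np.unique(codes)}
    x = np.array([quant[c] for c in codes])
    x = x - x.mean()                    # center
    x = x / np.sqrt((x ** 2).mean())    # unit variance (dividing by n)
    return x
```

An ordinal variable would additionally require a monotone (isotonic) regression over the category means to preserve the category order; that step is omitted here.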
               Thus, we propose a simultaneous analysis of k-means clustering and
               NLPCA (we refer to this as RKM with NLPCA) to find clusters of objects in
               a low-dimensional subspace with category quantifications.

               2.  Methodology
               Here we briefly introduce ordinary RKM and NLPCA, based on De Soete
               and Carroll (1994) and Mori et al. (2017) respectively, before proposing
               RKM with NLPCA for categorical data and mixed measurement-level data.

               2.1 Ordinary reduced k-means analysis
                   Ordinary RKM (for numerical data) is as follows.
               Let X be the n × p centered data matrix, where n denotes the number of
               objects and p the number of variables, let k be the number of clusters and
               d the number of components (generally, k ≥ d + 1), let U be the n × k
               membership matrix and A the p × d loadings matrix, and let Y = XA be the
               n × d object scores (component scores) matrix. RKM looks for centroids in
               a low-dimensional subspace that minimize the distance of the data points
               from the centroids. RKM minimizes the following loss function

                                  f(U, F, A) = ‖X − U F Aᵀ‖²,          (1)

               where F is the k × d matrix collecting the centroids. From the estimates
               we can confirm which of the k clusters each object belongs to (from the
               estimated U), where each of the k cluster centroids lies (from the
               estimated F), the direction of the loading of each variable (from the
               estimated A), and so on, in the d-dimensional subspace.
               The algorithm alternates PCA (updating A), the estimation of U, and the
               estimation of F until convergence.
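The alternation just described can be sketched in a few lines of NumPy (a minimal illustration under our own choices of random initialization and empty-cluster handling, not the authors' code):

```python
import numpy as np

def reduced_kmeans(X, k, d, n_iter=100, tol=1e-9, seed=0):
    """Minimal reduced k-means sketch: minimize ||X - U F A^T||^2.

    X : (n, p) centered data; U : (n, k) membership indicator;
    F : (k, d) centroids in the subspace; A : (p, d) orthonormal loadings.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(k, size=n)              # random initial partition
    prev_loss = np.inf
    for _ in range(n_iter):
        U = np.eye(k)[labels]                     # n x k indicator matrix
        for j in np.flatnonzero(U.sum(axis=0) == 0):
            i = rng.integers(n)                   # reseed an empty cluster
            U[i] = 0
            U[i, j] = 1
        M = np.linalg.solve(U.T @ U, U.T @ X)     # k x p cluster means
        XB = U @ M                                # each row = its cluster mean
        # loadings: top-d eigenvectors of the between-cluster scatter
        _, evecs = np.linalg.eigh(XB.T @ XB)      # eigh sorts ascending
        A = evecs[:, ::-1][:, :d]
        F = M @ A                                 # centroids in the subspace
        Y = X @ A                                 # object (component) scores
        # reassign each object to its nearest centroid in the subspace
        dists = ((Y[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        loss = np.linalg.norm(X - np.eye(k)[labels] @ F @ A.T) ** 2
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return labels, F, A, loss
```

Because A has orthonormal columns, assigning each object to the nearest centroid in the d-dimensional score space is equivalent to minimizing the full loss over U for fixed F and A.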



                                                                  267 | ISI WSC 2019