Page 280 - Contributed Paper Session (CPS) - Volume 7
CPS2099 Takatsugu Yoshioka et al.
with MCA (Mitsuhiro and Yadohisa, 2015), and Cluster Correspondence
Analysis (van de Velden et al., 2017).
From the viewpoint of categorical data analysis, it is desirable to handle not only single nominal variables but also multiple nominal and ordinal variables, as well as mixtures of measurement levels that include numeric variables. To do this, optimal scaling by Alternating Least Squares (ALS) can be utilized. So-called Nonlinear PCA (NLPCA), carried out by the algorithm PRINCIPALS (Young et al., 1978) or PRINCALS (Gifi, 1990), is one possibility for effective interpretation of the variables as well as the relationships between objects/clusters in the reduced subspace (GROUPALS by Van Buuren and Heiser (1989) adopts the same concept of quantification). Since RKM includes PCA as its dimension-reduction procedure and provides information for interpreting the configuration of the principal component scores of objects and the loading of each variable, NLPCA can serve as the dimension-reduction method in RKM for data that include categorical variables.
Thus, we propose a simultaneous analysis of k-means clustering and NLPCA (we refer to this as RKM with NLPCA) to cluster objects in a low-dimensional subspace with category quantifications.
2. Methodology
Before proposing RKM with NLPCA for categorical data and mixed measurement level data, we briefly introduce ordinary RKM and NLPCA, based on De Soete and Carroll (1994) and Mori et al. (2017), respectively.
2.1 Ordinary reduced k-means analysis
Ordinary RKM (for numerical data) is as follows.
Let X be the n × p centered data matrix, where n denotes the number of objects and p the number of variables; let k be the number of clusters and d the number of components (generally, k ≥ d + 1); let U be the n × k membership matrix and A the p × d loadings matrix; and let Y = XA be the n × d object scores (component scores) matrix. RKM looks for centroids in a low-dimensional subspace that minimize the distance of the data points from the centroids. RKM minimizes the following loss function

f(U, F, A) = ‖X − UFAᵀ‖²,    (1)

where F is the k × d matrix collecting the centroids. From the estimated matrices we can confirm, in the d-dimensional subspace, which cluster each object belongs to (from the estimated U), where each cluster centroid lies (from the estimated F), the direction of the loading of each variable (from the estimated A), and so on.
PCA, estimating U, and estimating F are alternately executed until
convergence.
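The alternating scheme above can be sketched in a few lines of NumPy. This is an illustrative implementation of the loss in Eq. (1), not the authors' code: the function name `reduced_kmeans`, the Procrustes-style update of A, and the initialization choices are our assumptions; it alternates the updates of F, A, and U until the memberships stop changing.

```python
import numpy as np

def reduced_kmeans(X, k, d, n_iter=100, seed=0):
    """Sketch of ordinary Reduced k-Means minimizing ||X - U F A^T||^2.

    X : (n, p) centered data matrix; k clusters; d components (k >= d + 1).
    Alternates: centroids F given U and A, loadings A given U and F
    (orthogonal Procrustes), and memberships U given F and A.
    Illustrative only -- not the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(0, k, size=n)                 # random initial memberships
    A = np.linalg.svd(X, full_matrices=False)[2][:d].T  # p x d: leading PCA loadings
    for _ in range(n_iter):
        U = np.eye(k)[labels]                           # n x k indicator matrix
        # Update F (k x d): cluster means of the component scores XA
        counts = np.maximum(U.sum(axis=0), 1)[:, None]  # guard empty clusters
        F = (U.T @ X @ A) / counts
        # Update A (p x d, orthonormal): Procrustes solution from SVD of X^T U F
        P, _, Qt = np.linalg.svd(X.T @ U @ F, full_matrices=False)
        A = P @ Qt
        # Update memberships: nearest centroid in the d-dimensional subspace
        Y = X @ A                                       # n x d component scores
        dists = ((Y[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):          # converged
            break
        labels = new_labels
    return labels, F, A
```

Because A is kept column-orthonormal, assigning each object to the centroid nearest to its score yᵢ = Aᵀxᵢ in the d-dimensional subspace yields the same assignment as minimizing ‖xᵢ − Afⱼ‖² in the full space, so the subspace assignment step is consistent with the loss in Eq. (1).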
267 | ISI WSC 2019