Page 336 - Contributed Paper Session (CPS) - Volume 6
CPS1950 Paolo G. et al.
the associated weight is, the more relevant the role played by the variable. If the
weight of a variable is equal to zero, the variable is discarded. There are
various ways to assign weights to the variables. A common one is based on
Principal Component Analysis (PCA), where the weights are given by the component
loadings. In this case, the principal components span a low-dimensional subspace
of dimension Q (< J) onto which the observation units are projected. The partition is
carried out by clustering the observation units in terms of their coordinates in
this low-dimensional subspace, i.e., in terms of the component scores. For this
reason, we refer to this approach as subspace clustering.
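As a compact illustration of the projection step, assuming notation that may differ from the one used in Section 2 (X is the N × J column-centred data matrix, A the J × Q matrix of component loadings with orthonormal columns, and Y the N × Q matrix of component scores), the coordinates used for clustering are:

```latex
Y = XA, \qquad y_{iq} = \sum_{j=1}^{J} x_{ij}\, a_{jq}, \qquad A^{\top}A = I_Q .
```

If the j-th row of A is zero, x_ij enters none of the scores, which is the sense in which a variable with zero weight is discarded.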
In the naïve approach to subspace clustering, the data reduction and the
clustering steps are performed sequentially: first, PCA is applied to
the data; then, the clustering method is run on the resulting component scores.
This approach is usually known as tandem analysis (Arabie & Hubert,
1994). Although it is very intuitive, its use is not recommended because the
principal components are not optimal in the clustering sense. In fact, as is well
known, they maximize the total sum of squares, which includes both the between-cluster
and the within-cluster variability. The retained components may therefore be dominated
by directions of large within-cluster variability, leading to a
low-dimensional configuration of the observation units in which the cluster
structure is obscured. For more details, the interested reader may refer to, for
instance, De Sarbo et al. (1990) and De Soete & Carroll (1994).
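The pitfall of tandem analysis can be reproduced with a small sketch, assuming synthetic two-cluster data in which the groups differ only along a low-variance variable. The helper names (`pca_scores`, `kmeans`, `agreement`), the data and the random seeds are hypothetical; PCA is computed through NumPy's SVD and K-means is a bare Lloyd iteration with random restarts rather than any particular library routine.

```python
import numpy as np

def pca_scores(X, Q):
    """Centre X and return (scores, loadings) on the first Q principal components."""
    Xc = X - X.mean(axis=0)
    # The right singular vectors of the centred data are the PCA loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = Vt[:Q].T                      # J x Q loading matrix with orthonormal columns
    return Xc @ A, A

def kmeans(Y, K, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's K-means with random restarts; returns the labels of the best run."""
    rng = np.random.default_rng(seed)
    best_labels, best_loss = None, np.inf
    for _ in range(n_init):
        C = Y[rng.choice(len(Y), size=K, replace=False)]   # random initial centroids
        for _ in range(n_iter):
            labels = ((Y[:, None, :] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
            new_C = np.array([Y[labels == k].mean(axis=0) if np.any(labels == k) else C[k]
                              for k in range(K)])
            if np.allclose(new_C, C):
                break
            C = new_C
        # Final assignment to the final centroids, then the within-cluster sum of squares.
        labels = ((Y[:, None, :] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
        loss = ((Y - C[labels]) ** 2).sum()
        if loss < best_loss:
            best_labels, best_loss = labels, loss
    return best_labels

def agreement(labels, truth):
    """Share of correctly grouped units for K = 2, up to a swap of the cluster labels."""
    acc = np.mean(labels == truth)
    return max(acc, 1.0 - acc)

# Synthetic data: the clusters differ only along x2, while x1 carries most of the variance.
rng = np.random.default_rng(1)
n = 200
truth = np.repeat([0, 1], n // 2)
x1 = rng.normal(0.0, 5.0, n)                                    # high variance, no cluster information
x2 = np.where(truth == 0, -2.0, 2.0) + rng.normal(0.0, 0.3, n)  # low variance, separates the clusters
X = np.column_stack([x1, x2])

# Tandem analysis: PCA first (Q = 1), then K-means on the component scores.
scores, A = pca_scores(X, Q=1)
tandem_labels = kmeans(scores, K=2)

# For comparison, K-means on the second (discarded) component only.
scores2, _ = pca_scores(X, Q=2)
pc2_labels = kmeans(scores2[:, 1:], K=2)

print(f"PC1 loadings: {A.ravel().round(2)}")
print(f"tandem agreement: {agreement(tandem_labels, truth):.2f}")
print(f"PC2 agreement:    {agreement(pc2_labels, truth):.2f}")
```

With data of this kind, the first principal component is essentially x1, so K-means on its scores should group the units close to chance level, whereas the discarded second component separates them almost perfectly.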
To perform data reduction and clustering simultaneously, at least two proposals are
available: the Reduced K-means (RKM) analysis suggested by De Soete & Carroll (1994) and the
Factorial K-means (FKM) analysis suggested by Vichi & Kiers (2001). Both
methods partition the observation units into K clusters by assuming
that the cluster centroids lie in a low-dimensional subspace spanned by linear
combinations of the variables. Although they are based on the
same assumption, as we shall see, they present distinctive features.
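For reference, the two loss functions can be written down in a few lines. This sketch assumes the crisp matrix notation of the original RKM and FKM papers, which may differ from the one adopted in Section 2: U is an N × K binary membership matrix, Ȳ (`Ybar`) a K × Q matrix of centroids in the reduced space, and A a J × Q loading matrix with orthonormal columns. The function names and the random toy data are illustrative only.

```python
import numpy as np

def rkm_loss(X, U, Ybar, A):
    """Reduced K-means loss ||X - U Ybar A'||^2: fit measured in the full J-dimensional space."""
    return np.linalg.norm(X - U @ Ybar @ A.T) ** 2   # Frobenius norm for matrices

def fkm_loss(X, U, Ybar, A):
    """Factorial K-means loss ||X A - U Ybar||^2: fit measured in the Q-dimensional subspace only."""
    return np.linalg.norm(X @ A - U @ Ybar) ** 2

# Toy setting: N = 30 units, J = 5 variables, K = 3 clusters, Q = 2 components.
rng = np.random.default_rng(0)
N, J, K, Q = 30, 5, 3, 2
X = rng.normal(size=(N, J))
X -= X.mean(axis=0)                                   # column-centred data
A, _ = np.linalg.qr(rng.normal(size=(J, Q)))          # J x Q loadings with A'A = I_Q
U = np.eye(K)[np.arange(N) % K]                       # N x K binary membership matrix
Ybar = np.linalg.solve(U.T @ U, U.T @ X @ A)          # K x Q centroids of the reduced scores

print(rkm_loss(X, U, Ybar, A), fkm_loss(X, U, Ybar, A))
```

Because A has orthonormal columns, the RKM loss can never be smaller than the FKM loss: RKM also counts the part of X lying outside the subspace, whereas FKM only measures the fit of the centroids within it.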
In this paper, we propose a new clustering method in a
reduced subspace that exploits the strengths of both RKM and FKM. For this
purpose, a convex linear combination of the RKM and FKM loss functions is
used. Furthermore, to broaden the applicability of our proposal, the
clustering problem is approached from the fuzzy point of view (Zadeh, 1965).
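To see what such a combination trades off, consider a short derivation that assumes the crisp matrix forms of the two losses (U an N × K binary membership matrix, Ȳ the K × Q matrix of reduced-space centroids, A a J × Q loading matrix with A'A = I_Q) and a hypothetical weight α in [0, 1]. The fuzzy loss actually used in the paper may weight the terms differently, so the result is only indicative:

```latex
\begin{aligned}
F_\alpha(U,\bar{Y},A) &= \alpha\,\lVert X - U\bar{Y}A^{\top}\rVert^2 + (1-\alpha)\,\lVert XA - U\bar{Y}\rVert^2,
  \qquad 0 \le \alpha \le 1,\\
\lVert X - U\bar{Y}A^{\top}\rVert^2 &= \lVert X(I_J - AA^{\top})\rVert^2 + \lVert XA - U\bar{Y}\rVert^2
  \qquad (\text{cross term vanishes since } A^{\top}A = I_Q),\\
F_\alpha(U,\bar{Y},A) &= \alpha\,\lVert X(I_J - AA^{\top})\rVert^2 + \lVert XA - U\bar{Y}\rVert^2 .
\end{aligned}
```

Setting α = 1 recovers RKM and α = 0 recovers FKM. Since ‖X(I_J − AA')‖² = ‖X‖² − ‖XA‖², the extra term is the PCA criterion, so α governs how strongly the subspace is pulled toward the directions of maximum total variance.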
In the standard (hard) approach, each observation unit either belongs or does not
belong to a cluster and is assigned to one and only one cluster. The fuzzy
approach, instead, assigns the observation units to the clusters through
so-called fuzzy membership degrees ranging in the interval [0, 1], where 0 means
complete non-membership and 1 complete membership, such that, for each
observation unit, the fuzzy membership degrees sum to one.
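One common way to obtain membership degrees with these properties is the membership update of fuzzy c-means, shown here as a stand-in since the update rule of the new proposal is not given at this point. The sketch assumes a matrix D of squared distances between the N units and the K centroids and a fuzziness parameter m > 1; the function name and the toy distances are hypothetical.

```python
import numpy as np

def fuzzy_memberships(D, m=2.0, eps=1e-12):
    """Fuzzy c-means memberships from an N x K matrix of squared distances.

    u_ik = 1 / sum_h (D_ik / D_ih)^(1/(m-1)), so every entry lies in [0, 1]
    and every row sums to one.
    """
    if m <= 1:
        raise ValueError("the fuzziness parameter m must be greater than 1")
    D = np.maximum(D, eps)                  # avoid division by zero when a unit sits on a centroid
    ratio = (D[:, :, None] / D[:, None, :]) ** (1.0 / (m - 1.0))   # ratio[i, k, h] = (D_ik / D_ih)^p
    return 1.0 / ratio.sum(axis=2)

# Three units, two clusters: squared distances of each unit to the two centroids.
D = np.array([[1.0, 4.0],
              [2.0, 2.0],
              [9.0, 1.0]])
U = fuzzy_memberships(D, m=2.0)
print(U.round(3))   # approximately [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
```

As m approaches 1 the memberships approach crisp 0/1 assignments, while larger values of m spread them more evenly across the clusters.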
The paper is organized as follows. In Section 2, RKM and FKM are
recalled and the new proposal is introduced. Section 3 reports the results of
applying the new clustering procedure to real data. Some
final remarks in Section 4 conclude the paper.
325 | ISI WSC 2019