Page 231 - Special Topic Session (STS) - Volume 1

STS426 Tanuka C.
where $\lVert x \rVert = \sqrt{x^{\top} x}$ is the norm of $x$,
$(x_1, \ldots, x_n)$ are the data points and $(c_1, \ldots, c_K)$ are the
cluster centers for the clusters $C_1, \ldots, C_K$. In the double sum
$\sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2$, each term is
the squared Euclidean distance between a data point $x_i$ and the cluster
center $c_j$ of the $j^{th}$ cluster $C_j$. The sum is basically an indicator
of the distance of the data points from their respective cluster centers.
Finally, the algorithm has the following steps:

3.2 The Mixture-Model Based Clustering Technique
K-Means clustering is an iterative relocation method which minimizes
the intra-cluster variance. Model Based Clustering (MBC) is also an iterative
method but, unlike K-Means, it makes provision for the variability and
structure of the data. In finite mixture model based clustering, each
component probability distribution corresponds to a cluster. The usual
questions in applied cluster analysis, i.e., the choice of an appropriate
clustering method and the determination of the number of clusters, can be
reformulated as a statistical model selection problem in which models that
differ in the number of components and/or in the component distributions
are compared. Outliers can also be conveniently modeled by adding one or
more component(s) representing a different distribution for the outlying
data (?).
As already noted, K-Means assumes homogeneous and spherical
groups/clusters. It can be viewed as a procedure which approximately
maximizes the multivariate normal classification likelihood when the
covariance matrix is the same for every mixture component and is
proportional to the identity matrix. On the other hand, MBC can tackle
the problem of overlapping and non-spherical clusters having different
covariance structures (?).
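The link between K-Means and the classification likelihood can be made explicit with a short derivation. The notation here is an assumption made for illustration: $\sigma^2 I$ is the common spherical covariance, $d$ the dimension, $n$ the number of points, and $c_j$ and $C_j$ the center and member set of cluster $j$.

```latex
\begin{align*}
\log L_C
  &= \sum_{j=1}^{K} \sum_{x_i \in C_j}
     \log \phi\!\left(x_i \mid c_j, \sigma^2 I\right) \\
  &= -\frac{nd}{2} \log\!\left(2\pi\sigma^2\right)
     - \frac{1}{2\sigma^2}
       \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2 .
\end{align*}
```

For fixed $\sigma^2$ the first term is constant, so maximizing $\log L_C$ over the assignments and centers amounts to minimizing the within-cluster sum of squares. K-Means is an iterative search that may stop at a local optimum, which is one way to read the word "approximately" above.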
Suppose we have the data $X = \{x_1, \ldots, x_n\}$, where each $x_i$ is a
d-dimensional vector. Now, for a given number of components G, assume the
points are generated in an i.i.d. (independently and identically
distributed) manner from the finite-mixture model:

$$f(x_i \mid \theta) = \sum_{k=1}^{G} \pi_k f_k(x_i \mid \theta_k) \qquad (4)$$

where $f_k(x_i \mid \theta_k)$ represents the density of the $k^{th}$
group/cluster, parameterized by $\theta_k$, and
$\pi_k := \Pr(x_i \in \text{cluster } k)$ is called the mixing
proportion/weight, with $\sum_{k=1}^{G} \pi_k = 1$. The complete set of
parameters for a mixture-model with G components is:

$$\theta = \{\pi_1, \ldots, \pi_G, \theta_1, \ldots, \theta_G\}$$
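As a hedged illustration of the finite-mixture density in (4), the sketch below evaluates a two-component mixture at a single point. Gaussian components are assumed purely for illustration, and the function names (`gaussian_density`, `mixture_density`) and all parameter values are made up rather than taken from the paper.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Multivariate normal density evaluated at one d-dimensional point x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)   # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)         # log-determinant of the covariance
    return float(np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)))

def mixture_density(x, weights, means, covs):
    """Weighted sum of component densities; the weights must sum to one."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixing proportions must sum to 1")
    return sum(w * gaussian_density(x, m, c)
               for w, m, c in zip(weights, means, covs))

# Hypothetical parameters: G = 2 components in d = 2 dimensions.
weights = [0.3, 0.7]
means = [np.zeros(2), np.array([4.0, 0.0])]
covs = [np.eye(2), np.diag([4.0, 0.25])]   # one spherical, one elongated component

value = mixture_density(np.zeros(2), weights, means, covs)
print(value)
```

The second component has a non-spherical covariance, which is the kind of structure a spherical, equal-variance model cannot represent.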

Most often, and throughout the rest of this work, $f_k$ is taken to be the
multivariate normal (Gaussian) density $\phi(x_i \mid \mu_k, \Sigma_k)$,
parameterized by

220 | ISI WSC 2019
