For regular models, the integrated likelihood can be approximated simply by the BIC (Bayesian Information Criterion). The BIC adds a penalty to the log-likelihood based on the number of parameters in the model, and it has shown good performance in a number of applications (?; ?; ?). The BIC can be calculated as follows:

$$
2 \log p(x \mid \mathcal{M}_k) \;\approx\; 2 \log p(x \mid \hat{\theta}_k^{*}, \mathcal{M}_k) \;-\; \nu_k \log(n) \;\equiv\; \mathrm{BIC}_k \qquad (13)
$$

where $\log p(x \mid \hat{\theta}_k^{*}, \mathcal{M}_k)$ is the maximized log-likelihood for the model and data $x$, $n$ is the number of observations, and $\nu_k$ is the number of independent parameters to be estimated for model $\mathcal{M}_k$ (?).
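As a quick numerical illustration of Eq. (13), the snippet below computes the BIC directly from a maximized log-likelihood; all numeric values are hypothetical placeholders, not results from this paper.

```python
# Hedged illustration of Eq. (13): BIC from a maximized log-likelihood.
# All numeric values below are hypothetical, for demonstration only.
import math

log_lik = -1234.5  # maximized log-likelihood log p(x | theta*, M_k) (hypothetical)
nu = 14            # number of independent parameters nu_k in M_k    (hypothetical)
n = 500            # number of observations                          (hypothetical)

bic = 2 * log_lik - nu * math.log(n)  # Eq. (13): larger BIC indicates a better model
print(f"BIC = {bic:.1f}")
```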
Finally, we can adopt the following strategy to combine all of the methods discussed so far to select the optimal model (a sketch in code follows the list):

1. Select a maximum number of components to consider for the mixture model; call it Gmax.
2. Estimate the parameters via EM with the MAP estimation method for each parameterization and each number of components up to Gmax.
3. Compute the BIC for the mixture likelihood using the parameter estimates from EM for up to Gmax clusters.
4. Select the model (parameterization/number of mixture components) having the maximum BIC.
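The following is a minimal sketch of this strategy, assuming scikit-learn's GaussianMixture as the mixture estimator (an assumption; the paper does not prescribe a library). Two caveats: scikit-learn fits by maximum-likelihood EM rather than the MAP variant discussed above, and its bic() method returns −2 log L + ν log n, so the best model minimizes that value rather than maximizing Eq. (13).

```python
# Sketch of the BIC-based model selection strategy (steps 1-4 above),
# using scikit-learn's GaussianMixture. The function name select_gmm,
# the Gmax default, and the covariance types searched are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_gmm(X, g_max=9, cov_types=("spherical", "diag", "tied", "full")):
    best = None
    for cov in cov_types:                 # each parameterization
        for g in range(1, g_max + 1):     # each number of components up to Gmax
            gmm = GaussianMixture(n_components=g, covariance_type=cov,
                                  n_init=5, random_state=0).fit(X)  # EM fit
            bic = gmm.bic(X)  # sklearn convention: -2 log L + nu log n (lower is better)
            if best is None or bic < best[0]:
                best = (bic, cov, g, gmm)
    return best

# Usage on synthetic data:
X = np.random.default_rng(0).normal(size=(300, 2))
bic, cov_type, g, model = select_gmm(X, g_max=5)
print(f"selected: covariance_type={cov_type}, G={g}, BIC={bic:.1f}")
```

The double loop mirrors the strategy directly: every parameterization/component-count pair is fit by EM and scored, and the single best pair is returned.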

               3.3  Dimension reduction for visualization
After performing cluster analysis on a set of data, it is usually desirable to check the distinctness of the clusters created. Popular measures of cluster validity, e.g., the silhouette width (?), utilize Euclidean distances or other standardized metrics to check the validity of K-means clustering, but the distance to be used for clusters arising from GMMBC and other clustering methods is not clearly delineated. ? proposed a methodology to reduce the dimensionality of the data so that it can be projected onto a subspace of 2 or 3 dimensions, giving a convenient visual representation of the clusters created from a finite mixture of Gaussian densities. Information on the dimension-reduced subspace is taken from group-specific quantities such as the group means and, depending on the estimated mixture model, the variation in the group covariances. The proposed method aims to reduce the dimensionality by identifying a set of linear combinations of the original features, called directions, ordered by importance as quantified by the associated eigenvalues, which capture most of the cluster structure contained in the data. Observations may then be projected onto the dimension-reduced subspace, which facilitates various summary plots that help us visualize the clustering structure. The method uses the Gaussianity of
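A hedged sketch of this projection idea is given below. It implements only a simplified special case, using the weighted between-group scatter of the fitted component means against the overall covariance (omitting the covariance-based contributions mentioned above); the function cluster_directions and all settings are illustrative, not the cited method's actual implementation.

```python
# Simplified sketch: directions that separate the fitted mixture components,
# obtained from a generalized eigenproblem of between-group scatter vs.
# overall covariance. An illustrative special case, not the cited method.
import numpy as np
from scipy.linalg import eigh
from sklearn.mixture import GaussianMixture

def cluster_directions(X, gmm, d=2):
    props = gmm.weights_               # mixing proportions (G,)
    means = gmm.means_                 # component means (G, p)
    mu = props @ means                 # overall weighted mean (p,)
    # weighted between-group scatter of the component means
    B = sum(w * np.outer(m - mu, m - mu) for w, m in zip(props, means))
    S = np.cov(X, rowvar=False)        # overall covariance of the data
    # solve B v = lambda S v; keep the d directions with largest eigenvalues
    evals, evecs = eigh(B, S)
    V = evecs[:, np.argsort(evals)[::-1][:d]]  # directions ordered by importance
    return X @ V                       # observations projected onto the subspace

# Usage: fit a mixture, then project to 2-D for summary plots.
X = np.random.default_rng(1).normal(size=(200, 5))
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
Z = cluster_directions(X, gmm, d=2)
```

Projecting onto the first two such directions yields the 2- or 3-dimensional coordinates used for the summary plots described above.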
