Page 336 - Invited Paper Session (IPS) - Volume 2
IPS273 Tomoki Tokuda et al.
Multiple co-clustering with heterogeneous
marginal distributions and its application
to identify subtypes of depressive disorder
Tomoki Tokuda, Junichiro Yoshimoto, Yu Shimizu, Kenji Doya
1 Okinawa Institute of Science and Technology Graduate University, JAPAN
2 Nara Institute of Science and Technology, JAPAN
Abstract
With the advent of sophisticated data acquisition methods, huge amounts of
data have become available. Cluster analysis is a powerful data mining tool to
reveal the underlying heterogeneous structure of objects in data. Recently,
co-clustering methods have gained much attention because they reveal relationships
between objects and features, hence capturing a possible interplay between
them. However, a big dataset may contain multiple cluster structures, in which
the cluster solution differs depending on the features that one focuses on.
Furthermore, the marginal distributions of the features that characterize a cluster
may be heterogeneous, e.g., Gaussian, Poisson or multinomial. To cope with
these challenges in big data analysis, we developed a novel multiple co-
clustering method. Our method is based on nonparametric Bayesian mixture
models in which features are optimally partitioned for each cluster solution.
This feature partition works as feature selection for a particular cluster solution,
screening out irrelevant features. For the mixture components, we assume
Gaussian, Poisson or multinomial distributions; the type is pre-specified for each
feature, but a single dataset may mix features of different types. We present the theoretical
foundation of our method and show how it works on real data. The
demonstration is based on our recent study on identifying subtypes
of depressive disorder using high-dimensional data of different modalities
such as functional Magnetic Resonance Imaging (fMRI), clinical questionnaire
scores, and genetic polymorphism.
Keywords
Multi-view clustering; Mixture models; Feature selection; MRI
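As a rough illustration of the structure described in the abstract, the following Python sketch scores a data matrix under a given multiple co-clustering: features are partitioned into views, each view has its own object clustering, and each feature cluster carries a pre-specified distribution type. This is not the nonparametric Bayesian inference of the paper. It only evaluates a plug-in maximum-likelihood score for a fixed partition, and all names (`block_loglik`, `view_loglik`, `total_loglik`), the toy data, and the variance and rate floors are assumptions made for this example. The multinomial case is treated as a single categorical draw per cell.

```python
import math
from collections import Counter


def block_loglik(values, kind):
    """Plug-in (maximum-likelihood) log-likelihood of one co-cluster block.

    Hypothetical helper: the parameters are estimated from the block itself,
    which is a simplification of a fully Bayesian treatment.
    """
    n = len(values)
    if n == 0:
        return 0.0
    if kind == "gaussian":
        mean = sum(values) / n
        # Variance floor (an assumption) avoids log(0) for constant blocks.
        var = max(sum((v - mean) ** 2 for v in values) / n, 1e-6)
        return sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v in values
        )
    if kind == "poisson":
        rate = max(sum(values) / n, 1e-6)  # rate floor, also an assumption
        return sum(v * math.log(rate) - rate - math.lgamma(v + 1) for v in values)
    if kind == "categorical":
        counts = Counter(values)
        return sum(c * math.log(c / n) for c in counts.values())
    raise ValueError(f"unknown distribution type: {kind}")


def view_loglik(data, object_labels, feature_clusters):
    """Score one view: sum over (object cluster, feature cluster) blocks.

    data             : list of rows (one row per object)
    object_labels    : object-cluster label of each row in this view
    feature_clusters : list of (kind, [column indices]) in this view
    """
    total = 0.0
    for k in set(object_labels):
        rows = [row for row, lab in zip(data, object_labels) if lab == k]
        for kind, cols in feature_clusters:
            block = [row[j] for row in rows for j in cols]
            total += block_loglik(block, kind)
    return total


def total_loglik(data, views):
    """Sum the scores of all views; each view is (object_labels, feature_clusters)."""
    return sum(view_loglik(data, labels, fcs) for labels, fcs in views)


# Toy data: column 0 is Gaussian-like, column 1 a count, column 2 categorical.
data = [
    [0.1, 1, "A"],
    [-0.2, 0, "A"],
    [5.0, 9, "B"],
    [5.3, 11, "B"],
]
feature_clusters = [("gaussian", [0]), ("poisson", [1]), ("categorical", [2])]

# One view, two candidate object clusterings of the same four objects.
matching = total_loglik(data, [([0, 0, 1, 1], feature_clusters)])
mismatched = total_loglik(data, [([0, 1, 0, 1], feature_clusters)])
```

On this toy matrix the object clustering that groups the first two and the last two rows receives the higher score, because every block is then homogeneous under its own distribution type. A view built on irrelevant features would gain little from any object clustering, which is the intuition behind using the feature partition as feature selection.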
1. Introduction
We consider a clustering problem for a data matrix that consists of objects
(or subjects) in rows and features (variables, or attributes) in columns.
Clustering objects based on the data matrix is a basic data mining approach,
which groups objects with similar patterns of distribution. As an extension of
conventional clustering, a co-clustering model has been proposed, which
captures not only object cluster structure, but also feature cluster structure
(Lazzeroni & Owen, 2002; Gu & Zhou, 2009; Madeira & Oliveira, 2004). In the
present paper, we focus on a specific type of co-clustering, so-called ‘check
323 | ISI WSC 2019