Page 347 - Invited Paper Session (IPS) - Volume 2
IPS273 Charlotte L. et al.
In this paper, we study the connection between entropy-regularized
optimal transport and rank-one matrix factorization and show how they allow
one to obtain co-clustering partitions with automatic detection of the
number of row and column clusters. Our interest in these methods is due
to the emergence of optimal transport in machine learning on the one hand and
to the longstanding presence of matrix factorization on the other. We would
like to underline that our paper studies and presents the connection between
the above-mentioned methods in an exploratory fashion: our foremost goal is
to illustrate the intuition and provide a new point of view that helps to
understand the link between these two learning approaches, which
seem completely different at first sight. The rest of this paper is organized as
follows. In Section 2, we present the preliminary knowledge related to the optimal
transport problem and its entropic regularized version. In Section 3, we
explore the relationship between regularized optimal transport and matrix
factorization for the co-clustering problem. Finally, we conclude our paper and
give several challenging future research directions that can be derived from it.
2. Optimal Transportation of Discrete Measures
Optimal transport considers the problem of finding a mapping
transporting one probability measure to another one in a way that minimizes
the transportation cost. Hereafter, we consider the formulation of Kantorovich
(Kantorovich, 1942) for discrete measures, which consists in finding a coupling
matrix $\gamma \in \mathbb{R}_{+}^{n \times n'}$ such that $\gamma_{ij}$ is defined as the fraction of mass transported
from the $i$-th bin of the source distribution $\alpha$ (i.e. $x_i$) to the $j$-th bin of the
target distribution $\beta$ (i.e. $y_j$), with $\alpha$ and $\beta$ belonging to the simplices of
size $n$ and size $n'$, respectively. The set of admissible couplings writes:

$$\Pi(\alpha, \beta) \coloneqq \left\{ \gamma \in \mathbb{R}_{+}^{n \times n'} : \gamma \mathbb{1}_{n'} = \alpha, \; \gamma^{\top} \mathbb{1}_{n} = \beta \right\},$$
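As a quick sanity check on this definition, the product coupling $\alpha\beta^{\top}$ (the trivial coupling discussed below) always satisfies both marginal constraints and hence belongs to $\Pi(\alpha, \beta)$. A minimal NumPy sketch, with illustrative sizes and random histograms of our own choosing:

```python
import numpy as np

n, m = 4, 3
rng = np.random.default_rng(0)

# Two histograms on the probability simplex (non-negative, summing to one).
alpha = rng.random(n)
alpha /= alpha.sum()
beta = rng.random(m)
beta /= beta.sum()

# The product (independent) coupling alpha beta^T.
gamma = np.outer(alpha, beta)

# Membership in Pi(alpha, beta): row sums recover alpha, column sums recover beta.
assert np.allclose(gamma.sum(axis=1), alpha)
assert np.allclose(gamma.sum(axis=0), beta)
```

The row-sum identity holds because each row of the outer product is $\alpha_i \beta$, and $\beta$ sums to one; symmetrically for the columns.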
where $\mathbb{1}_{n}$ is the size-$n$ vector of ones. To find an optimal coupling among
the admissible ones, one has to define a cost matrix $C \in \mathbb{R}_{+}^{n \times n'}$ that models
the cost of moving source bin $i$ to target bin $j$, represented in our case by $x_i$
and $y_j$, respectively. For instance, if $d = d'$,¹ a natural choice for $C$ is the
squared Euclidean distance, i.e. $C_{ij} = \| x_i - y_j \|_2^2$. In the following, and for
complexity reasons, we focus on the proposition of Cuturi (2013) that consists
in adding a regularization to promote couplings that are close to the trivial
coupling $\alpha\beta^{\top} \in \Pi(\alpha, \beta)$ in the sense of the Kullback–Leibler divergence. In this
case, the optimal transportation problem writes:

$$\gamma^{\varepsilon} \coloneqq \operatorname*{arg\,min}_{\gamma \in \Pi(\alpha, \beta)} \; \langle C, \gamma \rangle + \varepsilon \, \mathrm{KL}(\gamma \,\|\, \alpha\beta^{\top}),$$

where $\varepsilon > 0$ controls the strength of the entropic regularization.
¹ Note that when this is not the case, one may use the Gromov–Wasserstein distance, defined
for metric-measure spaces of different sizes. We refer the reader to Peyré et al. (2016) for
further details.
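In practice, the regularized problem is solved with the matrix-scaling (Sinkhorn–Knopp) iterations popularized by Cuturi (2013). A minimal NumPy sketch under our notation; the data, the value of $\varepsilon$, and the iteration count are illustrative choices of ours, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, eps = 5, 4, 1.0

# Source/target supports in the same ambient space, with uniform histograms.
x = rng.normal(size=(n, 2))
y = rng.normal(size=(m, 2))
alpha = np.full(n, 1.0 / n)
beta = np.full(m, 1.0 / m)

# Cost matrix of squared Euclidean distances: C_ij = ||x_i - y_j||^2.
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

# Sinkhorn-Knopp: alternately rescale the Gibbs kernel K = exp(-C/eps)
# so that the resulting coupling matches both marginals.
K = np.exp(-C / eps)
u = np.ones(n)
for _ in range(1000):
    v = beta / (K.T @ u)
    u = alpha / (K @ v)
gamma = u[:, None] * K * v[None, :]

# Up to numerical tolerance, gamma is an admissible coupling in Pi(alpha, beta).
assert np.allclose(gamma.sum(axis=1), alpha, atol=1e-6)
assert np.allclose(gamma.sum(axis=0), beta, atol=1e-6)
```

Each iteration only involves matrix-vector products with $K$, which is the complexity advantage over exact linear-programming solvers alluded to above; smaller values of $\varepsilon$ yield couplings closer to the unregularized optimum but need more iterations.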
334 | ISI WSC 2019