Page 349 - Invited Paper Session (IPS) - Volume 2
P. 349
IPS273 Charlotte L. et al.
and columns marginal distributions. For more details, see (Laclau et al,
2017).
3.2 From Optimal Transport to Matrix Factorization
Now, let us further consider the objective function related to the
optimal transportation problem with Kullback-Leibler regularization
presented in above and defined as:
ℒ(; ; , ) = 〈, 〉 + (‖ ).
⊤
The minimization of ℒ is usually performed w.r.t. the coupling with
a and b being fixed vectors from size- and size- simplexes, respectively.
However, for the co-clustering problem introduced above, we are
interested in finding the scaling vectors u and v for a given data matrix A
directly and thus propose to consider the following minimization problem:
(, ) ≔ arg min ℒ (; ; , ),
(,)
where contrary to the original problem the minimization is performed
over u and v. We note that this problem simply amounts to the
minimization of KL(‖ ) with = that does not necessarily represent
⊤
some joint distribution. Instead, this problem aims to find two vectors u
and v that have sufficient entropy or small enough mutual information
w.r.t. the matrix of interactions A. In the general case of a matrix A ∈ ℝ ×
with potentially missing values, the calculation of u and v amounts to
solving the following matrix factorization problem:
,
,
min ∑ , log ( ) − , + .
(,)∈ℝ ×ℝ
,=1
,
The connection between optimal transport and matrix factorization
can also be shown using a different point of view. When α and β are
defined as uniformly weighted sums of Diracs, the Kullback-Leibler
divergence term in (3) becomes:
⊤
KL(‖ ) = KL(‖ /).
⊤
One may further note that as λ grows, the solution of the regularized
optimal transport becomes closer to the uniform distribution and thus its
factorization becomes:
⋆
⊤
/ ~ (, ) = diag() −M/ diag()
→∞
(1/) (1/) ≃ −M/ .
336 | I S I W S C 2 0 1 9