IPS273 Charlotte L. et al.
In this paper, we study the connection between entropy-regularized optimal transport and rank-one matrix factorization and show how they allow one to obtain co-clustering partitions with an automatic detection of the number of row and column clusters. Our interest in these methods is due to the emergence of optimal transport in machine learning on the one hand and to the longstanding presence of matrix factorization on the other. We would like to underline that our paper studies and presents the connection between the above-mentioned methods in an exploratory fashion: our foremost goal is to illustrate the intuition and provide a new point of view that makes it possible to understand the link between these two learning approaches, which seem completely different at first sight. The rest of this paper is organized as follows. In Section 2, we present the preliminary knowledge related to the optimal transport problem and its entropic regularized version. In Section 3, we explore the relationship between regularized optimal transport and matrix factorization for the co-clustering problem. Finally, we conclude the paper and give several challenging future research directions that can be derived from it.

            2.  Optimal Transportation of Discrete Measures
Optimal transport considers the problem of finding a mapping transporting one probability measure to another in a way that minimizes the transportation cost. Hereafter, we consider the formulation of Kantorovich (Kantorovich, 1942) for discrete measures, which consists in finding a coupling matrix γ ∈ ℝ^{n×m} such that γ_{ij} is the fraction of mass transported from the i-th bin of the source distribution α (i.e., x_i) to the j-th bin of the target distribution β (i.e., y_j), with α and β belonging to the simplices of size n and size m, respectively. The set of admissible couplings writes:

    Π(α, β) ≔ { γ ∈ ℝ₊^{n×m} : γ1_m = α, γ^⊤1_n = β },

where 1_k denotes the size-k vector of ones.

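To make the constraint set concrete, the following minimal NumPy sketch (ours, not from the paper) checks that the independent coupling αβ^⊤, which reappears below as the trivial coupling, always satisfies both marginal constraints:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 4, 3

    # alpha and beta: two histograms on the simplices of size n and m.
    alpha = rng.random(n); alpha /= alpha.sum()
    beta = rng.random(m); beta /= beta.sum()

    # The independent coupling alpha beta^T belongs to Pi(alpha, beta).
    gamma = np.outer(alpha, beta)
    assert np.allclose(gamma @ np.ones(m), alpha)   # gamma 1_m = alpha
    assert np.allclose(gamma.T @ np.ones(n), beta)  # gamma^T 1_n = beta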
To find an optimal coupling among the admissible ones, one has to define a cost matrix C ∈ ℝ₊^{n×m} that models the cost of moving source bin i to target bin j, represented in our case by x_i and y_j, respectively. For instance, if x_i and y_j lie in the same ground space, a natural choice for C is the squared Euclidean distance, i.e., C_{ij} = ‖x_i − y_j‖₂².¹
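As an illustration, this cost matrix can be computed in vectorized form; the sketch below (ours, not from the paper) assumes the bins are stored row-wise in arrays X ∈ ℝ^{n×d} and Y ∈ ℝ^{m×d}, and the function name is a hypothetical choice:

    import numpy as np

    def squared_euclidean_cost(X, Y):
        """Pairwise costs C[i, j] = ||x_i - y_j||_2^2 for the rows of X and Y."""
        # Expand ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y> to avoid explicit loops.
        sq_X = np.sum(X ** 2, axis=1)[:, None]  # shape (n, 1)
        sq_Y = np.sum(Y ** 2, axis=1)[None, :]  # shape (1, m)
        return sq_X + sq_Y - 2.0 * X @ Y.T      # shape (n, m)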
In the following, and for complexity reasons, we focus on the proposal of Cuturi (2013), which consists in adding a regularization term that promotes couplings close to the trivial coupling αβ^⊤ ∈ Π(α, β) in the sense of the Kullback-Leibler divergence. In this case, the optimal transportation problem writes:
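    γ*_ε ≔ argmin_{γ ∈ Π(α, β)} ⟨γ, C⟩ + ε KL(γ ‖ αβ^⊤),

where ε > 0 controls the strength of the entropic regularization.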
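This regularized problem can be solved with simple matrix-scaling (Sinkhorn) iterations. The sketch below is a minimal NumPy implementation of ours, not from the paper; the function name, the fixed iteration count, and the demo data are illustrative assumptions:

    import numpy as np

    def entropic_ot(alpha, beta, C, eps=0.1, n_iters=1000):
        """Sinkhorn-style matrix scaling for the entropy-regularized problem
        min_{gamma in Pi(alpha, beta)} <gamma, C> + eps * KL(gamma || alpha beta^T)."""
        K = np.exp(-C / eps)                  # elementwise Gibbs kernel
        u, v = np.ones_like(alpha), np.ones_like(beta)
        for _ in range(n_iters):
            u = alpha / (K @ v)               # rescale rows towards gamma 1_m = alpha
            v = beta / (K.T @ u)              # rescale columns towards gamma^T 1_n = beta
        return u[:, None] * K * v[None, :]    # gamma = diag(u) K diag(v)

    # Tiny demo: both marginal constraints hold after convergence.
    alpha, beta = np.array([0.5, 0.5]), np.array([0.25, 0.25, 0.5])
    C = np.array([[0.0, 1.0, 4.0],
                  [1.0, 0.0, 1.0]])
    gamma = entropic_ot(alpha, beta, C)
    assert np.allclose(gamma.sum(axis=1), alpha)
    assert np.allclose(gamma.sum(axis=0), beta)

Smaller values of ε yield couplings closer to the unregularized optimum at the price of slower convergence, while larger values push γ towards the trivial coupling αβ^⊤.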

¹ Note that when this is not the case, one may use the Gromov-Wasserstein distance, defined for metric-measure spaces of different sizes. We refer the reader to Peyré et al. (2016) for further details.