Page 346 - Invited Paper Session (IPS) - Volume 2
P. 346

IPS273 Charlotte L. et al.



                               Rank-one matrix factorization for co-clustering:
                                      An optimal transport perspective
                                                               2
                                Charlotte Laclau , Franck Iutzeler , Ievgen Redko 1
                                               1
                  1 Univ Lyon, UJM-Saint-Etienne, CNRS, Institut d Optique Graduate School, Laboratoire Hubert
                                    Curien UMR 5516 F-42023, Saint-Etienne, France
                           2  Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK 38000 Grenoble, France

                  Abstract
                  In  this  paper,  we  first  present  a  novel  approach  for  the  co-clustering  of
                  matrices rows/columns using entropy-regularized optimal transport. We show
                  how this technique is connected to several seemingly unrelated unsupervised
                  learning algorithms including matrix factorization. Using this observation as a
                  starting  point,  we  developed  two  novel  algorithmic  solutions  for  the  co-
                  clustering problem based on OT and rank-one matrix factorization. We believe
                  that our work provides a new point of view for several unsupervised learning
                  techniques  that  helps  to  gain  a  deeper  understanding  about  the  general
                  mechanisms of co-clustering.

                  Keywords
                  Co-clustering; Optimal Transport; Matrix Factorization

                  1.  Introduction
                     In  many  real-world  applications,  data  is  often  collected  in  form  of
                  observations  related  to  joint  interactions  between  two  types  of  entities.
                  Usually,  these  interactions  are  represented  as  a  matrix  whose  modes
                  correspond to the underlying entities and whose entries reflect the type or the
                  intensity  of  the  interactions  between  them.  Co-clustering  exploits  the
                  correlation between the rows and the columns of such matrices and allows to
                  identify local structures given by groups of rows which share similar patterns
                  for a specific subsets of columns. These methods are often categorized into
                  two  main  frameworks,  notably  probabilistic  and  metric-based  methods.
                  Despite  a  large  variety of  existing  approaches,  several  studies  proved  that
                  sometimes seemingly distinct methods actually optimize the same objective
                  function.  This  is  the  case,  for  instance,  for  k-means  and  EM  for  Gaussian
                  mixture models clustering where the former can be shown to be a special case
                  of the latter. On the other hand, different versions of non-negative matrix
                  factorization minimize an objective function similar to a constrained k-means
                  problem (Ding et al, 2010) and thus are intrinsically linked with probabilistic
                  models. All these connections are of great importance for scientific community
                  as  they  provide  fundamental  understanding  of  mechanisms  behind
                  unsupervised learning in general.

                                                                     333 | I S I   W S C   2 0 1 9
   341   342   343   344   345   346   347   348   349   350   351