Page 295 - Special Topic Session (STS) - Volume 1

STS430 Eustasio D.B. et al.

                         Central limit theorem and bootstrap procedure
                         for Wasserstein’s variations with an application
                         to structural relationships between distributions
            Eustasio del Barrio¹, Paula Gordaliza¹, Hélène Lescornel², Jean-Michel Loubes²
                                    1 IMUVA, Universidad de Valladolid
                                  2 Institut de mathématiques de Toulouse

            Abstract
            Wasserstein barycenters and variance-like criteria based on the Wasserstein
            distance are used in many problems to analyze the homogeneity of collections
            of distributions and structural relationships between the observations. We
            propose estimating the quantiles of the empirical process of Wasserstein’s
            variation using a bootstrap procedure. We then use these results for
            statistical inference on a distribution registration model for general
            deformation functions. The tests are based on the variance of the
            distributions with respect to their Wasserstein barycenters, for which we
            prove central limit theorems, including bootstrap versions.

            Keywords
            Central Limit Theorem; goodness-of-fit; Wasserstein distance

            1.  Introduction
                Analyzing the variability of large data sets is a difficult task when the inner
            geometry of the information conveyed by the observations is far from being
            Euclidean. Indeed, deformations of the data, such as location-scale
            transformations or more general warping procedures, preclude the use of
            common statistical methods. Finding a way to measure structural
            relationships within data is therefore of high importance. Such issues arise
            when estimating probability measures observed with deformations; this is
            common, e.g., when considering gene expression.
                Over the last decade, there has been a large amount of work dealing with
            registration issues. We refer, e.g., to [3, 5, 29] and references therein.
            However, when dealing with the registration of warped distributions, the
            literature is scarce. We mention here the method known as quantile
            normalization, developed for computational biology; see [10, 22] and
            references therein. Recently, using optimal transport methodologies,
            comparisons of distributions have been studied using a notion of Fréchet
            mean for distributions as in [1] or a notion of depth as in [11].
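
As a rough illustration of the Fréchet-mean idea mentioned above, restricted to distributions on the real line: the 2-Wasserstein barycenter of one-dimensional distributions has a quantile function equal to the average of the individual quantile functions, so for samples of equal size it can be approximated by averaging order statistics. The sketch below is only an illustration under that assumption and is not the procedure of the paper; the function names `barycenter_1d` and `wasserstein_variation_1d` and the simulated data are hypothetical.

```python
import numpy as np


def barycenter_1d(samples):
    """Empirical 2-Wasserstein barycenter of equal-size 1D samples.

    samples: array-like of shape (J, n), one row per sample.
    Returns the n averaged order statistics, i.e. the support points
    of the empirical barycenter (each with weight 1/n).
    """
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    return sorted_samples.mean(axis=0)


def wasserstein_variation_1d(samples):
    """Mean squared 2-Wasserstein distance from each sample to the barycenter."""
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    bary = sorted_samples.mean(axis=0)
    # In 1D, the squared W2 distance between two n-point empirical measures
    # is the mean squared difference of their order statistics.
    w2_squared = ((sorted_samples - bary) ** 2).mean(axis=1)
    return float(w2_squared.mean())


# Hypothetical data: three samples that differ only by a location shift.
rng = np.random.default_rng(0)
shifts = np.array([[-1.0], [0.0], [1.0]])
samples = rng.normal(size=(3, 500)) + shifts
variation = wasserstein_variation_1d(samples)
```

A variation close to zero indicates that the samples look like draws from a common distribution, while pure location shifts, as in the simulated data here, inflate it.
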
                As a natural frame for applications of a deformation model, consider J
            independent random samples of size n, where for each j ∈ {1, …, J}, the
            real-valued random variable X_j has distribution μ_j and, for each
            i ∈ {1, …, n}, the

                                                               284 | ISI WSC 2019