Page 295 - Special Topic Session (STS) - Volume 1
STS430 Eustasio D.B. et al.
Central limit theorem and bootstrap procedure
for Wasserstein’s variations with an application
to structural relationships between distributions
Eustasio del Barrio¹, Paula Gordaliza¹, Hélène Lescornel², Jean-Michel Loubes²
¹ IMUVA, Universidad de Valladolid
² Institut de mathématiques de Toulouse
Abstract
Wasserstein barycenters and variance-like criteria based on the Wasserstein
distance are used in many problems to analyze the homogeneity of collections
of distributions and structural relationships between the observations. We
propose estimating the quantiles of the empirical process
of Wasserstein’s variation using a bootstrap procedure. We then use these
results for statistical inference on a distribution registration model for general
deformation functions. The tests are based on the variance of the distributions
with respect to their Wasserstein’s barycenters for which we prove central limit
theorems, including bootstrap versions.
Keywords
Central Limit Theorem; goodness-of-fit; Wasserstein distance
1. Introduction
Analyzing the variability of large data sets is a difficult task when the inner
geometry of the information conveyed by the observations is far from being
Euclidean. Indeed, deformations on the data such as location-scale
transformations or more general warping procedures preclude the use of
common statistical methods. Finding a way to measure structural
relationships within data is therefore of high importance. Such issues arise when
estimating probability measures that are observed with
deformations; this is common, e.g., in gene expression studies.
Over the last decade, a large amount of work has dealt with
registration issues. We refer, e.g., to [3, 5, 29] and references therein.
However, when dealing with the registration of warped distributions, the
literature is scarce. We mention here the method known as quantile
normalization, developed for computational biology problems; see [10, 22] and
references therein. Recently, using optimal transport methodologies,
comparisons of distributions have been studied using a notion of Fréchet
mean for distributions as in [1] or a notion of depth as in [11].
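To make the Fréchet mean notion concrete, the following is a minimal sketch for the one-dimensional case with J samples of equal size n, where the 2-Wasserstein barycenter of the empirical measures is obtained by averaging order statistics across samples (the same averaging that underlies quantile normalization). The function names, the variance-like criterion (average squared 2-Wasserstein distance to the barycenter) and the plain n-out-of-n resampling are illustrative assumptions; they are not taken from the paper, whose statistic and bootstrap scheme may differ in normalization and in the resampling size.

```python
import numpy as np


def barycenter_order_statistics(samples):
    """Support points of the empirical 2-Wasserstein barycenter in 1D.

    samples: array of shape (J, n), one row per sample (equal sizes assumed).
    In 1D the barycenter's quantile function is the average of the quantile
    functions, so its support points are the averaged order statistics.
    """
    return np.sort(samples, axis=1).mean(axis=0)


def wasserstein_variance(samples):
    """Average over samples of the squared W_2 distance to the barycenter.

    Hypothetical helper: a variance-like criterion, not the paper's statistic.
    """
    sorted_samples = np.sort(samples, axis=1)
    barycenter = sorted_samples.mean(axis=0)
    # For equal-size 1D samples, squared W_2 is the mean squared gap
    # between matching order statistics.
    return np.mean((sorted_samples - barycenter) ** 2)


def bootstrap_variance(samples, n_boot=200, seed=0):
    """Naive bootstrap replicates: resample each row with replacement."""
    rng = np.random.default_rng(seed)
    n_samples, n_obs = samples.shape
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_obs, size=(n_samples, n_obs))
        resampled = np.take_along_axis(samples, idx, axis=1)
        replicates[b] = wasserstein_variance(resampled)
    return replicates
```

Empirical quantiles of the replicates, for instance via `np.quantile(bootstrap_variance(samples), 0.95)`, then give a rough idea of the sampling variability of the criterion under these simplifying assumptions.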
As a natural frame for applications of a deformation model, consider J
independent random samples of size n, where for each j ∈ {1, …, J}, the
real-valued random variable X_j has distribution μ_j and, for each i ∈ {1, …, n}, the
284 | I S I W S C 2 0 1 9