Page 227 - Contributed Paper Session (CPS) - Volume 2
CPS1844 Reza M.
Testing the equality of high dimensional
distributions
Reza Modarres
Department of Statistics, The George Washington University, Washington, DC, USA
Abstract
We consider two groups of observations in $\mathbb{R}^d$ and present a simultaneous
plot of the empirical cumulative distribution functions of the within and
between interpoint distances (IPDs) to visualize and examine the equality of
the underlying distribution functions of the observations. We provide several
examples to illustrate how such plots can be utilized to envision and canvass
the relationship between the two distributions under location shift, scale and
shape changes. We extend the simultaneous plots to compare k > 2
distributions.
Keywords
Graphical Comparison; Interpoint Distance; High Dimension; Homogeneity
1. Introduction
Cleveland, Kleiner, and Tukey (1983) state "there is no statistical tool that
is as powerful as a well-chosen graph". The purpose of this paper is to
introduce a method for the visualization of high dimensional data sets, and
provide a basis for their comparison. Many methods of displaying data in
higher dimensions have been suggested. The Andrews plot (Andrews, 1992) is
constructed using a Fourier transformation of the multivariate data. While it
can represent data in many dimensions, the display depends on the order of the
observations. High-dimensional data are frequently projected onto a plane
whose coordinate axes are formed by the first two principal components.
However, one needs to estimate the covariance matrix of the observations,
which is singular when the number of variables is larger than the group sample
sizes, and the method rarely discriminates between patterns from different
groups.
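The singularity noted above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the sizes n and d and the simulated Gaussian data are assumptions chosen only for illustration. It shows that when the number of variables d exceeds the sample size n, the rank of the sample covariance matrix is at most n - 1, so the matrix cannot be inverted.

```python
import numpy as np

# Hypothetical sizes for illustration: n observations, d variables, d > n.
rng = np.random.default_rng(0)
n, d = 10, 50
X = rng.standard_normal((n, d))  # simulated data, rows are observations

# Sample covariance matrix of the d variables (a d x d matrix).
S = np.cov(X, rowvar=False)

# Centering the data removes one degree of freedom, so rank(S) <= n - 1 < d.
rank = np.linalg.matrix_rank(S)
print("shape of S:", S.shape)
print("rank of S:", rank, "(upper bound n - 1 =", n - 1, ")")
```

Because the rank is bounded by n - 1 regardless of d, any method that requires the inverse of the sample covariance matrix is unavailable in this setting without regularization.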
Suppose $\mathcal{X} = \{X_i\}_{i=1}^{n_x}$ and $\mathcal{Y} = \{Y_i\}_{i=1}^{n_y}$ represent i.i.d. observation vectors in $\mathbb{R}^d$
from distribution functions (DFs) $G_1$ and $G_2$, respectively. We are interested in
a visual aid to examine the null hypothesis $H_0: G_1 = G_2$ against $H_a: G_1 \neq G_2$.
Both parametric and non-parametric tests have been proposed to investigate
$H_0$. Let $\|X\| = (X'X)^{1/2}$ be the Euclidean norm of $X$. Consider i.i.d. vectors $X_i$ and
$X_j$ in $\mathbb{R}^d$ that are taken from $G_1$ and i.i.d. vectors $Y_i$ and $Y_j$ in $\mathbb{R}^d$ that are taken
from $G_2$. The IPD between $X_i$ and $X_j$ is given by $d_{(x)} = \|X_i - X_j\|$, the IPD
between $Y_i$ and $Y_j$ is given by $d_{(y)} = \|Y_i - Y_j\|$, and the IPD between $X_i$ and $Y_j$
216 | I S I W S C 2 0 1 9