Page 227 - Contributed Paper Session (CPS) - Volume 2

CPS1844 Reza M.


                             Testing the equality of high dimensional
                                           distributions
                                           Reza Modarres
                 Department of Statistics, the George Washington University, Washington, DC, USA

            Abstract
            We consider two groups of observations in $\mathbb{R}^d$ and present a simultaneous
            plot of the empirical cumulative distribution functions of the within and
            between interpoint distances (IPDs) to visualize and examine the equality of
            the underlying distribution functions of the observations. We provide several
            examples to illustrate how such plots can be used to explore and assess
            the relationship between the two distributions under location shift, scale, and
            shape changes. We extend the simultaneous plots to compare k > 2
            distributions.

            Keywords
            Graphical Comparison; Interpoint Distance; High Dimension; Homogeneity

            1.  Introduction
                Cleveland, Kleiner, and Tukey (1983) state "there is no statistical tool that
            is as powerful as a well-chosen graph". The purpose of this paper is to
            introduce a method for the visualization of high dimensional data sets, and
            provide a basis for their comparison. Many methods of displaying data in
            higher dimensions have been suggested. The Andrews plot (Andrews, 1992) is
            constructed using a Fourier transformation of the multivariate data. While it
            can represent data in many dimensions, the display depends on the order of the
            variables. High-dimensional data are frequently projected onto a plane
            whose coordinate axes are formed by the first two principal components.
            However, one needs to estimate the covariance matrix of the observations,
            which is singular when the number of variables exceeds the group sample
            sizes, and the method rarely discriminates between patterns from different
            groups.
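
            The singularity noted above can be checked numerically. The following is
            a minimal sketch, not part of the original paper: the helper name
            sample_cov_rank, the sizes n = 10 and d = 50, and the standard normal
            data are all illustrative assumptions. It computes the rank of the
            sample covariance matrix when there are fewer observations than variables.

```python
import numpy as np

def sample_cov_rank(n, d, seed=0):
    """Shape and rank of the sample covariance of n simulated observations in R^d.

    Hypothetical helper for illustration; the data are standard normal draws.
    """
    rng = np.random.default_rng(seed)      # seed fixed only for reproducibility
    X = rng.standard_normal((n, d))        # rows are observations, columns are variables
    S = np.cov(X, rowvar=False)            # d x d sample covariance matrix
    return S.shape, np.linalg.matrix_rank(S)

shape, rank = sample_cov_rank(n=10, d=50)
print(shape)  # (50, 50)
print(rank)   # no larger than n - 1 = 9, so the 50 x 50 matrix is singular
```

            Centering the n observations removes one degree of freedom, so the rank
            of the sample covariance matrix cannot exceed n - 1, whatever the value
            of d.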

                Suppose $X = \{X_i\}_{i=1}^{n_x}$ and $Y = \{Y_i\}_{i=1}^{n_y}$ represent i.i.d. observation vectors in $\mathbb{R}^d$
            from distribution functions (DFs) $G_1$ and $G_2$, respectively. We are interested in
            a visual aid to examine the null hypothesis $H_0: G_1 = G_2$ against $H_a: G_1 \neq G_2$.
            Both parametric and non-parametric tests have been proposed to investigate
            $H_0$. Let $\|X\| = (X'X)^{1/2}$ be the Euclidean norm of $X$. Consider i.i.d. vectors $X_i$ and
            $X_j$ in $\mathbb{R}^d$ that are taken from $G_1$ and i.i.d. vectors $Y_i$ and $Y_j$ in $\mathbb{R}^d$ that are taken
            from $G_2$. The IPD between $X_i$ and $X_j$ is given by $d_{(x)ij} = \|X_i - X_j\|$, the IPD
            between $Y_i$ and $Y_j$ is given by $d_{(y)ij} = \|Y_i - Y_j\|$ and the IPD between $X_i$ and $Y_j$
                                                               216 | ISI WSC 2019