Page 228 - Contributed Paper Session (CPS) - Volume 2

CPS1844 Reza M.
is given by $d(X_i, X_j) = \|X_i - X_j\|$. We will call $d(X_i, X_j)$ and $d(Y_i, Y_j)$ the within-sample IPDs and $d(X_i, Y_j)$ the between-sample IPDs.
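As a minimal sketch of these definitions, the Python snippet below lists all within-sample IPDs of one sample (pairs $i < j$) and all between-sample IPDs of two samples, using the Euclidean norm through the standard-library `math.dist`. The function names and the toy data are hypothetical illustrations, not part of the paper.

```python
import math
from itertools import combinations

def within_ipds(sample):
    """Within-sample IPDs d(X_i, X_j) = ||X_i - X_j|| for all pairs i < j."""
    return [math.dist(a, b) for a, b in combinations(sample, 2)]

def between_ipds(x_sample, y_sample):
    """Between-sample IPDs d(X_i, Y_j) = ||X_i - Y_j|| for all i and j."""
    return [math.dist(a, b) for a in x_sample for b in y_sample]

# Hypothetical toy samples in R^2.
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
Y = [(0.0, 1.0), (1.0, 1.0)]

print(within_ipds(X))            # [5.0, 10.0, 5.0]
print(len(between_ipds(X, Y)))   # 6, i.e. n_x * n_y
```

A sample of size $n$ yields $n(n-1)/2$ within-sample IPDs, while two samples yield $n_x n_y$ between-sample IPDs.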
                  2.  Comparing Two Groups

Let $H_x$, $H_y$ and $H_{xy}$ denote the DFs of $D_x = \|X_1 - X_2\|$, $D_y = \|Y_1 - Y_2\|$ and $D_{xy} = \|X_1 - Y_2\|$, respectively. Maa et al. (1996) advanced the study of IPDs substantially by proving that two distributions are equal if and only if the interpoint distances within and between samples have the same univariate
distribution. Hence, instead of testing $H_0: G_1 = G_2$, we can consider the equivalent null hypothesis $H_0': H_x = H_y = H_{xy}$, which holds if and only if $G_1 = G_2$. Let $\mu_{G_1G_1}$ and $\mu_{G_2G_2}$ represent the expected values of the within-sample IPDs of $G_1$ and $G_2$, respectively, and let $\mu_{G_1G_2}$ denote the expected value of the between-sample IPD.
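We compute the average IPDs of X and Y using $\bar{d}(X) = \frac{2}{n_x(n_x-1)} \sum_{i=1}^{n_x-1} \sum_{j=i+1}^{n_x} d(X_i, X_j)$ and $\bar{d}(Y) = \frac{2}{n_y(n_y-1)} \sum_{i=1}^{n_y-1} \sum_{j=i+1}^{n_y} d(Y_i, Y_j)$, respectively. The average IPD between X and Y is $\bar{d}(X, Y) = \frac{1}{n_x n_y} \sum_{i=1}^{n_x} \sum_{j=1}^{n_y} d(X_i, Y_j)$. We estimate $\mu_{G_1G_1}$, $\mu_{G_2G_2}$ and $\mu_{G_1G_2}$ with $\bar{d}(X)$, $\bar{d}(Y)$ and $\bar{d}(X, Y)$, respectively.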
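The three averages can be sketched directly from their formulas. In this minimal Python version (function names and data are hypothetical), the factor $2/(n(n-1))$ is simply the reciprocal of the number of pairs $i < j$, so $\bar{d}(X)$ is the plain mean of the within-sample IPDs.

```python
import math
from itertools import combinations

def avg_within_ipd(sample):
    """bar d(X) = 2 / (n (n - 1)) * sum_{i<j} ||X_i - X_j||; requires n >= 2."""
    n = len(sample)
    total = sum(math.dist(a, b) for a, b in combinations(sample, 2))
    return 2.0 * total / (n * (n - 1))

def avg_between_ipd(x_sample, y_sample):
    """bar d(X, Y) = 1 / (n_x n_y) * sum_i sum_j ||X_i - Y_j||."""
    total = sum(math.dist(a, b) for a in x_sample for b in y_sample)
    return total / (len(x_sample) * len(y_sample))

# Hypothetical toy samples in R^2.
X = [(0.0, 0.0), (3.0, 4.0)]
Y = [(0.0, 1.0), (0.0, 3.0)]

print(avg_within_ipd(X))      # 5.0
print(avg_within_ipd(Y))      # 2.0
print(avg_between_ipd(X, Y))  # estimate of mu_{G1G2}
```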
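Suppose the observations in the X and Y samples are taken from distributions with means $\mu_x$, $\mu_y$ and covariance matrices $\Sigma_x$, $\Sigma_y$, respectively. Using quadratic forms, it is not difficult to show that $D_x^2 = \sum_{r=1}^{d} \lambda_r Z_r^2$, where $Z_r$ is the standardized version of $X_{1r}$, the $r$-th component of the vector $X_1$, for $r = 1, \ldots, d$, and $\lambda_1, \ldots, \lambda_d$ are the eigenvalues of $2\Sigma_x$. Hence, $E(D_x^2) = \sum_{r=1}^{d} \lambda_r = 2\,\mathrm{tr}(\Sigma_x)$, and one estimates $E(D_x^2)$ with twice the total sample variance. Similarly, one can show $E(D_{xy}^2) = \mathrm{tr}(\Sigma_x + \Sigma_y) + \|\mu_x - \mu_y\|^2$. Since the square root is concave, Jensen's inequality gives $E(D_x) = E\big(\sqrt{D_x^2}\big) \le \sqrt{E(D_x^2)}$; it follows that $E(D_x) \le \sqrt{2\,\mathrm{tr}(\Sigma_x)}$ and $E(D_y) \le \sqrt{2\,\mathrm{tr}(\Sigma_y)}$. Similarly, one can show that $E(D_{xy}) \le \sqrt{\mathrm{tr}(\Sigma_x + \Sigma_y) + \|\mu_x - \mu_y\|^2}$.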
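These population identities have exact sample counterparts, which gives a quick numerical sanity check. In the sketch below (toy data and the helpers `mean` and `total_variance` are hypothetical), the average squared within-sample IPD equals twice the total sample variance computed with divisor $n-1$, matching the estimator of $E(D_x^2)$ described above. The average squared between-sample IPD equals $\mathrm{tr}(S_x) + \mathrm{tr}(S_y) + \|\bar{X} - \bar{Y}\|^2$ when $S_x$, $S_y$ use divisor $n$; this is a standard algebraic identity, not a result stated in the paper. The sample version of the Jensen bound holds because a mean never exceeds the corresponding root mean square.

```python
import math
from itertools import combinations

def mean(values):
    return sum(values) / len(values)

def total_variance(sample, ddof):
    """tr(S): sum of per-coordinate variances with divisor n - ddof; also returns the mean vector."""
    n, d = len(sample), len(sample[0])
    centre = [sum(p[r] for p in sample) / n for r in range(d)]
    ss = sum((p[r] - centre[r]) ** 2 for p in sample for r in range(d))
    return ss / (n - ddof), centre

# Hypothetical toy samples in R^2.
X = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0), (4.0, 2.0)]
Y = [(5.0, 5.0), (6.0, 4.0), (7.0, 6.0)]

# E(D_x^2) = 2 tr(Sigma_x): mean squared within-sample IPD vs twice tr(S_x), divisor n - 1.
sq_within = [math.dist(a, b) ** 2 for a, b in combinations(X, 2)]
tr_sx, _ = total_variance(X, ddof=1)
print(mean(sq_within), 2 * tr_sx)  # equal up to rounding (both 55/6)

# E(D_xy^2) = tr(Sigma_x + Sigma_y) + ||mu_x - mu_y||^2, sample form with divisor n.
sq_between = [math.dist(a, b) ** 2 for a in X for b in Y]
tr_x0, mean_x = total_variance(X, ddof=0)
tr_y0, mean_y = total_variance(Y, ddof=0)
rhs_xy = tr_x0 + tr_y0 + math.dist(mean_x, mean_y) ** 2
print(mean(sq_between), rhs_xy)  # equal up to rounding

# Jensen: mean IPD <= sqrt(mean squared IPD), within and between samples.
print(mean([math.sqrt(v) for v in sq_within]) <= math.sqrt(2 * tr_sx))
print(mean([math.sqrt(v) for v in sq_between]) <= math.sqrt(rhs_xy))
```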
                                                                         
                                                             
                                                                             
                        
Consider the distance matrix $D(X, Y)$, in which all within- and between-sample IPDs are listed. There are $m_x = \binom{n_x}{2}$ IPDs within X, $m_y = \binom{n_y}{2}$ IPDs within Y, and $m_{xy} = n_x n_y$ IPDs between the samples. Let $R = \max D(X, Y) - \min D(X, Y)$ denote the range of all IPDs that appear below the main diagonal of $D(X, Y)$. We divide this range into $s$ evaluation points, denoted by $\delta(t)$ for $t = 1, \ldots, s$.
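This bookkeeping can be sketched as follows: pool the two samples, form $D(X, Y)$, collect the entries strictly below the main diagonal (there are $m_x + m_y + m_{xy} = \binom{n_x + n_y}{2}$ of them), and compute the range $R$. The text does not say how the $s$ evaluation points are placed, so this sketch assumes, hypothetically, an equally spaced grid from the smallest to the largest IPD with both endpoints included (which requires $s \ge 2$). Function names and data are illustrative only.

```python
import math

def pooled_distance_matrix(x_sample, y_sample):
    """D(X, Y): Euclidean distances among the pooled observations X_1..X_nx, Y_1..Y_ny."""
    z = list(x_sample) + list(y_sample)
    return [[math.dist(a, b) for b in z] for a in z]

def ipd_range_and_grid(x_sample, y_sample, s):
    """Return the below-diagonal IPDs, their range R and s evaluation points."""
    D = pooled_distance_matrix(x_sample, y_sample)
    n = len(D)
    below = [D[i][j] for i in range(1, n) for j in range(i)]  # entries strictly below the diagonal
    lo, hi = min(below), max(below)
    R = hi - lo
    # Assumption: s >= 2 equally spaced points from min to max; deltas[k] plays the role of delta(k + 1).
    deltas = [lo + R * k / (s - 1) for k in range(s)]
    return below, R, deltas

# Hypothetical toy samples: n_x = 2, n_y = 1.
X = [(0.0, 0.0), (3.0, 4.0)]
Y = [(0.0, 1.0)]
below, R, deltas = ipd_range_and_grid(X, Y, s=5)
print(len(below), R, deltas)  # 3 4.0 [1.0, 2.0, 3.0, 4.0, 5.0]
```

Here $m_x = 1$, $m_y = 0$ and $m_{xy} = 2$, so three IPDs lie below the diagonal.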
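Denote the cumulative distributions of $d(X_i, X_j)$, $d(Y_i, Y_j)$ and $d(X_i, Y_j)$ evaluated at $\delta(t)$ by $H_x(t) = \mathbb{P}(d(X_i, X_j) \le \delta(t))$, $H_y(t) = \mathbb{P}(d(Y_i, Y_j) \le \delta(t))$ and $H_{xy}(t) = \mathbb{P}(d(X_i, Y_j) \le \delta(t))$, respectively. Let $I(\cdot)$ denote the indicator function and estimate the DFs by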
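
$$\hat{H}_x(t) = \frac{1}{m_x} \sum_{i=1}^{n_x-1} \sum_{j=i+1}^{n_x} I\big(d(X_i, X_j) \le \delta(t)\big),$$

$$\hat{H}_y(t) = \frac{1}{m_y} \sum_{i=1}^{n_y-1} \sum_{j=i+1}^{n_y} I\big(d(Y_i, Y_j) \le \delta(t)\big),$$
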
217 | ISI WSC 2019