Page 228 - Contributed Paper Session (CPS) - Volume 2
CPS1844 Reza M.
is given by $d_{(xy)ij} = \|X_i - Y_j\|$. We will call $d_{(x)ij}$ and $d_{(y)ij}$ the within and
$d_{(xy)ij}$ the between sample IPDs.
2. Comparing Two Groups
Let $G_x$, $G_y$ and $G_{xy}$ denote the DFs of $D_x = \|X_1 - X_2\|$, $D_y = \|Y_1 - Y_2\|$ and
$D_{xy} = \|X_1 - Y_2\|$, respectively. Maa et al. (1996) advance the study of IPDs
substantially by proving that two distributions are equal if and only if the
interpoint distances within and between samples have the same univariate
distribution. Hence, instead of testing $H_0: G_1 = G_2$, we can consider an
equivalent null hypothesis $H'_0: G_x = G_y = G_{xy}$, which holds if and only if
$G_1 = G_2$. Let $\mu_{G_1G_1}$ and $\mu_{G_2G_2}$ represent the expected values of the within sample IPDs
in X and Y, respectively, and let $\mu_{G_1G_2}$ denote the expected value of the between sample IPD.
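We compute the average IPD of X and Y using
\[
\bar{d}_{(x)} = \frac{2}{n_x(n_x-1)} \sum_{i=1}^{n_x-1} \sum_{j=i+1}^{n_x} d_{(x)ij}
\quad \text{and} \quad
\bar{d}_{(y)} = \frac{2}{n_y(n_y-1)} \sum_{i=1}^{n_y-1} \sum_{j=i+1}^{n_y} d_{(y)ij},
\]
respectively. The average IPD between X and Y is
\[
\bar{d}_{(xy)} = \frac{1}{n_x n_y} \sum_{i=1}^{n_x} \sum_{j=1}^{n_y} d_{(xy)ij}.
\]
We estimate $\mu_{G_1G_1}$, $\mu_{G_2G_2}$ and $\mu_{G_1G_2}$ with $\bar{d}_{(x)}$, $\bar{d}_{(y)}$ and $\bar{d}_{(xy)}$, respectively.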
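The three averages above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming Euclidean distance and samples stored as arrays with one observation per row; the function name `average_ipds` and its helpers are hypothetical and do not come from the paper.

```python
import numpy as np


def average_ipds(X, Y):
    """Return (dbar_x, dbar_y, dbar_xy) for samples X (n_x, d) and Y (n_y, d)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    def pairwise(A, B):
        # Euclidean distance between every row of A and every row of B.
        diff = A[:, None, :] - B[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

    def within_mean(A):
        D = pairwise(A, A)
        iu = np.triu_indices(len(A), k=1)  # index pairs with i < j
        # Mean over the n(n-1)/2 pairs, i.e. 2/(n(n-1)) * sum_{i<j} d_ij.
        return D[iu].mean()

    # The between-sample average uses all n_x * n_y pairs.
    return within_mean(X), within_mean(Y), pairwise(X, Y).mean()


# Small hand-checkable example (3-4-5 and 6-8-10 triangles).
X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
Y = np.array([[0.0, 0.0], [6.0, 8.0]])
dbar_x, dbar_y, dbar_xy = average_ipds(X, Y)
# dbar_x = (5 + 0 + 5) / 3, dbar_y = 10, dbar_xy = (0 + 10 + 5 + 5 + 0 + 10) / 6 = 5
```

The broadcasting approach builds the full distance array in memory, which is convenient for small samples; for large $n_x$ or $n_y$ a chunked computation would be preferable.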
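Suppose observations in X and Y samples are taken from distributions with
means $\mu_x$, $\mu_y$ and covariances $\Sigma_x$, $\Sigma_y$, respectively. Using quadratic forms, it is not
difficult to show that $D_x^2 = \sum_{r=1}^{d} \lambda_r Z_r^2$, where $Z_r$ is the standardized version of $X_{1r}$, the $r$-th
component of the vector $X_1$, for $r = 1, \ldots, d$, and $\lambda_1, \ldots, \lambda_d$ are the eigenvalues of
$2\Sigma_x$. Hence, $E(D_x^2) = \sum_{r=1}^{d} \lambda_r = 2\,\mathrm{tr}(\Sigma_x)$, and one estimates $E(D_x^2)$ with twice
the total sample variance. Similarly, one can show $E(D_{xy}^2) = \mathrm{tr}(\Sigma_x + \Sigma_y) + \|\mu_x - \mu_y\|^2$.
Using Jensen's inequality, it follows that $E(D_x) \le \sqrt{2\,\mathrm{tr}(\Sigma_x)}$ and $E(D_y) \le \sqrt{2\,\mathrm{tr}(\Sigma_y)}$.
Similarly, one can show that $E(D_{xy}) \le \sqrt{\mathrm{tr}(\Sigma_x + \Sigma_y) + \|\mu_x - \mu_y\|^2}$.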
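The remark that the expected squared IPD is estimated by twice the total sample variance has an exact sample counterpart: the average squared within-sample IPD equals twice the trace of the sample covariance matrix computed with divisor $n - 1$. The sketch below checks this identity and the sample version of the Jensen bound on simulated data. The function name `squared_ipd_summary`, the seed, and the simulated normal sample are illustrative assumptions, not part of the paper.

```python
import numpy as np


def squared_ipd_summary(X):
    """Return (mean squared IPD, mean IPD, total sample variance) for sample X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    sq = (diff ** 2).sum(axis=-1)            # squared Euclidean distances
    iu = np.triu_indices(n, k=1)             # index pairs with i < j
    mean_sq = sq[iu].mean()                  # average squared within-sample IPD
    mean_d = np.sqrt(sq[iu]).mean()          # average within-sample IPD
    total_var = X.var(axis=0, ddof=1).sum()  # trace of sample covariance (divisor n - 1)
    return mean_sq, mean_d, total_var


# Assumed toy data: 30 observations in d = 4 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
mean_sq, mean_d, total_var = squared_ipd_summary(X)

# Exact identity: average squared IPD equals twice the total sample variance.
identity_holds = np.isclose(mean_sq, 2.0 * total_var)
# Sample version of the Jensen bound: mean distance <= sqrt(mean squared distance).
jensen_holds = mean_d <= np.sqrt(2.0 * total_var)
```

Both checks hold for any sample, not only for normal data, because the first is an algebraic identity and the second follows from the concavity of the square root.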
Consider the distance matrix $D(X, Y)$ where all within and between sample IPDs
are listed. There are $m_x = \binom{n_x}{2}$ pairs of IPDs in X, $m_y = \binom{n_y}{2}$ pairs of IPDs in Y,
and $m_{xy} = n_x n_y$ pairs of IPDs between samples. Let $R = \max D(X, Y) - \min D(X, Y)$
denote the range of all IPDs that appear below the main diagonal of $D(X, Y)$.
We will divide the range into $s$ evaluation points denoted by $\delta(t)$ for $t = 1, \ldots, s$.
Denote the cumulative distributions of $d_{(x)ij}$, $d_{(y)ij}$ and $d_{(xy)ij}$ evaluated at
$\delta(t)$ by $H_x(t) = P(d_{(x)ij} \le \delta(t))$, $H_y(t) = P(d_{(y)ij} \le \delta(t))$ and $H_{xy}(t) = P(d_{(xy)ij} \le \delta(t))$,
respectively. Let $I(\cdot)$ denote the indicator function and estimate the DFs by
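\[
\hat{H}_x(t) = \frac{1}{m_x} \sum_{i=1}^{n_x-1} \sum_{j=i+1}^{n_x} I\left(d_{(x)ij} \le \delta(t)\right),
\]
\[
\hat{H}_y(t) = \frac{1}{m_y} \sum_{i=1}^{n_y-1} \sum_{j=i+1}^{n_y} I\left(d_{(y)ij} \le \delta(t)\right),
\]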
217 | ISI WSC 2019