Page 400 - Special Topic Session (STS) - Volume 4

STS2320 Ali S. H.

                  $$ d(x_i, \bar{x}_r, S_r) = \sqrt{(x_i - \bar{x}_r)^{T} S_r^{-1} (x_i - \bar{x}_r)}, \quad \text{for } i = 1, 2, \ldots, n, $$

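As a rough illustration of the distance formula above, the following Python sketch evaluates it for the bivariate case only. The helper names `mean_and_cov` and `distance` are hypothetical, and the classical mean and covariance are used here merely as stand-ins for the center and scatter; robust estimates would be passed to `distance` in the same way.

```python
import math

def mean_and_cov(points):
    """Classical sample mean and covariance (divisor n - 1) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def distance(x, center, cov):
    """sqrt((x - center)^T cov^{-1} (x - center)) for the bivariate case."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    u, v = x[0] - center[0], x[1] - center[1]
    # Quadratic form written out with the explicit inverse of a symmetric 2x2 matrix.
    q = (d * u * u - 2 * b * u * v + a * v * v) / det
    return math.sqrt(q)

# Hypothetical toy data, used only to show the calling convention.
toy = [(0, 0), (2, 0), (0, 2), (2, 2)]
toy_center, toy_cov = mean_and_cov(toy)
toy_distances = [distance(p, toy_center, toy_cov) for p in toy]
print(toy_distances)
```

With the identity matrix as scatter, `distance` reduces to the ordinary Euclidean distance from the center, which is a convenient sanity check.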
                      Several methods exist for obtaining $\bar{x}_r$ and $S_r$. Two of the most common
                  are the Minimum Covariance Determinant (MCD), proposed, e.g., in
                  Rousseeuw and Van Driessen (1999), and the Blocked Adaptive
                  Computationally-Efficient Outlier Nominators (BACON), proposed by Billor et
                  al. (2000). These two methods are implemented in R by the functions
                  “CovMcd” in the package “rrcov” and “BACON” in the package “robustX”.
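To make the idea behind the MCD concrete, here is a minimal Python sketch that finds the raw MCD estimate by exhaustive search: among all subsets of size h, it keeps the one whose sample covariance matrix has the smallest determinant. This is not the algorithm used by the R functions named above; practical implementations rely on much faster approximate searches and apply further corrections to the raw estimate. The function names, the toy data, and the choice h = 5 are assumptions made for illustration, and the search is feasible only for very small data sets.

```python
from itertools import combinations

def mean_cov_det(points):
    """Mean, covariance (divisor n - 1) and its determinant for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy)), sxx * syy - sxy * sxy

def raw_mcd(points, h):
    """Exhaustive search for the h-subset with the smallest covariance determinant."""
    best = min((mean_cov_det(subset) for subset in combinations(points, h)),
               key=lambda result: result[2])
    return best[0], best[1]

# Hypothetical toy data: five points near the line y = x and one far from it.
points = [(0, 0), (1, 1), (2, 2.1), (3, 2.9), (4, 4), (10, -10)]
center, scatter = raw_mcd(points, h=5)
print(center, scatter)
```

In this toy example the selected subset is the five points near the line, so the resulting center and scatter are not influenced by the distant point.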
                      Consider, for example, two variables X and Y. Figure 3(a) shows a boxplot
                  for each of the two variables. The boxplot rule does not declare any outliers.
                  The scatter plot of Y versus X, shown in Figure 3(b), shows clearly that there are
                  four outliers in the data. These four observations are bivariate outliers. Usually,
                  outliers in a high-dimensional space are not easily detected by examining
                  lower-dimensional spaces. When multivariate outlier detection methods,
                  like the three mentioned above, are applied, the outliers clearly stand out in the
                  Index Plots of the distances shown in Figure 4, which displays the Index Plots of the
                  Mahalanobis distances and of the robust distances obtained by using the BACON
                  and the MCD methods. Even the non-robust Mahalanobis
                  distances are able to identify the four observations as outliers.
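The contrast described above can be reproduced on a small artificial data set. The sketch below does not use the paper's data: it builds 40 hypothetical points close to the line y = x and adds four points that look ordinary in each margin but lie far from the line. It then applies the usual 1.5 × IQR boxplot rule to each variable and computes the classical Mahalanobis distances. The cutoff, the square root of the 0.975 quantile of the chi-square distribution with 2 degrees of freedom, is a common convention assumed here and is not taken from the text; for 2 degrees of freedom the quantile has the closed form −2 ln(1 − 0.975).

```python
import math
import statistics

# Hypothetical data: 40 points near y = x, then four bivariate outliers.
inliers = [(i, i + (0.5 if i % 2 == 0 else -0.5)) for i in range(1, 41)]
outliers = [(8, 33), (10, 31), (31, 10), (33, 8)]
data = inliers + outliers

def boxplot_flags(values):
    """Indices of values outside the 1.5 * IQR boxplot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [k for k, v in enumerate(values) if v < low or v > high]

def mahalanobis_distances(points):
    """Classical (non-robust) Mahalanobis distances for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    result = []
    for x, y in points:
        u, v = x - mx, y - my
        # Quadratic form with the explicit inverse of the 2x2 covariance matrix.
        result.append(math.sqrt((syy * u * u - 2 * sxy * u * v + sxx * v * v) / det))
    return result

# Assumed cutoff: sqrt of the 0.975 chi-square quantile with 2 degrees of freedom.
cutoff = math.sqrt(-2.0 * math.log(1 - 0.975))

x_flags = boxplot_flags([p[0] for p in data])
y_flags = boxplot_flags([p[1] for p in data])
md = mahalanobis_distances(data)
md_flags = [k for k, dist in enumerate(md) if dist > cutoff]

print("boxplot rule, X:", x_flags)
print("boxplot rule, Y:", y_flags)
print("Mahalanobis distance above cutoff:", md_flags)
```

In this constructed example the boxplot rule flags nothing in either margin, while the Mahalanobis distances exceed the cutoff only for the four added points. With a larger share of outliers the classical estimates can be distorted enough to mask them, which is the motivation for replacing them with robust estimates.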

                     Figure 3. (a) Boxplots of the two variables, X and Y, and (b) the scatter plot of Y versus X

                  4.  Discussion and Conclusion
                      Data for composite indices often need editing before the indices are
                  computed. Highly skewed variables and variables with fat tails may need some
                  transformation to achieve symmetry or normality. In addition, univariate and
                  multivariate outliers in the data need to be identified and dealt with before



                                                                     389 | ISI WSC 2019