Page 395 - Special Topic Session (STS) - Volume 4
STS2320 Ali S. H.
On the identification and handling of outliers in
composite index data
Ali S. Hadi
Department of Mathematics and Actuarial Science,
The American University in Cairo, Egypt. E-mail ahadi@aucegypt.edu
Abstract
Composite index data often need editing before the indices are computed.
Variables may need to be transformed because of high skewness or kurtosis
coefficients. Outliers are also common in composite index data, and they can
drastically affect the resulting indices. Identifying outliers improves data
quality and reliability, and hence the quality of the decisions drawn from
the data and the analysis. Three important steps in constructing composite
indices are (1) determining which variables need transformation,
(2) identifying outliers when they exist in the index data, and (3) deciding
what to do with the outliers once they are identified. We discuss these
three steps.
Keywords
BACON; Kurtosis; MCD; Mahalanobis distance; Min-Max Normalization;
Outliers; Robust; Skewness
1. Introduction
Numerous composite indices are computed on an annual basis. Examples
include the Global Knowledge Index, the Corruption Perceptions Index (CPI),
the Human Development Index (HDI), the Ibrahim Index of African
Governance (IIAG), the Gender Inequality Index (GII), and the Climate Change
Performance Index (CCPI), to mention only a few. Bandura (2008) surveyed the
composite indices in use around the world and found 187 of them at that
time.
Most recently, the International Knowledge Index (IKI) was computed for
the first time and published in 2017 by the United Nations Development
Programme (UNDP). The IKI extended the Arab Knowledge Index (AKI), which was
first computed in 2015 by the Al Maktoum Foundation
(http://www.mbrf.ae/) to measure knowledge in the Arab countries.
Composite index data are high-dimensional: the number of variables
sometimes exceeds the number of observations. A composite index is a
single-number summary for each observation in the data. The quality of a
composite index cannot exceed the quality of the data used to construct
it. Variables can be highly skewed and/or have severe kurtosis. The data
may also contain univariate and multivariate outliers. These
384 | I S I W S C 2 0 1 9