Page 250 - Contributed Paper Session (CPS) - Volume 8
CPS2274 Nadiah M. et al.
Outlier Detection in Official Statistics
Nadiah Mohamed (1,2), Adzhar Rambli (3), Ibrahim Mohamed (1)
1 Institute of Mathematical Science, Faculty of Science, Universiti Malaya, 50603 Kuala Lumpur,
Malaysia
2 Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA Cawangan
Negeri Sembilan, 72000 Kuala Pilah Negeri Sembilan, Malaysia
3 Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA 40450 Shah
Alam, Selangor, Malaysia
Abstract
Distance-based outlier detection is one of the outlier detection techniques
for deterministic data. The sliding window method is one of several streaming
techniques that keep track of the most recent data, with all mining tasks
performed on what is “visible” through the window. So far, outlier detection
has not been fully utilized in official data, although it has considerable
potential to improve estimation and forecasting. The procedure can also
identify abnormal patterns or trends in such large data. In this study, we
intend to develop a new outlier detection procedure that serves both of these
purposes for streaming official data, in particular the multidimensional and
complex Malaysian economic data.
Keywords
Outlier detection; Official statistics; Sliding window; Water quality data
1. Introduction
Outlier detection for temporal data can be divided into five types: time
series data, data streams, distributed data, spatiotemporal data and network
data (Gupta, Gao, & Aggarwal, 2013). This research focuses on water stream
data, which is classified as a data stream. Several techniques can be used to
detect outliers in a data stream, such as distance-based, density-based,
clustering-based, statistical-based, frequent pattern mining-based,
classification-based and angle-based outlier detection (Souiden, Brahmi &
Toumi, 2016).
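As an illustration of the distance-based approach listed above, applied to a stream through a sliding window, the following minimal Python sketch flags an arriving point when fewer than a chosen number of points in the current window lie within a chosen radius of it. This is a common textbook formulation and not necessarily the procedure developed in this paper; the class name `SlidingWindowOutlierDetector`, the parameters `window_size`, `radius` and `min_neighbors`, and the sample stream are all assumptions made for the example.

```python
from collections import deque
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SlidingWindowOutlierDetector:
    """Hypothetical helper: distance-based check over the most recent points."""

    def __init__(self, window_size, radius, min_neighbors):
        # deque with maxlen drops the oldest point automatically
        self.window = deque(maxlen=window_size)
        self.radius = radius
        self.min_neighbors = min_neighbors

    def update(self, point):
        """Add a point to the window; return True if it is flagged on arrival."""
        # Count window points that lie within `radius` of the new point
        neighbors = sum(
            1 for p in self.window if euclidean(p, point) <= self.radius
        )
        # Only judge once the window is full (assumed warm-up rule)
        window_full = len(self.window) == self.window.maxlen
        is_outlier = window_full and neighbors < self.min_neighbors
        self.window.append(point)
        return is_outlier


# Made-up one-dimensional stream with a single extreme value
stream = [(1.0,), (1.1,), (0.9,), (1.2,), (8.0,), (1.0,)]
detector = SlidingWindowOutlierDetector(window_size=3, radius=0.5, min_neighbors=2)
flags = [detector.update(p) for p in stream]
print(flags)  # [False, False, False, False, True, False]
```

The sketch judges each point only once, at arrival. A fuller streaming method would also re-evaluate points already in the window as their neighbors expire.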
Unlike static data, streaming data does not have a fixed length. Streams can
be either multidimensional or time series. Yamanishi and Takeuchi (2002)
introduced SmartSifter, a program for online unsupervised outlier detection.
It uses an online discounting learning algorithm to learn a probabilistic
mixture model. However, there is no adjustment made for incremental updates
and temporal decay for
239 | ISI WSC 2019