Page 250 - Contributed Paper Session (CPS) - Volume 8

CPS2274 Nadiah M. et al.





                                    Outlier Detection in Official Statistics
                            Nadiah Mohamed¹,², Adzhar Rambli³, Ibrahim Mohamed¹
                  1  Institute of Mathematical Science, Faculty of Science, Universiti Malaya, 50603 Kuala Lumpur,
                                                    Malaysia
                    2  Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA Cawangan
                               Negeri Sembilan, 72000 Kuala Pilah Negeri Sembilan, Malaysia
                    3  Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA 40450 Shah
                                              Alam, Selangor, Malaysia

                  Abstract
                  Distance-based outlier detection is one of the outlier detection techniques for
                  deterministic data. The sliding window method is one of several streaming
                  techniques that keep track of the most recent data, where all mining tasks are
                  performed on what is “visible” through the window. So far, outlier detection
                  has not been fully utilized for official data, although it has considerable
                  potential to improve estimation and forecasting. The procedure can also
                  identify abnormal patterns or trends in such large data. In this study, we
                  intend to develop a new outlier detection procedure that serves both purposes
                  for streaming official data, in particular the multidimensional and complex
                  Malaysian economic data.

                  Keywords
                  Outlier detection; Official statistics; Sliding window; Water quality data

                  1.  Introduction
                      Outlier detection for temporal data can be divided into five types: time
                  series data, data streams, distributed data, spatiotemporal data and network
                  data (Gupta, Gao, & Aggarwal, 2013). This research focuses on water stream
                  data, which is classified as a data stream. Several techniques can be used to
                  detect outliers in a data stream, such as distance-based, density-based,
                  clustering-based, statistical-based, frequent pattern mining-based,
                  classification-based and angle-based outlier detection (Souiden, Brahmi &
                  Toumi, 2016).
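As an illustration of the distance-based approach combined with a sliding window, the following is a minimal sketch and not the procedure developed in this study. It assumes the common distance-based definition in which a point is an outlier when fewer than a given number of the points in the current window lie within a given radius of it. The function name `sliding_window_outliers` and the parameters `window_size`, `radius` and `min_neighbors` are hypothetical, and each point is judged only once, on arrival, against the points that preceded it.

```python
from collections import deque
import math


def sliding_window_outliers(stream, window_size=5, radius=1.0, min_neighbors=2):
    """Yield (index, point, is_outlier) for each point of a numeric stream.

    Assumed rule (illustrative only): a point is flagged when fewer than
    `min_neighbors` of the points currently in the window lie within
    `radius` of it, using Euclidean distance.
    """
    # deque with maxlen drops the oldest point automatically when full
    window = deque(maxlen=window_size)
    for index, point in enumerate(stream):
        # Count window points within the radius of the new point
        neighbors = sum(
            1 for other in window if math.dist(point, other) <= radius
        )
        # Do not flag anything until the window could supply enough neighbors
        is_outlier = len(window) >= min_neighbors and neighbors < min_neighbors
        yield index, point, is_outlier
        window.append(point)


# Hypothetical one-dimensional readings; each point is a tuple of coordinates
readings = [(1.0,), (1.1,), (0.9,), (5.0,), (1.2,), (1.0,)]
flagged = [
    index
    for index, _, is_outlier in sliding_window_outliers(
        readings, window_size=4, radius=0.5, min_neighbors=2
    )
    if is_outlier
]
print(flagged)  # [3]
```

Only the reading 5.0 is flagged, because none of the three points visible in the window at that moment lies within the radius. A full streaming method would typically also re-evaluate earlier points as the window slides, which this sketch omits for brevity.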
                     Unlike static data, streaming data does not have a fixed length. Streams
                  can be either multidimensional or time series. Yamanishi and Takeuchi (2002)
                  introduced SmartSifter, a program for online unsupervised outlier detection.
                  It uses an online discounting learning algorithm to learn a probabilistic
                  mixture model. However, there is no adjustment made for incremental updates
                  and temporal decay for

                                                                     239 | I S I   W S C   2 0 1 9