Page 251 - Contributed Paper Session (CPS) - Volume 8

CPS2274 Nadiah M. et al.
            conventional multidimensional data. Angiulli & Fassetti (2007) used an Indexed
            Stream Buffer (ISB) that supports range queries. Their exact algorithm, built
            on this new data structure, computes distance-based outliers efficiently. They
            also introduced an approximate algorithm that maintains only a fraction of the
            safe inliers in the ISB and records the ratio of the preceding neighbours that
            are themselves safe inliers to the total number of safe inliers. Yang,
            Rundensteiner, & Ward (2009) proposed an efficient algorithm that uses
            predicted views to calculate distance-based outliers. These "predicted views"
            avoid two expensive steps: maintaining all neighbour relationships across time
            and maintaining clusters of abstracted neighbour relationships.
            Yang et al. (2009) also applied dynamic cluster maintenance to the problem of
            distance-based outlier detection for stream data.
                Finding outliers (exceptions) in large, multidimensional datasets can lead
            to the discovery of unexpected knowledge in areas such as electronic commerce,
            credit card fraud, and even the analysis of performance statistics of
            professional athletes. The notion of DB (Distance-Based) outliers and the
            cell-based algorithms for computing such outliers, developed by
            Knorr & Ng (1998), work best for k ≤ 4, where k is the number of dimensions
            of the dataset. The Efficient Nested-Loop algorithm (ENL) and its Parallel
            Nested-Loop version (PENL), introduced by Hung & Cheung (1999), showed good
            results, and the parallel version is a very good choice for mining outliers
            on a low-cost cluster of workstations interconnected by a commodity
            communication network. Ramaswamy et al. (2000) also developed a highly
            efficient partition-based algorithm that very quickly identifies a significant
            number of input points that cannot be outliers. The micro-cluster-based local
            outlier mining algorithm introduced by Jin, Tung, & Han (2001) compresses the
            data and uses a cut-plane solution for overlapping data.
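
As a point of reference for the algorithms surveyed above, the sketch below shows a plain quadratic nested loop for distance-based outliers. It is only an illustrative baseline, not ENL, PENL, the cell-based algorithm, or the partition-based algorithm. It assumes Euclidean distance and takes a point to be an outlier when fewer than `min_neighbours` other points lie within `radius` of it; the function and parameter names are hypothetical.

```python
import math


def nested_loop_outliers(points, radius, min_neighbours):
    """Return the indices of points that have fewer than `min_neighbours`
    other points within `radius` (Euclidean distance is an assumption)."""
    outliers = []
    for i, p in enumerate(points):
        count = 0
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= radius:
                count += 1
                if count >= min_neighbours:
                    break  # enough neighbours found: p cannot be an outlier
        if count < min_neighbours:
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    print(nested_loop_outliers(data, radius=0.5, min_neighbours=2))  # prints [3]
```

The early exit from the inner loop is the only optimisation here; the cost is still quadratic in the number of points in the worst case, which is what the more refined algorithms above aim to reduce.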

            2.  Methodology
                Distance-based outlier detection is one of the outlier detection techniques
            for deterministic data. It considers a point to be an outlier of a dataset if
            the number of points within a certain distance of it is below a given
            threshold (Wang, Yang, Wang, & Yu, 2010). Wang, Yang, Wang, & Yu (2010) gave
            a new definition of distance-based outliers on uncertain data streams that
            maintains the basic idea of the traditional definition and employs
            probability. They proposed a dynamic programming algorithm (DPA) that
            processes each element in linear time and avoids the expensive unfolding of
            the possible worlds of its neighbourhood. They also proposed a pruning-based
            approach (PBA) that effectively and efficiently reduces the number of elements
            processed in the sliding window and so saves detection cost.
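
To make the dynamic programming idea concrete, the following is a minimal sketch under stated assumptions; it is not the DPA of Wang et al. (2010), whose details are not given here. It assumes that each neighbour within the distance threshold exists independently with a known probability, and that an element is flagged when the probability of having fewer than `k` existing neighbours reaches a threshold `delta`. The names `neighbour_probs`, `k`, and `delta` are hypothetical.

```python
def prob_fewer_than_k(neighbour_probs, k):
    """Probability that fewer than k neighbours exist, assuming each
    neighbour exists independently with the given probability.

    dp[j] holds the probability that exactly j of the neighbours seen so
    far exist, for j < k only; probability mass that reaches k or more
    neighbours is dropped. This takes O(n * k) time and avoids
    enumerating all 2**n possible worlds.
    """
    if k <= 0:
        return 0.0
    dp = [0.0] * k
    dp[0] = 1.0
    for p in neighbour_probs:
        # Update from high counts to low so each neighbour is used once.
        for j in range(k - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return sum(dp)


def is_probabilistic_outlier(neighbour_probs, k, delta):
    """Flag an element when P(fewer than k neighbours) >= delta (assumed rule)."""
    return prob_fewer_than_k(neighbour_probs, k) >= delta


if __name__ == "__main__":
    print(prob_fewer_than_k([0.5, 0.5], 2))  # prints 0.75
```

For a fixed `k`, the work per element grows linearly with the number of candidate neighbours, which is the kind of saving over possible-world enumeration that the paragraph above describes.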
            Outlier detection in a big data setting is a data-mining task focusing on the
            discovery of objects, called outliers, that do not seem to have the

                                                               240 | ISI WSC 2019