Page 251 - Contributed Paper Session (CPS) - Volume 8
CPS2274 Nadiah M. et al.
conventional multidimensional data. Angiulli & Fassetti (2007) used an Indexed
Stream Buffer (ISB), a data structure that supports range queries. Their exact
algorithm, built on this structure, computes distance-based outliers
efficiently. They also introduced an approximate algorithm that maintains only
a fraction of the safe inliers in the ISB and records the fraction of preceding
neighbours that are safe inliers relative to the total number of safe inliers. Yang,
Rundensteiner, & Ward (2009) proposed an efficient algorithm that uses
predicted views to calculate distance-based outliers. The predicted views
avoid the expensive steps of maintaining all neighbour relationships across
time and of maintaining clusters of abstracted neighbour relationships.
Yang et al. (2009) also applied dynamic cluster maintenance to the problem of
distance-based outlier detection for stream data.
In areas such as electronic commerce, credit card fraud detection, and even
the analysis of performance statistics of professional athletes, finding
outliers (exceptions) in large, multidimensional datasets can lead to the
discovery of unexpected knowledge. Knorr & Ng (1998) introduced the notion of
DB (Distance-Based) outliers and developed cell-based algorithms for computing
them; the cell-based approach performs best for k ≤ 4, where k is the
dimensionality of the dataset. The Efficient Nested-Loop algorithm (ENL) and
its parallel version (PENL), introduced by Hung & Cheung (1999), were reported
to be a very good choice for mining outliers on a low-cost cluster of
workstations interconnected by a commodity communication network.
Ramaswamy et al. (2000) developed a highly efficient partition-based
algorithm that very quickly identifies a significant number of input
points that cannot be outliers. The micro-cluster-based local outlier mining
algorithm introduced by Jin, Tung, & Han (2001) compresses the
data and uses a cut-plane solution for overlapping data.
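
To make the distance-based notion concrete, the following is a minimal Python sketch of a plain nested-loop detector. It assumes the count-based definition (a point is an outlier if fewer than a given number of other points lie within a given distance) and Euclidean distance. The function name `distance_based_outliers` and the parameters `radius` and `min_neighbours` are illustrative choices, not taken from any of the cited papers, and the sketch is not the ENL, PENL, cell-based, or partition-based algorithm.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def distance_based_outliers(points, radius, min_neighbours):
    """Return indices of points having fewer than `min_neighbours`
    other points within `radius` (hypothetical parameter names)."""
    outliers = []
    for i, p in enumerate(points):
        count = 0
        for j, q in enumerate(points):
            if i != j and dist(p, q) <= radius:
                count += 1
                if count >= min_neighbours:
                    break  # enough neighbours found: p is an inlier
        if count < min_neighbours:
            outliers.append(i)
    return outliers


# Three points close together and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
result = distance_based_outliers(points, radius=0.5, min_neighbours=2)
```

The early `break` reflects the usual motivation for nested-loop variants: the inner scan can stop as soon as a point is known to be an inlier, so only true outliers incur a full pass over the data.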
2. Methodology
Distance-based outlier detection is one of the outlier detection techniques
for deterministic data. It considers a point an outlier of a dataset if the
number of points within a certain distance from it is below a given threshold
(Wang, Yang, Wang, & Yu, 2010). Wang, Yang, Wang, & Yu (2010) gave a new
definition of distance-based outliers on uncertain data streams that maintains
the basic idea of the traditional definition and employs probability. They
proposed a dynamic programming algorithm (DPA) that processes each element in
linear time, avoiding the expensive unfolding of the possible worlds of its
neighbourhood. They also proposed a pruning-based approach (PBA) to
effectively and efficiently reduce the number of elements processed in the
sliding window and so save detection cost.
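
The idea of avoiding possible-world enumeration can be illustrated with a short Python sketch. It is not the DPA of Wang, Yang, Wang, & Yu (2010). It assumes, for illustration only, that each neighbour within the distance threshold exists independently with a known probability, and it computes the probability that fewer than k neighbours exist. The function name `prob_fewer_than_k_neighbours` and its parameters are hypothetical.

```python
def prob_fewer_than_k_neighbours(neighbour_probs, k):
    """Probability that fewer than k neighbours exist, assuming each
    neighbour exists independently with the given probability (k >= 1)."""
    # state[c] = probability that exactly c of the neighbours processed
    # so far exist, tracked only for c = 0 .. k-1.
    state = [1.0] + [0.0] * (k - 1)
    for p in neighbour_probs:
        # Update from high counts to low so each neighbour is applied once.
        for c in range(k - 1, 0, -1):
            state[c] = state[c] * (1.0 - p) + state[c - 1] * p
        state[0] *= 1.0 - p
    return sum(state)


# Two neighbours, each existing with probability 0.5.
p_less_than_2 = prob_fewer_than_k_neighbours([0.5, 0.5], k=2)
```

Under these assumptions the cost is proportional to n times k for n neighbours, so it grows linearly in n for a fixed k, whereas enumerating every possible world would require 2^n cases.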
Outlier detection in a big data setting is a data-mining task focusing on the
discovery of objects, called outliers, that do not seem to have the
240 | ISI WSC 2019