Page 252 - Contributed Paper Session (CPS) - Volume 8
CPS2274 Nadiah M. et al.
characteristics of the general population (Kontaki et al., 2016). One of the most
widely used definitions of an outlier is distance-based: an object x is
considered an outlier if fewer than k objects lie within distance R of x,
excluding x itself. Otherwise, x is an inlier.
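The distance-based definition above can be written compactly. The symbols D (the set of objects currently under consideration), d (the distance function) and N_R(x) (the neighbour count of x) are notation assumed here for illustration; they do not appear in the original text.

```latex
N_R(x) = \bigl|\{\, y \in D \setminus \{x\} : d(x, y) \le R \,\}\bigr|,
\qquad
x \text{ is an outlier} \iff N_R(x) < k,
\qquad
x \text{ is an inlier} \iff N_R(x) \ge k .
```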
Kontaki et al. (2016) stated that the majority of the proposed algorithms
operate in a static fashion: the algorithm must be executed from scratch
whenever the underlying data objects change, which degrades performance when
updates are frequent. Kontaki et al. (2016) focus on the sliding window
method, one of several streaming techniques. Since the stream is continuously
updated with fresh data, it is impossible to keep all of the data in main
memory. Therefore, a window keeps track of the most recent data, and all
mining tasks are performed on what is “visible” through the window. As
reported in Gupta et al. (2013), most window-based models are currently
offline. The most relevant research works are Angiulli & Fassetti (2007) and
Yang, Rundensteiner, & Ward (2009), both of which considered the problem of
continuous outlier detection in window-based data streams without limiting
their techniques to multi-dimensional data. However, both methods still have
some serious limitations.
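The sliding-window idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm of Kontaki et al. (2016) or of any other cited work: it deliberately re-evaluates every point from scratch at each slide, which is the static behaviour the text identifies as costly. The function names `outliers_in_window` and `sliding_outliers` are hypothetical, and Euclidean distance is assumed.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def outliers_in_window(window, k, R):
    """Return the indices (within the window) of points that have
    fewer than k neighbours at distance at most R."""
    flagged = []
    for i, x in enumerate(window):
        neighbours = sum(
            1 for j, y in enumerate(window) if j != i and dist(x, y) <= R
        )
        if neighbours < k:
            flagged.append(i)
    return flagged


def sliding_outliers(stream, W, k, R):
    """Yield (window contents, outlier indices) each time the window is full.

    Only the W most recent points are kept in memory; everything older
    is no longer "visible" and is dropped automatically by the deque.
    """
    window = deque(maxlen=W)
    for point in stream:
        window.append(point)
        if len(window) == W:
            snapshot = list(window)
            # Naive approach: recompute all neighbour counts on every slide.
            yield snapshot, outliers_in_window(snapshot, k, R)
```

Because each slide costs on the order of W squared distance computations, this sketch makes the motivation for incremental, continuous methods concrete.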
3. Result
In this research, we use water quality data that provide information on
Dissolved Oxygen (DO) and Biochemical Oxygen Demand (BOD). Figure 1 shows the
steps used to identify whether each DO and BOD observation is an outlier or
an inlier. We compute the Euclidean distance between the data points using
the R software to identify the outliers and inliers. The result may vary
depending on the number of members within the window (W), the radius (R) and
the number of neighbours (k). The values k=3 and R=4 are used following
Kontaki et al. (2016), and W is set to 10. Figure 2 shows an example of a
1-sliding window on a probabilistic data stream for windows 1 and 2. Table 1
shows the result for each window. In window 1, point 4 is not a safe inlier
because it becomes an outlier in window 2. However, all the other inliers in
window 1 are safe inliers because they remain inliers in window 2. In window
4 (Figure 3), point 4 is an outlier because it has three neighbours. However,
in window 5 (Figure 4), point 4 is an inlier because it has five neighbours.
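The window-by-window comparison described above can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' R code: each observation is assumed to be a (DO, BOD) pair, the defaults W=10, k=3 and R=4 are taken from the text, and the helper names `label_window` and `status_changes` are hypothetical. Positions are 0-based in the code, whereas the text numbers points from 1.

```python
from math import dist  # Euclidean distance, Python 3.8+

W, K, R = 10, 3, 4.0  # window size, neighbour threshold, radius (from the text)


def label_window(points, start, k=K, radius=R, width=W):
    """Map stream position -> 'inlier' or 'outlier' for the window
    covering positions start .. start + width - 1."""
    idx = range(start, start + width)
    labels = {}
    for i in idx:
        n = sum(
            1 for j in idx if j != i and dist(points[i], points[j]) <= radius
        )
        labels[i] = "outlier" if n < k else "inlier"
    return labels


def status_changes(points, k=K, radius=R, width=W):
    """Return (position, window_start, old_label, new_label) for every
    point whose label flips when the window slides forward by one."""
    changes = []
    prev = label_window(points, 0, k, radius, width)
    for start in range(1, len(points) - width + 1):
        cur = label_window(points, start, k, radius, width)
        # Only points present in both windows can change status.
        for pos in cur.keys() & prev.keys():
            if cur[pos] != prev[pos]:
                changes.append((pos, start, prev[pos], cur[pos]))
        prev = cur
    return sorted(changes)
```

Calling `status_changes(points)` on a list of (DO, BOD) tuples lists the points that switch between inlier and outlier after a slide, which is the kind of behaviour reported for point 4. A point that never appears in this list while it remains inside the window kept its label throughout.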
ISI WSC 2019