Page 26 - Special Topic Session (STS) - Volume 1

STS346 A.H.M. Rahmatullah Imon

                                          Figure 1. Outliers in data clusters

                  Spatial outliers are observations whose characteristics are markedly
                  different from those of their spatial neighbors. Identifying spatial
                  outliers is important because it can reveal hidden but valuable knowledge
                  in many applications, such as identifying aberrant genes or tumor cells,
                  discovering highway traffic congestion points, and locating extreme
                  meteorological events such as tornadoes and hurricanes. Although outliers
                  can be easily identified in univariate, bivariate, or even trivariate data
                  through graphical examination, visual inspection does not usually work in
                  more than three dimensions. Moreover, automated identification of outliers
                  is tricky even for two-dimensional data if the data form clusters, as shown
                  in Figure 1. Here the idea of a majority and a minority simply does not
                  work; instead, bad clusters are identified as outliers based on
                  classification techniques [see Hadi et al. (2009)].
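
The difficulty with clustered data can be illustrated with a minimal sketch. It is not a method from this paper: the data are invented, and the two scores (distance from the overall centroid as a "global" measure, distance to the k-th nearest neighbour as a "local" one) and the helper names are assumptions chosen only for illustration.

```python
import math

def distance_from_centroid(points):
    """Global score: Euclidean distance of each point from the overall mean."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [math.dist(p, (cx, cy)) for p in points]

def knn_distance(points, k):
    """Local score: distance from each point to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Two tight, equally sized clusters and one isolated point midway between them
# (made-up data, deterministic so the result is reproducible).
offsets = (-0.2, 0.0, 0.2)
cluster_a = [(dx, dy) for dx in offsets for dy in offsets]
cluster_b = [(10.0 + dx, 10.0 + dy) for dx in offsets for dy in offsets]
isolated = (5.0, 5.0)
points = cluster_a + cluster_b + [isolated]

global_scores = distance_from_centroid(points)
local_scores = knn_distance(points, k=3)

indices = range(len(points))
most_global = max(indices, key=lambda i: global_scores[i])
least_global = min(indices, key=lambda i: global_scores[i])
most_local = max(indices, key=lambda i: local_scores[i])

print("most outlying by global score:", points[most_global])
print("least outlying by global score:", points[least_global])
print("most outlying by local score:", points[most_local])
```

In this toy configuration the isolated point sits at the overall centroid, so the global score ranks it as the least unusual observation, while the neighbour-based score ranks it as the most unusual.
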
                  Things can become even more cumbersome in regression models, where outliers
                  can occur along the y-dimension, along the x-dimension, or both, and/or in
                  the relationship between x and y. An excellent review of different aspects
                  of spatial outliers is available in Shekhar et al. [16] and Hadi and Imon (2018).
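
The distinction between outliers in the y-direction and in the x-direction can be sketched with two standard diagnostics for simple linear regression: ordinary least squares residuals and leverages (the diagonal of the hat matrix). This is not the method proposed in this paper; the data and the function name `simple_ols_diagnostics` are assumptions made for the example.

```python
def simple_ols_diagnostics(x, y):
    """Fit y = a + b*x by least squares; return residuals and leverages."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Leverage of observation i with a single predictor:
    # h_i = 1/n + (x_i - x_bar)^2 / Sxx
    leverages = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return residuals, leverages

# Made-up data: ten points on the line y = 2 + 0.5x, plus one point far out
# in x that still lies on the line (unusual in x only).
x = [float(v) for v in range(1, 11)] + [30.0]
y = [2.0 + 0.5 * xi for xi in x]
y[4] += 5.0  # shift the response at x = 5 (unusual in y only)

residuals, leverages = simple_ols_diagnostics(x, y)

indices = range(len(x))
worst_residual = max(indices, key=lambda i: abs(residuals[i]))
worst_leverage = max(indices, key=lambda i: leverages[i])

print("largest |residual| at x =", x[worst_residual])
print("largest leverage at x =", x[worst_leverage])
```

Here the residuals single out the shifted response at x = 5, while the remote point at x = 30 has a small residual and is visible only through its leverage, so a residual-only check would miss it.
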
                  Conceptually, spatial outliers resemble outliers in big data, and for this
                  reason outlier detection techniques designed for big data are routinely
                  employed for spatial data. In big data, as in spatial data, the concept of
                  an outlier is local, not global. Distance- and/or density-based methods
                  such as k-nearest neighbour, local outlier factor (LOF), and spatial outlier
                  factor (SOF) have become increasingly popular. But all these methods are
                  designed to identify outliers along the y-axis and hence are not readily
                  applicable to spatial regression. For example, temperatures and amounts of
                  rainfall in different regions may vary with their distances from the sea or
                  mountains. Once we fit this relationship by regression, we may observe not
                  only strange temperature or rainfall patterns; the distance factor may also
                  be unusual. Attempts have been made to identify outliers based on residuals,
                  but this approach focuses only on outliers in y, not in x or in both, and
                  the whole concept is global rather than local. To overcome this problem, in
                  this paper we propose a method which not only

                                                                      15 | ISI WSC 2019