Page 92 - Contributed Paper Session (CPS) - Volume 2

CPS1440 Avner Bar-Hen et al.
                      It should be noted that the model selection method used in this paper has
                  the advantage of selecting a tree on the basis of all points, unlike many
                  conventional methods that require splitting the sample into one or more
                  subsamples. Given the spatial nature of the data and the splitting criterion,
                  removing points would introduce too much bias into the estimate. However,
                  the structural instability with respect to the set of observations is obvious
                  and even more critical in the spatial situation. A key point in our approach
                  is the choice of r, and a simulation study would be useful to determine
                  whether different values of r lead to different results. Additionally, other
                  classical point process topics could be addressed, such as the type or the
                  intensity of the point process.
                      Instability is one of the main drawbacks of CART, and many classical ways
                  to contain it are available. Ensemble methods such as bagging, boosting and
                  random forests (see [6] for example) can be defined similarly in the spatial
                  case, but the aggregation step still has to be defined. Introducing and
                  experimenting with a bagging-like scheme would be of interest for future work.
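
For reference, classical (non-spatial) bagging for a binary classifier can be sketched as below. This is only an illustration of the bootstrap-and-vote idea, not the spatial scheme, whose aggregation step is left open in the text. The 1-D threshold "stump" stands in for a full tree, and the names `fit_stump` and `bagged_predict` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold classifier on (x, label) pairs, label in {0, 1}.

    Hypothetical stand-in for a CART tree: one split, two leaves.
    """
    xs = sorted({x for x, _ in points})
    # Candidate thresholds: midpoints between consecutive distinct x values
    # (or the single x value if the sample has only one distinct point).
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for t in thresholds:
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= t else 1 - left_label) != y
                for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    _, t, left_label = best
    return lambda x: left_label if x <= t else 1 - left_label


def bagged_predict(points, x_new, n_trees=25, seed=0):
    """Classical bagging: fit one stump per bootstrap sample, then majority vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_trees):
        # Bootstrap: draw len(points) observations with replacement.
        sample = [rng.choice(points) for _ in points]
        votes[fit_stump(sample)(x_new)] += 1
    return votes.most_common(1)[0][0]


points = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
print(bagged_predict(points, 0.05), bagged_predict(points, 0.95))
```

In the spatial case the vote would have to be replaced by a rule for combining tessellations, which is exactly the part that remains to be specified.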
                      This paper depends strongly on the binary case, but some extensions to
                  handle the multiclass case can be sketched. One possibility is to adopt the
                  strategy used for multiclass Support Vector Machines (SVMs), which are
                  intrinsically two-class classifiers (see [6]). A technique widely used in
                  practice is to build one-versus-rest classifiers. We could then obtain several
                  tessellations, one per class, and select the final partition according to a
                  criterion that maximizes some global measure of heterogeneity between cells.
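
The one-versus-rest idea can be sketched as follows, under stated assumptions. The relabeling step is the standard one. The heterogeneity score is a hypothetical placeholder, since the text does not fix a criterion: it is the size-weighted squared deviation of each cell's class proportions from the global proportions. Candidate partitions are represented simply as a list giving the cell index of each point.

```python
from collections import Counter


def one_vs_rest_labels(marks):
    """For each class c, return binary labels: 1 if the point has mark c, else 0."""
    return {c: [1 if m == c else 0 for m in marks] for c in sorted(set(marks))}


def between_cell_heterogeneity(marks, cells):
    """Hypothetical criterion (an assumption, not the paper's).

    Sums, over cells, the cell size times the squared distance between the
    cell's class proportions and the global class proportions.
    """
    n = len(marks)
    global_prop = {c: k / n for c, k in Counter(marks).items()}
    by_cell = {}
    for mark, cell in zip(marks, cells):
        by_cell.setdefault(cell, []).append(mark)
    score = 0.0
    for members in by_cell.values():
        counts = Counter(members)
        size = len(members)
        score += size * sum(
            (counts.get(c, 0) / size - p) ** 2 for c, p in global_prop.items()
        )
    return score


def select_partition(marks, candidate_partitions):
    """Keep the candidate partition with the largest heterogeneity score."""
    return max(
        candidate_partitions,
        key=lambda cells: between_cell_heterogeneity(marks, cells),
    )


marks = ["a", "a", "b", "b", "c", "c"]
pure = [0, 0, 1, 1, 2, 2]    # each cell holds a single class
mixed = [0, 1, 0, 1, 0, 1]   # each cell mirrors the global mix
print(one_vs_rest_labels(marks)["a"])
print(select_partition(marks, [mixed, pure]) == pure)
```

In practice each binary labeling would be fed to the binary spatial CART procedure to produce one tessellation per class, and those tessellations would be the candidates passed to the selection step.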
                      Another possible extension is to modify not only the growing step of the
                  CART algorithm but also the pruning strategy, so that it is driven by point
                  process properties. This requires defining some additive measure of
                  heterogeneity of the partitions. An intermediate solution could be to use a
                  classical pruning step to simplify the tree and avoid spurious splits, and
                  then to select the final tree among the nested sequence of trees by
                  maximizing the same kind of heterogeneity criterion mentioned above.
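
To see why additivity matters, recall the classical cost-complexity criterion of CART [3], where $\tilde{T}$ denotes the set of leaves of a tree $T$ and $R(t)$ the error contribution of leaf $t$. The last line is only an assumed analogue for a heterogeneity measure: the symbols $H$ and $h$ are hypothetical and do not appear in the paper.

```latex
% Classical cost-complexity pruning (Breiman et al., 1984)
R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|,
\qquad
R(T) = \sum_{t \in \tilde{T}} R(t).

% Weakest-link value of an internal node t, with T_t the branch rooted at t
g(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}.

% Assumed analogue for a heterogeneity measure to be maximized
H_\alpha(T) = H(T) - \alpha\,|\tilde{T}|,
\qquad
H(T) = \sum_{t \in \tilde{T}} h(t).
```

Because $R(T)$ is a sum over leaves, replacing a branch $T_t$ by the single leaf $t$ changes the criterion only through the terms of that branch, which is what makes the nested sequence of pruned subtrees computable. A point-process-driven pruning step would need the same property.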

                  References
                  1.  A Bar-Hen and N Picard. Simulation study of dissimilarity between point
                      process. Computational Statistics, 21(3-4):487-507, 2006.
                  2.  L Bel, D Allard, JM Laurent, R Cheddadi and A Bar-Hen. CART algorithm
                      for spatial data: application to environmental and ecological data.
                      Computat. Stat. and Data Anal., 53(8):3082-3093, 2009.
                  3.  L Breiman, JH Friedman, RA Olshen and CJ Stone. Classification and
                      regression trees. Chapman & Hall, 1984.
                  4.  N Cressie. Statistics for Spatial Data. John Wiley & Sons, New York, 1991.




                                                                      81 | ISI WSC 2019