Page 86 - Contributed Paper Session (CPS) - Volume 2

CPS1440 Avner Bar-Hen et al.




                                       Spatial CART classification trees

                               Avner Bar-Hen¹, Servane Gey², Jean-Michel Poggi³
                                                ¹ CNAM, Paris, France
                                   ² Laboratoire MAP5, Univ. Paris Descartes, Paris, France
                               ³ Laboratoire de Mathématiques, Univ. Paris Sud, Orsay, France

                  Abstract
                  Based on links between partitions induced by CART classification trees and
                  marked point processes, we propose a spatial variant of the CART method,
                  SpatCART, focusing on the two-population case. While the usual CART tree
                  considers the marginal distribution of the response variable at each node, we
                  propose to take into account the spatial location of the observations. We
                  introduce a dissimilarity index based on Ripley's intertype K-function,
                  which quantifies the interaction between two populations. This index, used in
                  the growing step of the CART strategy, leads to a heterogeneity function
                  consistent with the original CART algorithm. The proposed procedure is
                  implemented, illustrated on classical examples and compared to direct
                  competitors. SpatCART is finally applied to a tropical forest example. This text
                  is an extended abstract of a full paper submitted for publication.
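                  As a reminder of what Ripley's intertype K-function measures, the following
                  is a minimal Python sketch of its classical naive estimator for two
                  populations observed in a window W of area |W|:
                  K12(r) = |W| / (n1 n2) * sum_i sum_j 1(||x_i - y_j|| <= r).
                  It assumes no edge correction, and the name cross_k and its arguments are
                  hypothetical. It is not the SpatCART dissimilarity index itself, whose exact
                  definition is not given in this section.

```python
import math


def cross_k(points1, points2, r, area):
    """Naive estimate of Ripley's intertype (cross) K-function K_12(r).

    points1, points2: lists of (x, y) locations of populations 1 and 2.
    r: distance at which K_12 is evaluated.
    area: area |W| of the observation window.
    Simplifying assumption: no edge correction is applied.
    """
    n1, n2 = len(points1), len(points2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both populations must be non-empty")
    # Count (type-1 point, type-2 point) pairs lying within distance r.
    count = sum(
        1
        for (x1, y1) in points1
        for (x2, y2) in points2
        if math.hypot(x1 - x2, y1 - y2) <= r
    )
    # Normalise by the intensities lambda_k = n_k / |W|:
    # count / (lambda_1 * lambda_2 * |W|) = |W| * count / (n1 * n2).
    return area * count / (n1 * n2)


# Usage: one of the four cross pairs is within distance 1, so K_12(1) = 100 * 1 / 4.
print(cross_k([(0, 0), (1, 0)], [(0, 0.5), (5, 5)], r=1.0, area=100.0))  # 25.0
```

                  Under independence of the two populations, K12(r) is close to pi * r^2;
                  larger values suggest attraction between the two types and smaller values
                  suggest repulsion.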

                  Keywords
                  Decision Tree; Classification; Point process; Spatial data

                  1. Introduction
                      CART (Classification And Regression Trees), introduced by Breiman et
                  al. [3], is a statistical method for designing tree predictors for both
                  regression and classification. We restrict our attention to the
                  classification case with two populations. Each observation is characterized
                  by some input variables gathered in a vector X and a binary label Y, which is
                  the response variable. The general principle of CART is to recursively
                  partition the input space using binary splits and then to determine an
                  optimal partition for prediction. The model relating Y to X is classically
                  represented as a tree that reflects the underlying process of construction of
                  the model. If the explanatory variables are spatial coordinates, we obtain a
                  spatial decision tree, which induces a tessellation of the space of the X
                  variables. Each cell of this tessellation corresponds to a leaf of the
                  decision tree. Within a leaf, the predicted value of Y is constant and
                  corresponds to the majority class of the observations belonging to this leaf.
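                  The recursive partitioning of spatial coordinates can be illustrated by a
                  minimal Python sketch of the standard (non-spatial) CART growing step on two
                  coordinates, using the Gini impurity as heterogeneity function. The names
                  gini, best_split and grow, the stopping rule (stop when no split reduces the
                  impurity) and the absence of pruning are simplifying assumptions. This is not
                  SpatCART, which instead takes the spatial locations of the observations into
                  account through its K-function-based index.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(points, labels):
    """Search all axis-aligned splits x[axis] <= t and return
    (impurity decrease, axis, t) for the best one, or None if no split exists."""
    n = len(points)
    parent = n * gini(labels)
    best = None
    for axis in (0, 1):
        values = sorted(set(p[axis] for p in points))
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both children are always non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for p, y in zip(points, labels) if p[axis] <= t]
            right = [y for p, y in zip(points, labels) if p[axis] > t]
            decrease = parent - len(left) * gini(left) - len(right) * gini(right)
            if best is None or decrease > best[0]:
                best = (decrease, axis, t)
    return best


def grow(points, labels, cell, min_size=2, leaves=None):
    """Recursively split a rectangular cell (xmin, xmax, ymin, ymax).
    Returns the leaves as (cell, majority class) pairs: a tessellation."""
    if leaves is None:
        leaves = []
    split = best_split(points, labels) if len(points) >= min_size else None
    if split is None or split[0] <= 1e-12:
        # Leaf: predict the majority class of its observations.
        leaves.append((cell, Counter(labels).most_common(1)[0][0]))
        return leaves
    _, axis, t = split
    xmin, xmax, ymin, ymax = cell
    if axis == 0:
        left_cell, right_cell = (xmin, t, ymin, ymax), (t, xmax, ymin, ymax)
    else:
        left_cell, right_cell = (xmin, xmax, ymin, t), (xmin, xmax, t, ymax)
    left = [(p, y) for p, y in zip(points, labels) if p[axis] <= t]
    right = [(p, y) for p, y in zip(points, labels) if p[axis] > t]
    for part, sub_cell in ((left, left_cell), (right, right_cell)):
        pts, ys = zip(*part)
        grow(list(pts), list(ys), sub_cell, min_size, leaves)
    return leaves


# Usage: two populations separated along the first coordinate.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 0), (3, 1), (4, 0), (4, 1)]
ys = ["a"] * 4 + ["b"] * 4
print(grow(pts, ys, (0, 4, 0, 1)))
```

                  Each returned leaf is one rectangular cell of the tessellation, labelled by
                  the majority class of the observations falling in it.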
                      The aim of this work is to adapt CART decision trees within the framework
                  of binary marked point processes by considering that the points inside the

                                                                      75 | ISI WSC 2019