Page 86 - Contributed Paper Session (CPS) - Volume 2
CPS1440 Avner Bar-Hen et al.
Spatial CART classification trees
Avner Bar-Hen¹, Servane Gey², Jean-Michel Poggi³
1 CNAM, Paris, France
2 Laboratoire MAP5, Univ. Paris Descartes, Paris, France
3 Laboratoire de Mathématiques, Univ. Paris Sud, Orsay, France
Abstract
Based on links between partitions induced by CART classification trees and
marked point processes, we propose a spatial variant of the CART method,
SpatCART, focusing on the two-population case. While the usual CART tree
considers the marginal distribution of the response variable at each node, we
propose to take into account the spatial location of the observations. We
introduce a dissimilarity index based on Ripley's intertype K-function, which
quantifies the interaction between two populations. This index, used in the
growing step of the CART strategy, leads to a heterogeneity function
consistent with the original CART algorithm. The proposed procedure is
implemented, illustrated on classical examples and compared to direct
competitors. SpatCART is finally applied to a tropical forest example. This text
is an extended abstract of a full paper submitted for publication.
Keywords
Decision Tree; Classification; Point process; Spatial data
1. Introduction
CART (Classification And Regression Trees) is a statistical method, see
Breiman et al. [3], for designing tree predictors for both regression and
classification. We restrict our attention to the classification case with two
populations. Each observation is characterized by some input variables
gathered in a vector X and a binary label Y, which is the response variable. The
general principle of CART is to recursively partition the input space using
binary splits and then to determine an optimal partition for prediction. The
classical representation of the model relating Y to X is a tree representing the
underlying process of construction of the model. If the explanatory variables
are spatial coordinates, we get a spatial decision tree and this induces a
tessellation of the space (of the X variables). A cell of this tessellation
corresponds to a leaf of the decision tree. Within a leaf of this tree, the
predicted value of the response variable Y is constant and equals the majority
class of the observations belonging to this leaf.
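The recursive partitioning described above can be sketched in a few lines of Python. This is only an illustration of the classical CART growing step on spatial coordinates, assuming the Gini index as the impurity function and thresholds placed at midpoints between observed coordinates. It is not the SpatCART procedure, and the names gini, best_split, grow and predict are hypothetical helpers introduced for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(points, labels):
    """Return (axis, threshold, gain) for the best axis-aligned binary split.

    axis is None when no split strictly decreases the impurity.
    """
    n = len(labels)
    parent = gini(labels)
    best = (None, None, 0.0)
    for axis in (0, 1):  # 0: x coordinate, 1: y coordinate
        values = sorted({p[axis] for p in points})
        # Candidate thresholds: midpoints between consecutive distinct coordinates.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for p, y in zip(points, labels) if p[axis] <= t]
            right = [y for p, y in zip(points, labels) if p[axis] > t]
            gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
            if gain > best[2]:
                best = (axis, t, gain)
    return best


def grow(points, labels, depth=0, max_depth=3):
    """Recursively partition the plane; each leaf stores the majority class of its cell."""
    axis, t, _ = best_split(points, labels) if depth < max_depth else (None, None, 0.0)
    if axis is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    left = [(p, y) for p, y in zip(points, labels) if p[axis] <= t]
    right = [(p, y) for p, y in zip(points, labels) if p[axis] > t]
    return {
        "axis": axis,
        "threshold": t,
        "left": grow([p for p, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([p for p, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, point):
    """Return the majority class of the tessellation cell (leaf) containing the point."""
    while "leaf" not in tree:
        side = "left" if point[tree["axis"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]


# Toy data (made up for illustration): population "a" lies to the west, "b" to the east.
points = [(1, 2), (2, 8), (3, 5), (7, 1), (8, 9), (9, 4)]
labels = ["a", "a", "a", "b", "b", "b"]
tree = grow(points, labels)
```

On this toy configuration a single vertical split separates the two populations, so the tessellation has two cells, each labelled with the majority class of its points. The maximal tree grown this way would normally be pruned afterwards to select the partition used for prediction, a step this sketch leaves out.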
The aim of this work is to adapt CART decision trees within the framework
of binary marked point processes by considering that the points inside the
75 | I S I W S C 2 0 1 9