Page 92 - Contributed Paper Session (CPS) - Volume 2
CPS1440 Avner Bar-Hen et al.
It should be noted that the model selection method used in this paper has
the advantage of selecting a tree on the basis of all points, unlike many
conventional methods, which require splitting the sample into one or more
subsamples. Given the spatial nature of the data and the splitting criterion,
removing points would introduce too much bias into the estimate. However,
the structural instability with respect to the set of observations is obvious and
even more critical in the spatial setting. A key point in our approach is the
choice of r, and a simulation study would be useful to determine whether
different values of r lead to different results. Additionally, other classical
point process topics, such as the type or the intensity of the point process,
could be addressed.
Instability is one of the main drawbacks of CART, and many classical ways
to contain it are available. Ensemble methods such as bagging, boosting and
random forests (see [6] for example) can be defined similarly in the spatial
case, but the aggregation step remains to be defined. Introducing and
experimenting with a bagging-like scheme could be of interest for future work.
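As a rough illustration of what a bagging-like scheme involves, the sketch below resamples the points with replacement, fits one base learner per resample, and aggregates by majority vote. Everything in it is an assumption made for illustration: the names `bagging_predict` and `fit_stump` are hypothetical, the base learner is a toy one-split stump rather than the spatial CART of this paper, and the plain bootstrap with majority vote is the classical recipe, not the spatial aggregation step that is left open above. Since dropping points biases spatial estimates, a real spatial variant would probably need a different resampling scheme.

```python
import random
from collections import Counter


def fit_stump(points, labels):
    """Toy base learner (assumption): one split on the first coordinate,
    placed at the sample mean, with a majority label on each side."""
    cut = sum(p[0] for p in points) / len(points)
    overall = Counter(labels).most_common(1)[0][0]
    left = Counter(l for p, l in zip(points, labels) if p[0] <= cut)
    right = Counter(l for p, l in zip(points, labels) if p[0] > cut)
    left_label = left.most_common(1)[0][0] if left else overall
    right_label = right.most_common(1)[0][0] if right else overall
    return lambda q: left_label if q[0] <= cut else right_label


def bagging_predict(points, labels, fit, query, n_trees=25, seed=0):
    """Classical bagging: bootstrap the sample, fit, and take a majority vote."""
    rng = random.Random(seed)
    n = len(points)
    votes = Counter()
    for _ in range(n_trees):
        # Bootstrap resample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit([points[i] for i in idx], [labels[i] for i in idx])
        votes[model(query)] += 1
    return votes.most_common(1)[0][0]
```

Passing the base learner as the `fit` argument keeps the resampling and voting logic independent of how each tree is grown, so a spatial splitting criterion could be substituted without changing the aggregation code.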
This paper depends strongly on the binary case, but some extensions to
handle the multiclass case can be sketched. One possibility is to adopt the
strategy used for multiclass Support Vector Machines (SVMs), which are
intrinsically two-class classifiers (see [6]). A technique widely used in practice
is to build one-versus-rest classifiers. We could then obtain several
tessellations and select the final partition according to a criterion
that maximizes some global measure of heterogeneity between cells.
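The one-versus-rest idea can be sketched as follows: each class is relabelled against all the others to give one binary problem per class, and the candidate partitions are then compared on the original multiclass labels. The heterogeneity measure used here, the decrease in Gini impurity between the whole sample and the cells, is only an assumed stand-in for the unspecified global criterion, and the function names are hypothetical. Each candidate partition is represented simply as a list giving the cell index of every point.

```python
from collections import Counter


def one_versus_rest(labels):
    """One binary labelling per class: 1 for the class, 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if lab == c else 0 for lab in labels] for c in classes}


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def gini_gain(labels, cells):
    """Assumed heterogeneity score: impurity of the whole sample minus
    the size-weighted impurity within the cells of the partition."""
    n = len(labels)
    by_cell = {}
    for lab, cell in zip(labels, cells):
        by_cell.setdefault(cell, []).append(lab)
    within = sum(len(g) / n * gini(g) for g in by_cell.values())
    return gini(labels) - within


def select_partition(labels, candidates):
    """Return the name of the candidate partition with the highest score."""
    return max(candidates, key=lambda name: gini_gain(labels, candidates[name]))
```

In practice each binary labelling returned by `one_versus_rest` would be passed to the binary tree-growing procedure, and the resulting tessellations would form the `candidates` dictionary.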
Another possible extension is to modify not only the growing step of the
CART algorithm but also the pruning strategy, so that it is driven by point
process properties. This requires defining some additive measure of
heterogeneity of the partitions. An intermediate solution could be to use a
classical pruning step to simplify the tree and avoid spurious splits, and then
to select the final tree among the nested sequence of trees by
maximizing the same kind of heterogeneity criterion mentioned before.
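To make the additivity requirement concrete, recall that classical CART pruning (see [3]) penalizes a sum over the leaves of the tree. The first line below is the standard cost-complexity criterion, where \widetilde{T} denotes the set of leaves of T and R(t) the error contribution of leaf t. The second line is only a hypothetical analogue, assuming a per-cell heterogeneity h(t) that is to be maximized; neither the symbol h nor the form of H_\alpha comes from this paper.

```latex
% Classical cost-complexity criterion, minimized over subtrees T:
R_\alpha(T) = \sum_{t \in \widetilde{T}} R(t) + \alpha \, |\widetilde{T}|

% Hypothetical analogue with an additive heterogeneity h (assumption),
% maximized over subtrees T:
H_\alpha(T) = \sum_{t \in \widetilde{T}} h(t) - \alpha \, |\widetilde{T}|
```

Additivity over the leaves is what allows the classical pruning algorithm to build the nested sequence of subtrees efficiently, which is why a point-process-driven pruning step would need a measure of this form.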
References
1. A Bar-Hen and N Picard. Simulation study of dissimilarity between point
process. Computational Statistics, 21(3-4):487-507, 2006.
2. L Bel, D Allard, JM Laurent, R Cheddadi and A Bar-Hen. CART algorithm
for spatial data: application to environmental and ecological data.
Computational Statistics and Data Analysis, 53(8):3082-3093, 2009.
3. L Breiman, JH Friedman, RA Olshen and CJ Stone. Classification and
regression trees. Chapman & Hall, 1984.
4. N Cressie. Statistics for Spatial Data. John Wiley & Sons, New York, 1991.
81 | ISI WSC 2019