Page 90 - Contributed Paper Session (CPS)

Page 90 - Contributed Paper Session (CPS) - Volume 2

P. 90

CPS1440 Avner Bar-Hen et al.

̂ ̂
̂ ̂

̂
̂
̂

() = () + ()

̂ ̂
̂ ̂

1
+ [ ∑ ∑ + ∑ ∑ ]
̂ ̂
{ <} { <}

where the notations are as follows:
the area of node t

̂
, the estimation of the density of mark ∗ in node (∗
∗
= , )
the euclidean distance between -marked

individual and -marked individual

This leads to the natural definition of ∆Iij(s,t,r) as the variation of heterogeneity
coming from the split of node t using s, at radius r:

̂ ̂ ̂ ̂

̂
̂
∆ (s,t,r) := () − ̂ ̂ () − (1 − ) () (5)

̂ ̂

where = . The area factor αs is natural when dealing with spatial data

since it leads to reweight properly the impurity of the two nodes tR and tL.
We propose an algorithm using impurity loss ∆ defined by (5) to develop

the maximal tree Tmax. The estimator K ij of the intertype function Kij is
ˆ
computed at each node t, the value of r is fixed as the one for which the
estimated intertype K-function is the farthest from the one of random
labelling.
A maximal tree Tmax is computed via recursive partitioning, and then
pruned with the penalized criterion based on misclassification rate defined in
(2), or based on Gini index. This produces a decreasing sequence of K subtrees
pruned each one from another, denoted by ≻ ⋯ ≻ = { }, and associated
1

1
with an increasing sequence of K complexities, denoted by 0 = α1 < ... < αK.
Rephrasing the algorithm in spatial terms, we could say that starting from
the whole original region, for which we compute r0, which can be considered
as the characteristic scale at this resolution. Then before splitting, we consider
for r = r0 the quantity ∆Iij(s,t,r) and we seek to the best split. After splitting we
seek to the best r ≤ r0 maximizing ∆Iij(s,t,rˆ ). Then after splitting, the two child
are considered in parallel in the same way, recomputing r0,L and r0,R. It turns
out that ∆Iij(s,t,r) is decreasing along any branch of the tree. So the reordered
sequence of ri can be used to define a sequence of nested subtrees.
We use two R packages to implement the SpatCART algorithm and to
display the results: spatstat to deal with point processes, and in particular to
compute ∆Iij in the construction of the maximal tree and tree to deal with tree
structures.
79 | I S I W S C 2 0 1 9

85 86 87 88 89 90 91 92 93 94 95