Page 90 - Contributed Paper Session (CPS) - Volume 2
P. 90

CPS1440 Avner Bar-Hen et al.
                                                         
                                      
                                     ̂  ̂ 
                                                        ̂  ̂ 
                                              
                                                         
                                       
                          
                         ̂
                                            ̂
                                                              ̂
                                              
                                                                 
                           () =         () +           ()
                          
                                       ̂  ̂ 
                                                         ̂  ̂ 
                                                
                                                           
                                        
                                            1
                                       +        [ ∑ ∑           + ∑ ∑            ]
                                          ̂  ̂  
                                                    {    <}   {    <}
                                                             
                                                                                
                                            
                                                                   
                  where the notations are as follows:
                                 the area of node t
                               
                              ̂ 
                                 ,  the estimation of the density of mark ∗ in node (∗
                               ∗
                                   =  , )
                                the  euclidean  distance  between   -marked
                               
                                  individual   and -marked individual 
                                                                       
                                             
                  This leads to the natural definition of ∆Iij(s,t,r) as the variation of heterogeneity
                  coming from the split of node t using s, at radius r:
                                                                   
                                                                 
                                            
                                          
                                          ̂  ̂             ̂  ̂ 
                                                                  
                                           
                               
                              ̂
                                          
                                              ̂
                                                                      ̂
                  ∆ (s,t,r) :=   () −   ̂  ̂     () − (1 −  )        ()              (5)
                                                                        
                                                 
                               
                    
                                                             
                                                                       
                                               
                                                                  
                                           
                                                                   
                                                              ̂  ̂ 
                               
                               
                  where  =    . The area factor αs is natural when dealing with spatial data
                          
                  since it leads to reweight properly the impurity of the two nodes tR and tL.
                      We propose an algorithm using impurity loss ∆ defined by (5) to develop
                                                                   
                  the  maximal  tree  Tmax.  The  estimator  K ij  of  the  intertype  function  Kij  is
                                                           ˆ
                  computed at each node t, the value of r is fixed as the one for which the
                  estimated  intertype  K-function  is  the  farthest  from  the  one  of  random
                  labelling.
                      A  maximal  tree  Tmax  is  computed  via  recursive  partitioning,  and  then
                  pruned with the penalized criterion based on misclassification rate defined in
                  (2), or based on Gini index. This produces a decreasing sequence of K subtrees
                  pruned each one from another, denoted by ≻ ⋯ ≻  = { }, and associated
                                                                            1
                                                                      
                                                            1
                  with an increasing sequence of K complexities, denoted by 0 = α1 < ... < αK.
                      Rephrasing the algorithm in spatial terms, we could say that starting from
                  the whole original region, for which we compute r0, which can be considered
                  as the characteristic scale at this resolution. Then before splitting, we consider
                  for r = r0 the quantity ∆Iij(s,t,r) and we seek to the best split. After splitting we
                  seek to the best r ≤ r0 maximizing ∆Iij(s,t,rˆ ). Then after splitting, the two child
                  are considered in parallel in the same way, recomputing r0,L and r0,R. It turns
                  out that ∆Iij(s,t,r) is decreasing along any branch of the tree. So the reordered
                  sequence of ri can be used to define a sequence of nested subtrees.
                      We  use  two  R  packages  to  implement  the  SpatCART  algorithm and to
                  display the results: spatstat to deal with point processes, and in particular to
                  compute ∆Iij in the construction of the maximal tree and tree to deal with tree
                  structures.
                                                                      79 | I S I   W S C   2 0 1 9
   85   86   87   88   89   90   91   92   93   94   95