Page 173 - Contributed Paper Session (CPS) - Volume 7

CPS2055 Asanao S. et al.
used splitting rule consists of a single covariate $X_k$ ($k = 1, \dots, p$). If $X_k$ is a quantitative variable, then the rule becomes ``$X_k \le c$?'', where $c$ is a threshold. If $X_k$ is a categorical variable with the set of possible values $\mathcal{F}_k$, then the rule becomes ``$X_k \in F$?'', where $F \subset \mathcal{F}_k$.
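As a minimal sketch of the two rule forms above, the Python snippet below represents each rule as a predicate on a sample. The helper names `threshold_rule` and `subset_rule`, the covariate names, and the toy data are illustrative assumptions and do not come from the paper.

```python
from typing import Callable, Dict, Hashable, Iterable

Sample = Dict[str, Hashable]


def threshold_rule(name: str, c: float) -> Callable[[Sample], bool]:
    """Rule "X <= c?" for a quantitative covariate (hypothetical helper)."""
    return lambda sample: sample[name] <= c


def subset_rule(name: str, subset: Iterable[Hashable]) -> Callable[[Sample], bool]:
    """Rule "X in F?" for a categorical covariate (hypothetical helper)."""
    members = frozenset(subset)
    return lambda sample: sample[name] in members


# Toy samples, invented for illustration only.
samples = [
    {"age": 52, "stage": "I"},
    {"age": 67, "stage": "III"},
    {"age": 45, "stage": "II"},
]

age_rule = threshold_rule("age", 60)            # "age <= 60?"
stage_rule = subset_rule("stage", {"I", "II"})  # "stage in {I, II}?"

# A sample answering "yes" goes to one child node, "no" to the other.
answers_age = [age_rule(s) for s in samples]
answers_stage = [stage_rule(s) for s in samples]
```

Both rules happen to produce the same partition of the three toy samples, which shows that different covariates can induce identical splits of a node.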
                                                      
                           
    The CART algorithm for constructing the tree-structured model comprises splitting, pruning, and selection. In the splitting step, the covariate space is recursively divided based on the optimal splitting rules, and the maximum-size tree $\mathcal{T}_0$ is constructed. To determine the optimal splitting rule of a node $t$ into $t_L$ and $t_R$, we evaluate all the possible splitting rules for $t$. To build the model with measures of concordance probability, we make the following assumptions when dichotomizing the node $t$: $t_L$ has higher risk than $t_R$; the concordance probabilities are evaluated from $\tau = 0$ to $\tau = \max\{T_i;\ \delta_i = 1,\ i \in t\}$; and the contribution of a pair $(i, j)$ with tied predicted risks, that is, a pair whose samples fall in the same child node, to the estimate of the concordance probability is 0.5.
Under these assumptions, the splitting criteria based on the concordance measures are given as follows:
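(i) The criterion based on Harrell’s C
\[
\hat{C}_H = \frac{\sum_{i \in t_L} \sum_{j \in t_R} \delta_i I(T_i < T_j) + 0.5 \left\{ \sum_{i,j \in t_L} \delta_i I(T_i < T_j) + \sum_{i,j \in t_R} \delta_i I(T_i < T_j) \right\}}{\sum_{i,j \in t} \delta_i I(T_i < T_j)}
\]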
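The Harrell-type criterion can be sketched in Python as below. This is an illustrative reading of the formula rather than the authors' implementation: the function names, the toy data, and the choice to pick the threshold with the largest criterion value are assumptions. Sample `i` is sent to $t_L$ when `in_left[i]` is true, and $t_L$ is treated as the higher-risk child.

```python
from typing import Optional, Sequence, Tuple


def harrell_split_criterion(
    times: Sequence[float], events: Sequence[int], in_left: Sequence[bool]
) -> float:
    """Harrell-type concordance of one candidate split of a node."""
    n = len(times)
    cross = within = total = 0.0
    for i in range(n):
        if not events[i]:
            continue  # delta_i = 0: i cannot be the earlier, observed event
        for j in range(n):
            if times[i] < times[j]:
                total += 1.0            # denominator: all usable pairs in t
                if in_left[i] and not in_left[j]:
                    cross += 1.0        # i in t_L, j in t_R: concordant
                elif in_left[i] == in_left[j]:
                    within += 1.0       # same child: tied risk, counts 0.5
    if total == 0:
        raise ValueError("no usable pairs in this node")
    return (cross + 0.5 * within) / total


def best_threshold(
    x: Sequence[float], times: Sequence[float], events: Sequence[int]
) -> Optional[Tuple[float, float]]:
    """Scan rules "x <= c?" and keep the largest criterion (assumed objective)."""
    best = None
    for c in sorted(set(x))[:-1]:  # the largest value would leave t_R empty
        score = harrell_split_criterion(times, events, [xi <= c for xi in x])
        if best is None or score > best[1]:
            best = (c, score)
    return best


# Toy node, invented for illustration: low x means short survival.
x = [1.0, 2.0, 3.0, 4.0]
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
best = best_threshold(x, times, events)
```

On the toy node there are five usable pairs; the split at `c = 2.0` gives four cross-node concordant pairs and one within-node pair, so the criterion is (4 + 0.5) / 5 = 0.9, the largest of the three candidate thresholds.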
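(ii) The criterion based on Uno’s approach
\[
\hat{C}_U = \frac{\sum_{i \in t_L} \sum_{j \in t_R} \{\hat{G}_t(T_i)\}^{-2} \delta_i I(T_i < T_j) + 0.5 \left\{ \sum_{i,j \in t_L} \{\hat{G}_t(T_i)\}^{-2} \delta_i I(T_i < T_j) + \sum_{i,j \in t_R} \{\hat{G}_t(T_i)\}^{-2} \delta_i I(T_i < T_j) \right\}}{\sum_{i,j \in t} \{\hat{G}_t(T_i)\}^{-2} \delta_i I(T_i < T_j)}
\]
where $\hat{G}_t(\cdot)$ is the Kaplan-Meier estimator for the censoring distribution based on the samples included in $t$.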
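The inverse-probability-of-censoring weighting in the Uno-type criterion can be sketched as follows. This is a hedged illustration, not the authors' code: the function names and toy data are assumptions, and the Kaplan-Meier estimate of the censoring distribution is taken to be right-continuous and evaluated at $T_i$ itself, which is one possible convention.

```python
from typing import Callable, Sequence


def km_censoring(times: Sequence[float], events: Sequence[int]) -> Callable[[float], float]:
    """Kaplan-Meier estimate G(t) of P(censoring time > t) within the node.

    Censored samples (event indicator 0) play the role of "events" here.
    """
    censor_times = sorted({t for t, d in zip(times, events) if not d})
    steps = []  # (time, value of G from that time onward)
    surv = 1.0
    for u in censor_times:
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, d in zip(times, events) if t == u and not d)
        surv *= 1.0 - censored / at_risk
        steps.append((u, surv))

    def G(t: float) -> float:
        value = 1.0
        for u, s in steps:
            if u > t:
                break
            value = s
        return value

    return G


def uno_split_criterion(
    times: Sequence[float], events: Sequence[int], in_left: Sequence[bool]
) -> float:
    """Uno-type weighted concordance of one candidate split of a node."""
    G = km_censoring(times, events)
    n = len(times)
    cross = within = total = 0.0
    for i in range(n):
        if not events[i]:
            continue
        w = G(times[i]) ** -2  # weight {G(T_i)}^{-2}; positive when delta_i = 1
        for j in range(n):
            if times[i] < times[j]:
                total += w
                if in_left[i] and not in_left[j]:
                    cross += w       # i in t_L (higher risk), j in t_R
                elif in_left[i] == in_left[j]:
                    within += w      # same child: tied risk, counts 0.5
    if total == 0:
        raise ValueError("no usable pairs in this node")
    return (cross + 0.5 * within) / total


# Toy node, invented for illustration: one sample censored at time 3.
times = [2.0, 3.0, 4.0, 6.0, 8.0]
events = [1, 0, 1, 1, 1]
in_left = [True, True, True, False, False]
c_uno = uno_split_criterion(times, events, in_left)  # 67/84, about 0.798
```

In this toy node the censoring estimate drops to 0.75 at time 3, so events after that time carry weight 16/9 while the event at time 2 carries weight 1. Without the weights, the same split would score 5.5 / 7, about 0.786.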
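(iii) The criterion based on Begg’s approach
\[
\hat{C}_B = \frac{2}{n_t(n_t - 1)} \left\{ \sum_{i \in t_L} \sum_{j \in t_R} c_{ij} + \frac{n_{t_L}(n_{t_L} - 1) + n_{t_R}(n_{t_R} - 1)}{4} \right\},
\]
where $n_t$ is the number of samples included in the node $t$. $c_{ij}$ is defined as follows: If $\delta_i = \delta_j = 1$, then
\[
c_{ij} = \begin{cases} 0, & T_i > T_j \\ 1, & T_i < T_j \end{cases}.
\]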
If $\delta_i = 0$, $\delta_j = 1$, then

If $\delta_i = 1$, $\delta_j = 0$, then

If $\delta_i = \delta_j = 0$, then
160 | ISI WSC 2019