Page 170 - Contributed Paper Session (CPS) - Volume 7
CPS2055 Asanao S. et al.
size tree is constructed. Various authors have proposed splitting criteria, and
these criteria fall essentially into two types. One is the minimization of
the risk within a node, and the other is the maximization of the degree of
separation between nodes. For example, the log-rank test statistic is widely used
(Leblanc and Crowley (1993)). The maximum-size tree obtained in the splitting
step suffers from overfitting. To handle this problem, a set of
nested subtrees is produced from the maximum-size tree in the pruning step.
In the selection step, the optimal-size tree is selected by cross-validation or
the bootstrap method.
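As an illustration of a separation-type criterion, the following is a minimal sketch of a two-sample log-rank statistic used to score a candidate split. The function names `logrank_statistic` and `best_split`, and the choice to split on a single numeric covariate with the rule "value <= cutpoint goes left", are assumptions made for this example and are not taken from the cited works.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic for a candidate split.

    times   : observed times
    events  : 1 if an event was observed, 0 if censored
    in_left : True if the subject falls in the left child node
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Only distinct times at which at least one event occurred contribute.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n = sum(1 for u in times if u >= t)  # at risk, both nodes
        n_left = sum(1 for u, g in zip(times, in_left) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d_left = sum(
            1 for u, e, g in zip(times, events, in_left)
            if u == t and e == 1 and g
        )
        # Observed minus expected events in the left node at time t.
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            # Hypergeometric variance of the left-node event count.
            variance += (
                d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
            )
    if variance == 0.0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def best_split(times, events, x):
    """Return (cutpoint, statistic) maximizing the log-rank statistic."""
    best = (None, 0.0)
    # The largest value is skipped because it would leave the right node empty.
    for cut in sorted(set(x))[:-1]:
        stat = logrank_statistic(times, events, [v <= cut for v in x])
        if stat > best[1]:
            best = (cut, stat)
    return best
```

Repeating `best_split` recursively on each child node would grow a tree; pruning and selection are not covered by this sketch.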
In this study, we consider concordance probability-based splitting
criteria for constructing a survival tree. The area under the
receiver operating characteristic curve is widely used to evaluate the
prediction accuracy of a model for a binary outcome, and it is related to
Kendall's tau and the Mann-Whitney U test statistic. For survival data, this
idea is carried over by the concordance probability, which is used to evaluate the
prediction accuracy of a model. We use four measures of
the concordance probability as splitting criteria: Harrell’s C (Harrell et
al. (1996)), Uno’s approach (Uno et al. (2011)), Begg’s approach (Begg et al.
(2000)), and Korn and Simon’s approach (Korn and Simon (1990)).
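To make the idea of a concordance measure concrete, here is a minimal sketch of the usual pairwise form of Harrell's C. The function name `harrell_c`, the convention that a higher score means higher risk, and the handling of tied scores as half-concordant are assumptions for this example; the other three measures differ in how they treat censoring and are not shown.

```python
def harrell_c(times, events, scores):
    """Pairwise estimate of Harrell's C.

    times  : observed times
    events : 1 if an event was observed, 0 if censored
    scores : predicted risk scores (assumed: higher = earlier failure)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when the subject with the shorter observed
            # time had an event, so the ordering of failures is known.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value near 1 indicates that subjects with higher scores tend to fail earlier, and 0.5 corresponds to no discrimination.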
Schmid et al. (2016) proposed using Harrell’s C as
the splitting criterion to construct a random forest. In their approach, maximum-size
trees are constructed using Harrell’s C from bootstrap samples, and
the trees are then aggregated into a forest. In this research, we propose
pruning and selection methods for constructing a tree model based on the
concordance measures. We study the splitting performance of
the criteria based on these measures, and compare the survival trees
constructed with these criteria against those constructed with conventional
criteria through simulations.
The remainder of this paper is organized as follows. In Section 2, we
introduce the method to construct a survival tree based on the measures for
concordance probabilities. In Section 3, the results of the simulation studies
are described. Finally, in Section 4, we present the conclusions of this paper.
2. Methodology
1. Concordance probability
Let $T_i$ and $C_i$ be the true failure and censoring times for subject $i$,
respectively. Then, we can observe the time $Y_i = \min(T_i, C_i)$. Let
$\delta_i = I(Y_i = T_i)$ be the event indicator for subject $i$, which is 1 if the
observation experiences an event and 0 if the observation is censored. Let
$\mathbf{X}_i = (X_{i1}, \cdots, X_{ip})$ denote the $p$-dimensional covariate vector for
subject $i$. Then, an observed sample is represented by
$\mathcal{L} = \{(Y_i, \delta_i, \mathbf{X}_i);\ i = 1, \cdots, N\}$.
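The construction of the observed sample can be sketched as follows, assuming the failure and censoring times are both available, as they would be in a simulation. The function name `observe` and the representation of each record as a tuple of (observed time, event indicator, covariates) are illustrative choices, not part of the paper.

```python
def observe(failure_times, censoring_times, covariates):
    """Build the observed sample from latent failure and censoring times.

    Each record is (observed time, event indicator, covariate vector),
    where the observed time is the smaller of the two latent times and
    the indicator is 1 when the failure time is the one observed.
    """
    sample = []
    for t, c, x in zip(failure_times, censoring_times, covariates):
        y = min(t, c)
        delta = 1 if y == t else 0
        sample.append((y, delta, x))
    return sample
```

For example, a subject with failure time 5.0 and censoring time 2.0 is recorded with observed time 2.0 and event indicator 0.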
157 | ISI WSC 2019