prediction. The resulting "forest" is an ensemble of decision trees. Typically, the "bagging" method is used to build an ensemble of trees. Random Forest adds additional randomness to the model while growing the trees: when a node is split, the algorithm does not search globally for the optimal feature, but for the best feature within a random subset of the characteristics. This procedure often leads to better models.
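As an illustration, a minimal R sketch of this idea could look as follows; the data frame and variable names (survey_data, response) are illustrative assumptions, not the ones used in our study:

## Random Forest with a random feature subset considered at each split
library(randomForest)

set.seed(42)
## mtry controls how many randomly chosen features are considered at
## each split, instead of searching over all features globally
rf_fit <- randomForest(
  response ~ .,          # hypothetical binary outcome
  data  = survey_data,   # hypothetical data frame
  ntree = 500,           # number of trees in the forest
  mtry  = floor(sqrt(ncol(survey_data) - 1))  # common default for classification
)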
In this paper, we also apply gradient boosting machines (GBM), which grow a sequence of trees in which each tree is fit to the updated residuals of its predecessors (J. Friedman et al. 2000). Gradient boosting is often used because a single decision tree fails to capture predictive power from multiple, overlapping regions of the feature space. In boosting, an ensemble of classifiers is built incrementally: in each step, a new sub-model is added that tries to compensate for the errors made by the previously fitted sub-models.
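A minimal sketch of this sequential fitting with the gbm package could look as follows (data and settings are again illustrative assumptions):

## Gradient boosting: each new tree is fit to the residuals of the ensemble so far
library(gbm)

set.seed(42)
gbm_fit <- gbm(
  response ~ .,
  data = survey_data,          # hypothetical data; response coded 0/1
  distribution = "bernoulli",  # binary outcome (respondent vs. non-respondent)
  n.trees = 1000,              # number of sequential sub-models
  interaction.depth = 3,       # depth of the individual trees
  shrinkage = 0.01             # learning rate applied to each new sub-model
)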
We used the R programming language to implement the outlined methods (R Development Core Team 2008). The R package caret was used to train and test the models within a statistical learning environment (Kuhn et al. 2018). We used exhaustive grid search to tune the hyper-parameters of the various predictive models (Kuhn et al. 2008).
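A sketch of such an exhaustive grid search with caret could look as follows; the grid values, resampling scheme, and data are assumptions for illustration:

## Tune a boosted model by evaluating every combination in a parameter grid
library(caret)

ctrl <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation

grid <- expand.grid(                 # all combinations are evaluated
  n.trees           = c(500, 1000),
  interaction.depth = c(2, 3),
  shrinkage         = c(0.01, 0.1),
  n.minobsinnode    = 10
)

set.seed(42)
gbm_tuned <- train(
  response ~ ., data = survey_data,  # hypothetical data; response is a factor
  method    = "gbm",
  trControl = ctrl,
  tuneGrid  = grid,
  verbose   = FALSE
)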
3. Results
In a first step, we examined the importance of the variables used. Latency is important in all statistical learning methods; for the lasso it is the most important variable. Other important variables are the cohort, the number of complaints, and the mode. The importance ranking of the variables varies across the different statistical learning techniques.
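With caret, such importance rankings can be extracted directly from a fitted model object; gbm_tuned refers to the hypothetical fit sketched above:

## Variable importance as reported by caret
library(caret)
varImp(gbm_tuned)                  # importance ranking for the tuned model
plot(varImp(gbm_tuned), top = 10)  # plot the ten most important variables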
Figure 1 displays the classifier performances, with the four statistical learning techniques on the y-axis and five performance indicators in the five panels of the graphic. Accuracy is the number of correct predictions divided by the total number of predictions. Accuracy is rather high for all techniques. However, since the data are imbalanced (more respondents than non-respondents), this indicator is not particularly meaningful by itself.
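For illustration, accuracy can be computed from a confusion matrix on a hold-out set; test_data and the model object are hypothetical:

## Accuracy = correct predictions / all predictions
pred <- predict(gbm_tuned, newdata = test_data)   # predicted classes
cm   <- confusionMatrix(pred, test_data$response) # caret confusion matrix
cm$overall["Accuracy"]  # equals sum(diag(table)) / sum(table)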
In the second panel, Cohen's Kappa is displayed as a performance measure on the x-axis. We chose this measure because the classes are highly imbalanced. Cohen's Kappa tells us how much better our classifier performs than a classifier that guesses at random according to the frequency of each class. Cohen's Kappa takes values up to one (values smaller than zero can occur); values close to zero or below indicate that the classifier is useless.
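The interpretation as "improvement over frequency-based guessing" is visible in the definition itself; pred and test_data carry over from the sketch above:

## Cohen's Kappa compares observed accuracy with the accuracy expected
## from random guessing at the observed class frequencies
tab   <- table(pred, test_data$response)                # confusion matrix
p_obs <- sum(diag(tab)) / sum(tab)                      # observed accuracy
p_exp <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # chance agreement
kappa <- (p_obs - p_exp) / (1 - p_exp)
## equivalently: cm$overall["Kappa"] from caret::confusionMatrix()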
The receiver operating characteristic (ROC) is an excellent way to visualize the performance of a classifier and to select a decision threshold (Bradley 1997). We create the ROC curve by plotting the sensitivity against the false positive rate.
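A sketch with the pROC package could look as follows; the class level "yes" for the predicted probabilities is a hypothetical label:

## ROC curve from predicted class probabilities
library(pROC)
probs   <- predict(gbm_tuned, newdata = test_data, type = "prob")[, "yes"]
roc_obj <- roc(response = test_data$response, predictor = probs)
plot(roc_obj)  # sensitivity vs. false positive rate (reversed specificity axis)
auc(roc_obj)   # area under the curve summarizes performance in one number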