prediction. The resulting “forest” is an ensemble of decision trees. Normally the “bagging” method is used to build such an ensemble of trees. Random Forest adds additional randomness to the model while growing the trees: when a node is split, the algorithm does not search globally for the optimal feature, but only for the best feature within a random subset of the characteristics. This procedure often leads to better models.
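As a minimal sketch of this idea in R (the data set and variable names are hypothetical placeholders), the mtry argument of the randomForest package controls how many randomly drawn features are considered at each split:

library(randomForest)

set.seed(42)
fit_rf <- randomForest(
  response ~ .,        # binary outcome, e.g. respondent vs. non-respondent
  data  = train_data,  # hypothetical training data
  ntree = 500,         # number of trees in the ensemble
  mtry  = 3            # features sampled at each split (the extra randomness)
)
print(fit_rf)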
    In this paper, we also apply gradient boosting machines (GBM), which grow a sequence of trees on successively updated residuals (J. Friedman et al. 2000). Gradient boosting is often used because a single decision tree fails to capture predictive power from multiple, overlapping regions of the feature space. In boosting, an ensemble of classifiers is built incrementally: in each step, a new sub-model is added that tries to compensate for the errors made by the previously applied sub-models.
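A sketch of such a boosted ensemble with the gbm package might look as follows (data and variable names are hypothetical; for the Bernoulli loss the outcome must be coded 0/1):

library(gbm)

set.seed(42)
fit_gbm <- gbm(
  response ~ .,                # outcome coded 0/1
  data = train_data,           # hypothetical training data
  distribution = "bernoulli",  # binary classification loss
  n.trees = 1000,              # length of the tree sequence
  interaction.depth = 3,       # depth of each individual tree
  shrinkage = 0.01             # learning rate for each update step
)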
                   We  used  the  R  programming  language  to  implement  the  outlined
               methods (R Development Core Team 2008). The R package caret was used to
               train and test the models within a statistical learning environment (Kuhn et al.
               2018). We used exhaustive grid search to tune the hyper‐parameters of the
               various predictive models (Kuhn and others 2008).
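Such an exhaustive grid search can be set up with caret roughly as follows; the tuning grid shown here is an illustrative example for a GBM, not the grid actually used in this paper:

library(caret)

ctrl <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation

grid <- expand.grid(                 # every combination is evaluated
  n.trees           = c(500, 1000),
  interaction.depth = c(2, 3),
  shrinkage         = c(0.01, 0.1),
  n.minobsinnode    = 10
)

set.seed(42)
fit <- train(
  response ~ ., data = train_data,   # hypothetical training data
  method    = "gbm",
  trControl = ctrl,
  tuneGrid  = grid,
  verbose   = FALSE
)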

3.  Results
    In a first step, we examined the importance of the variables used. In all statistical learning methods, the latency is important. For the lasso method it is the most important variable. Other important variables are the cohort, the number of complaints, and the mode. The importance rank of the variables varies across the different statistical learning techniques.
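With caret, these variable importances can be extracted from any trained model; the sketch below assumes a model object fit returned by caret::train:

library(caret)

imp <- varImp(fit)   # importance scores from the trained model
plot(imp, top = 10)  # display the ten most important predictors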
    Figure 1 displays the classifier performances. Results for the four statistical learning techniques are shown on the y-axis, and the five performance indicators appear in the five panels of the graphic. Accuracy is the number of correct predictions divided by the number of all predictions. Accuracy is rather high for all techniques. However, since we have unbalanced data (more respondents than non-respondents), this indicator is by itself not particularly meaningful.
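For a model trained with caret, accuracy (together with Kappa and class-wise rates) can be read off a confusion matrix; test_data is a hypothetical hold-out set:

library(caret)

pred <- predict(fit, newdata = test_data)  # predicted classes
confusionMatrix(pred, test_data$response)  # accuracy, Kappa, sensitivity, ...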
    In the second panel, Cohen’s Kappa is displayed as the performance measure on the x-axis. We used this measure because our classes are heavily imbalanced. Cohen’s Kappa tells us how much better our classifier performs than a classifier that guesses at random according to the frequency of each class. Cohen’s Kappa typically lies between zero and one (although values smaller than zero can occur); values close to or below zero indicate that the classifier is useless.
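Using Cohen’s formula kappa = (p_o − p_e) / (1 − p_e), a toy example (the numbers are hypothetical, not our results) shows why Kappa is more informative than raw accuracy on imbalanced classes:

p_o   <- 0.90                     # observed accuracy
p_e   <- 0.85                     # agreement expected from random guessing
kappa <- (p_o - p_e) / (1 - p_e)
kappa                             # 0.33: far less impressive than 90% accuracy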
    The receiver operating characteristic (ROC) is an excellent way to visualize the performance of a classifier and to select a decision threshold (Bradley 1997). We create the ROC curve by plotting the sensitivity against the false positive rate (1 − specificity).
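Such a curve can be produced, for instance, with the pROC package; the class level "yes" and the objects fit and test_data are hypothetical placeholders:

library(pROC)

probs   <- predict(fit, newdata = test_data, type = "prob")[, "yes"]
roc_obj <- roc(response = test_data$response, predictor = probs)
plot(roc_obj)  # sensitivity against specificity (axis reversed, i.e. the ROC)
auc(roc_obj)   # area under the curve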
