Page 225 - Contributed Paper Session (CPS) - Volume 7

CPS2068 Jan-Philipp Kolb et al.
positive rate (FPR) at various threshold settings. The area under this curve (ROC-AUC) serves as a summary indicator, and its values are also bounded between zero and one. Values above 0.5 indicate that a method performs better than random guessing, i.e., flipping a coin.
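One way to read the ROC-AUC is as a ranking probability: it equals the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half (the Mann-Whitney formulation). The following is a minimal plain-Python sketch of that computation, not the authors' implementation; the function name `roc_auc`, the 1/0 label coding, and the toy scores are illustrative assumptions.

```python
from itertools import product


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a positive outranks a negative.

    labels: 1 = positive class, 0 = negative class (assumed coding).
    scores: higher values mean "more likely positive".
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    # Compare every positive/negative pair; ties count as half a win.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy examples (hypothetical scores, not data from the study).
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0  (perfect ranking)
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
print(roc_auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5  (uninformative score)
```

A score that carries no information, such as a constant, yields exactly 0.5, the coin-flip benchmark. The pairwise loop is quadratic in sample size; for large panels a rank-based computation of the same statistic is faster.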
    Sensitivity (Sens), also called the true positive rate (TPR), measures the proportion of actual positives that are correctly identified as such. Because we have many more respondents than nonrespondents, the values of this indicator are very close to one in our case.
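The effect of class imbalance on sensitivity can be illustrated with a trivial baseline. The sketch below assumes respondents are coded 1 (the positive class) and nonrespondents 0, and uses a hypothetical sample of 95 respondents and 5 nonrespondents; a rule that simply predicts that everyone responds already reaches a sensitivity of one. The function name `sensitivity` and all numbers are illustrative, not taken from the study.

```python
def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: TP / (TP + FN) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


# Hypothetical imbalanced sample: 95 respondents (1), 5 nonrespondents (0).
y_true = [1] * 95 + [0] * 5
# Trivial baseline: predict that every panelist responds.
y_all_respond = [1] * 100
print(sensitivity(y_true, y_all_respond))  # 1.0
```

Because this baseline never identifies a single nonrespondent, a sensitivity near one on its own says little about how well a method detects nonresponse.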
    Specificity (Spec), also called the true negative rate (TNR), measures the proportion of actual negatives that are correctly identified as such. Here, performance varies across the statistical learning techniques.
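Under heavy imbalance, specificity is the more discriminating of the two rates. The self-contained sketch below assumes respondents are coded 1 and nonrespondents 0, and compares a trivial baseline that predicts every panelist responds with a second, hypothetical method on a hypothetical sample of 95 respondents and 5 nonrespondents. The helper `specificity` and both sets of predictions are illustrative assumptions.

```python
def specificity(y_true, y_pred, negative=0):
    """True negative rate: TN / (TN + FP) for the given negative label."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == negative and p == negative)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == negative and p != negative)
    return tn / (tn + fp)


# Hypothetical sample: 95 respondents (1) followed by 5 nonrespondents (0).
y_true = [1] * 95 + [0] * 5

# Baseline that predicts everyone responds: no nonrespondent is caught.
y_all_respond = [1] * 100
print(specificity(y_true, y_all_respond))  # 0.0

# Hypothetical method: misclassifies 2 respondents, catches 3 of 5 nonrespondents.
y_method = [1] * 93 + [0] * 2 + [0] * 3 + [1] * 2
print(specificity(y_true, y_method))  # 0.6
```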
    Sensitivity and specificity trade off against each other: when the classification threshold is moved to increase specificity, sensitivity decreases, and vice versa.
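The trade-off can be made concrete by sweeping the classification threshold over a set of predicted scores. In this minimal sketch, a case is classified as positive when its score is at or above the threshold; the scores, labels (1 = positive), and the helper name `sens_spec_at` are hypothetical. Raising the threshold lowers sensitivity and raises specificity, but the product of the two rates is not constant, so the relationship is a monotone trade-off rather than a fixed inverse proportion.

```python
def sens_spec_at(scores, labels, threshold):
    """Classify score >= threshold as positive; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities and observed outcomes (1 = positive).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]

for t in (0.25, 0.50, 0.75):
    sens, spec = sens_spec_at(scores, labels, t)
    print(f"threshold={t:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
# threshold=0.25  sensitivity=1.00  specificity=0.50
# threshold=0.50  sensitivity=0.83  specificity=0.75
# threshold=0.75  sensitivity=0.50  specificity=1.00
```

Each (1 - specificity, sensitivity) pair obtained this way is one point on the ROC curve.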
Figure 1: Boxplot with results for four methods and five metrics

4. Discussion and Conclusion
    The present study investigates the potential of using features from diverse sources to predict nonresponse in the GESIS Panel. We investigated which predictors play a role in the risk of nonresponse. We also employed different statistical learning techniques and compared them regarding prediction accuracy. The results of this approach can serve as a guideline for developing a useful model that predicts panel nonresponse in advance.
    More specifically, preliminary findings suggest that variable importance varies across statistical learning techniques. Paradata such as latency are essential for predicting panel nonresponse. Random forests exhibit the best results in terms of precision and recall.
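Precision and recall are computed from the same confusion-matrix counts as sensitivity and specificity; recall for a given positive class is the same quantity as sensitivity. Since the text does not state which class is treated as positive for these two measures, the sketch below reports both directions for hypothetical predictions (1 = respondent, 0 = nonrespondent). The helper `precision_recall` and all data are illustrative assumptions, not the study's results.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive label.

    precision = TP / (TP + FP), recall = TP / (TP + FN);
    a ratio with an empty denominator is reported as 0.0.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical outcomes: 6 respondents (1) followed by 4 nonrespondents (0).
y_true = [1] * 6 + [0] * 4
y_pred = [1] * 4 + [0] * 2 + [0] * 3 + [1]

for label, name in ((0, "nonresponse"), (1, "response")):
    prec, rec = precision_recall(y_true, y_pred, positive=label)
    print(f"{name}: precision={prec:.2f} recall={rec:.2f}")
# nonresponse: precision=0.60 recall=0.75
# response: precision=0.80 recall=0.67
```

The same predictions give different precision and recall depending on which class is declared positive, so comparisons between methods should fix that choice in advance.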
    In further analysis, we plan to take the longitudinal character of the GESIS Panel more into account. So far, we have divided one panel wave into test and


212 | ISI WSC 2019