the linearity between predictors and log odds. For the Logit model, the search routine for adequate predictors is external to the model. These techniques aim to determine whether a particular independent variable affects the dependent variable and, if there is an effect, to estimate its magnitude. The following loss function should be minimized (Pereira, Basto, and Silva 2016):
                                             
\[
\ell(\beta) = \sum_{i=1}^{n}\left[\, y_i\, x_i^{\top}\beta \;-\; \log\!\left(1 + e^{\,x_i^{\top}\beta}\right)\right]
\]
$y_i$ is the binary outcome and $\beta$ the vector of regression coefficients to be estimated. The outcome variable takes the value one if the panellist has not responded and the value zero if the panellist responded.
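As a minimal illustration, the sketch below fits such a logit model in Python on simulated data; the simulated predictors and outcome and the use of statsmodels are assumptions for demonstration only, not the paper's actual panel data or software.

# Minimal sketch of the unpenalized logit model on simulated stand-in data;
# the real application uses the panel's survey variables and paradata.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # stand-in predictor matrix (n observations, p predictors)
y = rng.binomial(1, 0.3, size=500)   # stand-in nonresponse indicator (1 = did not respond)

logit_fit = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
print(logit_fit.params)              # estimated beta on the log-odds scale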
When we use all the information available, logit regression becomes infeasible given the dimensionality of the prediction problem. The least absolute shrinkage and selection operator (LASSO) (Kyrillidis and Cevher 2012) is a feature selection method (T. Hastie, Tibshirani, and Friedman 2009). LASSO regression has a built-in penalization term that reduces overfitting. The loss function that should be minimized (Pereira, Basto, and Silva 2016) is then displayed in the next formula:
                                                                  
\[
\ell(\beta) = \sum_{i=1}^{n}\left[\, y_i\, x_i^{\top}\beta \;-\; \log\!\left(1 + e^{\,x_i^{\top}\beta}\right)\right] \;-\; \lambda \sum_{j=1}^{p} \left|\beta_j\right|
\]
$x_i$ is the $i$-th row of an $n \times p$ matrix of $n$ observations with $p$ predictors, $\beta$ is the column vector of regression coefficients, $y_i$ is the binary outcome, and $\lambda$ is the shrinkage parameter.
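For illustration, the sketch below fits an L1-penalized (LASSO) logit with scikit-learn, reusing the simulated X and y from the previous sketch; the solver and the value of C (the inverse of the shrinkage parameter in scikit-learn's parameterization) are assumptions chosen only for demonstration.

# Minimal sketch of a LASSO-penalized logit; C corresponds to 1/lambda.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

lasso_logit = make_pipeline(
    StandardScaler(),  # the L1 penalty is scale-sensitive, so standardize first
    LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000),
)
lasso_logit.fit(X, y)  # X, y as in the logit sketch above
coefs = lasso_logit.named_steps["logisticregression"].coef_.ravel()
print((coefs != 0).sum(), "predictors kept after shrinkage")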
We also applied tree-based methods, where the built-in feature selection combines the predictor search algorithm with the parameter estimation. In these cases, the estimation is usually optimized with a target function such as the likelihood. With decision trees, we go from observations about an item, represented in the branches, to conclusions about the item's target value, represented in the leaves. The predictor space is recursively split into disjoint regions $R_j$
                                                    
\[
f(x; \Theta) = \sum_{j=1}^{J} \gamma_j \, I\!\left(x \in R_j\right)
\]

with the tree parameters $\Theta = \{R_j, \gamma_j\}$. In the following, the applied tree-based techniques are listed.
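Before the individual techniques are described, the following sketch illustrates the recursive-partitioning model above with a CART-style classification tree from scikit-learn; this is an assumed stand-in for illustration and not the conditional inference tree or random forest implementations referenced below.

# Minimal sketch of a classification tree: each leaf corresponds to one
# disjoint region R_j of the partitioned predictor space.
from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)             # X, y as in the logit sketch above
print(export_text(tree))   # the printed splits define the regions R_j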
Conditional inference trees (ctree) use significance test procedures to select variables instead of selecting the variable that maximizes an information measure (Hothorn, Hornik, and Zeileis 2015).
    The random forest (rf) technique generates and combines different decision trees (Breiman 2001). The goal is to get a more accurate and stable

