the linearity between predictors and log odds. For the logit model, the search routine for adequate predictors is external to the model. These techniques aim to determine whether a particular independent variable affects the dependent variable and, if so, to estimate the magnitude of that effect. The regression coefficients are estimated by maximizing the following log-likelihood function (Pereira, Basto, and Silva 2016):
$$\ell(\beta) = \sum_{i=1}^{n}\left[y_i\, x_i^{\top}\beta - \log\left(1 + e^{x_i^{\top}\beta}\right)\right]$$
$y_i$ is the binary outcome and $\beta$ the vector of regression coefficients to be estimated. The outcome variable takes the value one if the panellist has not responded and the value zero if the panellist has responded.
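As a minimal sketch of this estimation step (the data below are simulated and all variable names are illustrative, not taken from the paper), a plain logit model can be fit by maximum likelihood with statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # illustrative predictors
beta_true = np.array([0.8, -1.2, 0.0])
p = 1 / (1 + np.exp(-(X @ beta_true)))
y = rng.binomial(1, p)                 # 1 = did not respond, 0 = responded

# Maximum-likelihood logit fit; maximizes the log-likelihood above
res = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(res.params)                      # estimated beta (intercept first)
```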
When we use all the information available, logit regression becomes infeasible given the dimensionality of the prediction problem. The least absolute shrinkage and selection operator (LASSO) (Kyrillidis and Cevher 2012) is a feature selection method (T. Hastie, Tibshirani, and Friedman 2009). LASSO regression has a built-in penalization term that reduces overfitting. The penalized log-likelihood that is then maximized (Pereira, Basto, and Silva 2016) is displayed in the next formula:
$$\ell(\beta) = \sum_{i=1}^{n}\left[y_i\, x_i^{\top}\beta - \log\left(1 + e^{x_i^{\top}\beta}\right)\right] - \lambda \sum_{j=1}^{p} |\beta_j|$$
$x_i$ is the $i$-th row of an $n \times p$ matrix of $n$ observations with $p$ predictors, $\beta$ is the column vector of the regression coefficients, $y_i$ is the binary outcome, and $\lambda$ is the shrinkage parameter.
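A minimal sketch of an L1-penalized logit fit on the same illustrative data as above (in scikit-learn, the parameter `C` is the inverse of the shrinkage parameter $\lambda$):

```python
from sklearn.linear_model import LogisticRegression

# The L1 penalty implements the LASSO term; C = 1/lambda controls shrinkage
lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_logit.fit(X, y)

# Coefficients shrunk exactly to zero are dropped from the model,
# which is how the LASSO performs feature selection
print(lasso_logit.coef_)
```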
We also applied tree-based methods where the built-in feature selection
combines the predictor search algorithm with the parameter estimation. In
these cases, the estimation is usually optimized with a target function like the
likelihood. By using decision trees, we go from observations about an item, represented in the branches, to conclusions about the item's target value, represented in the leaves. The predictor space is recursively split into disjoint regions $R_j$:

$$T(x; \Theta) = \sum_{j=1}^{J} \gamma_j\, I(x \in R_j)$$

with the tree parameters $\Theta = \{R_j, \gamma_j\}_{j=1}^{J}$. A short illustrative sketch follows; the applied tree-based techniques are then listed.
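Below is a minimal sketch of fitting a single classification tree on the simulated data from above; scikit-learn's CART-style tree stands in for the generic model $T(x; \Theta)$ and is not necessarily the implementation used in the paper:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Recursively partitions the predictor space into disjoint regions R_j;
# each leaf assigns a constant prediction gamma_j to its region
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# The printed rules show the splits (branches) and the leaf values
print(export_text(tree, feature_names=["x1", "x2", "x3"]))
```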
Conditional inference trees (ctree) use significance test procedures to
select variables instead of selecting the variable that maximizes an information
measure (Hothorn, Hornik, and Zeileis 2015).
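As an illustration of this idea only (a hand-rolled sketch assuming continuous predictors and a two-sample t-test as the association test; ctree itself uses a more general permutation-test framework and is typically run via the R package partykit):

```python
import numpy as np
from scipy import stats

def select_split_variable(X, y, alpha=0.05):
    """Pick the predictor most significantly associated with the binary
    outcome y, mimicking ctree's test-based variable selection."""
    n_features = X.shape[1]
    pvals = []
    for j in range(n_features):
        # two-sample t-test of feature j: nonrespondents vs. respondents
        _, p = stats.ttest_ind(X[y == 1, j], X[y == 0, j])
        pvals.append(p)
    best = int(np.argmin(pvals))
    # Bonferroni adjustment guards against testing many predictors at once
    if pvals[best] * n_features < alpha:
        return best
    return None  # no significant predictor -> stop splitting

print(select_split_variable(X, y))  # index of the selected split variable
```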
The random forest (rf) technique generates and combines different
decision trees (Breiman 2001). The goal is to get a more accurate and stable