Page 360 - Contributed Paper Session (CPS) - Volume 6
CPS1966 Jessa L. S. C. et al.
shows how to compute the log-likelihood that class q’ will be chosen over
Unit Link given the set of predictor variables x (Hosmer and Lemeshow, 2000).
Fitting the model is equivalent to estimating the value of the vector B in
the equation above such that the deviations of the predicted values from the
observed values are minimized. This was done by the method of maximum
likelihood, implemented in R through the multinom function. To check how
the results align with CART, MLR was also run using only the final predictors
from CART. The predictors were transformed to factors. To assess the
significance of each predictor, a two-tailed Wald z-test was conducted in R.
Goodness of fit was assessed with the Hosmer-Lemeshow test through the
logitgof function in R (Jay, 2017).
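
The two-tailed Wald z-test mentioned above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the R code used in the study; the function name wald_z_test and the coefficient and standard error in the usage lines are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def wald_z_test(estimate, std_error):
    """Return (z, two-tailed p-value) for the null hypothesis coefficient = 0."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    # Wald statistic: estimated coefficient divided by its standard error.
    z = estimate / std_error
    # Two-tailed p-value under the standard normal reference distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical coefficient and standard error, for illustration only.
z, p = wald_z_test(estimate=0.84, std_error=0.30)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests that the coefficient differs from zero. In a multinomial model the test would be applied to each coefficient of each non-baseline class.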
As an ensemble, Random Forest is a method that involves a collection of
Classification Trees. Constructing the ensemble is an application of
bootstrap aggregation, or bagging. The first step is to repeatedly sample
the observations via bootstrap; each bootstrap sample is then used to
construct a Classification Tree. To generate the overall prediction of the
ensemble for an observation, a voting scheme aggregates the classification
outcomes of the Classification Trees. Each tree casts a vote based on its
own classification, and the votes are tallied per class. The class with the
highest number of votes is taken as the class of that observation. This
process is repeated for all observations included in the analysis. Lastly,
the Random Forest is assessed on its capacity to predict the validation data
(Fawagreh et al., 2014; Gromping, 2009; Breiman, 2001; Breiman, 1996).
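
The bootstrap and voting steps described above can be sketched as follows. This is a simplified illustration, not the randomForest implementation: the base learner is a one-split stump standing in for a full Classification Tree, the random selection of predictors at each node is omitted, and the toy data (coverage in thousands, two class labels) and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) observations with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Toy base learner: one split on a single numeric predictor.

    rows are (x, label) pairs. The stump keeps the threshold with the fewest
    training errors and predicts the majority label on each side of it.
    """
    default = Counter(label for _, label in rows).most_common(1)[0][0]
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [lab for x, lab in rows if x <= threshold]
        right = [lab for x, lab in rows if x > threshold]
        left_lab = Counter(left).most_common(1)[0][0]
        right_lab = Counter(right).most_common(1)[0][0] if right else default
        errors = sum(lab != left_lab for lab in left)
        errors += sum(lab != right_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_lab, right_lab)
    _, threshold, left_lab, right_lab = best
    return lambda x: left_lab if x <= threshold else right_lab


def fit_bagged(rows, n_trees, seed=0):
    """Fit one base learner per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def bagged_predict(trees, x):
    """Tally one vote per tree and return the class with the most votes."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (coverage in thousands, class label).
rows = [(100, "Other"), (150, "Other"), (200, "Other"), (250, "Other"),
        (300, "Unit-linked"), (400, "Unit-linked"),
        (500, "Unit-linked"), (600, "Unit-linked")]
trees = fit_bagged(rows, n_trees=25)
print(bagged_predict(trees, 120), bagged_predict(trees, 550))
```

Individual stumps can disagree because each sees a different bootstrap sample; the vote is what stabilizes the ensemble prediction.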
The Random Forest in this study was developed using the randomForest
command in R. The number of Classification Trees M was set at 2,000, which is
double the minimum value of 1,000 (Izenman, 2008). The number of predictors
considered at each node, mtry, is 2, computed as 0.5√n, where n is the
number of predictors (Izenman, 2008). Similar to MLR, RF was also run on the
final CART predictors to assess whether the predictor relationships align
with the final CART model. In this case mtry is set at 2, which is the
minimum.
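
The 0.5√n rule for mtry can be written out as a short calculation. The sketch below is only an illustration: rounding down and the floor of 1 are assumptions, since the text does not state how a fractional value is handled, and the function name half_sqrt_mtry is hypothetical.

```python
import math


def half_sqrt_mtry(n_predictors):
    """Half the square root of the number of predictors.

    Assumption: the result is rounded down and never allowed below 1.
    """
    if n_predictors < 1:
        raise ValueError("n_predictors must be at least 1")
    return max(1, math.floor(0.5 * math.sqrt(n_predictors)))


# Under these assumptions, any n from 16 to 35 yields a value of 2.
for n in (16, 25, 35):
    print(n, half_sqrt_mtry(n))
```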
3. Result
Illustration 1 shows the Final Classification Tree after applying = 30,
the pruning method, and the 1-standard-error rule. The tree shows that
Unit-linked products are often purchased with life insurance coverage
higher than P285,000. From this result, it can be deduced that Unit-linked is
the preferred class for clients who would like to have large protection. Also,
a plan with life insurance coverage higher than P1,130,000 is purchased by
the client for himself only if his income is more than P95,000; otherwise,
the plan is purchased for him by his extended or immediate family.
349 | ISI WSC 2019