Page 360 - Contributed Paper Session (CPS) - Volume 6

CPS1966 Jessa L. S. C. et al.
                  shows how to compute the log-likelihood that class q’ will be chosen over
                  Unit Link given the set of predictor variables x (Hosmer and Lemeshow, 2000).
                      Fitting the model is equivalent to estimating the vector B in the
                  equation above such that the deviations of the predicted values from the
                  observed values are minimized. This was done by the method of maximum
                  likelihood, implemented in R through the multinom function. To check how
                  the results align with CART, MLR was also run using only the final
                  predictors from CART. The predictors were converted to factors. The
                  significance of each predictor was assessed in R with Wald’s two-tailed
                  Z-test, and goodness of fit was assessed with the Hosmer-Lemeshow test
                  through the logitgof function in R (Jay, 2017).
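                      The Wald two-tailed Z-test mentioned above can be sketched as
                  follows. The study ran the test in R; this Python version is only an
                  illustration, and the function name wald_z_test as well as the
                  coefficient and standard error values are hypothetical.

```python
import math
from typing import Tuple


def wald_z_test(coef: float, std_err: float) -> Tuple[float, float]:
    """Return (z, two-tailed p-value) for the null hypothesis coef = 0."""
    if std_err <= 0:
        raise ValueError("std_err must be positive")
    z = coef / std_err
    # Two-tailed p-value under the standard normal: 2 * (1 - Phi(|z|)),
    # which is the same quantity as erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical coefficient and standard error, for illustration only.
z, p = wald_z_test(coef=0.98, std_err=0.5)
print(f"z = {z:.2f}, p = {p:.4f}")
```

                      A predictor would typically be judged significant when the
                  p-value falls below the chosen significance level.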
                      Random Forest is an ensemble method built from a collection of
                  Classification Trees. The ensemble is constructed by bootstrap
                  aggregation, or bagging. First, the observations are repeatedly resampled
                  via bootstrap, and each bootstrap sample is used to construct one
                  Classification Tree. To generate the ensemble’s overall prediction for an
                  observation, a voting scheme aggregates the classification outcomes of the
                  trees: each tree casts a vote based on its own classification, and the
                  votes are tallied per class. The class with the highest number of votes is
                  assigned to that observation. This process is repeated for all
                  observations included in the analysis. Lastly, the Random Forest is
                  assessed on its capacity to predict the validation data (Fawagreh et al.,
                  2014; Gromping, 2009; Breiman, 2001; Breiman, 1996).
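                      The bootstrap and voting steps described above can be sketched in
                  a few lines. This is a minimal illustration, not the study's
                  implementation: the base learner fit_stump is a one-split stand-in for
                  a full Classification Tree, the random choice of predictors at each
                  node is omitted, and the toy data and the class label "Other" are
                  hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) observations with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(sample):
    """Stand-in base learner (NOT a full classification tree): split the
    single feature at the sample mean and predict the majority class on
    each side of the split."""
    xs = [x for x, _ in sample]
    cut = sum(xs) / len(xs)
    left = Counter(y for x, y in sample if x <= cut)
    right = Counter(y for x, y in sample if x > cut)
    overall = Counter(y for _, y in sample).most_common(1)[0][0]
    left_label = left.most_common(1)[0][0] if left else overall
    right_label = right.most_common(1)[0][0] if right else overall
    return lambda x: left_label if x <= cut else right_label


def majority_vote(votes):
    """Tally the votes per class and return the class with the most votes.
    Ties go to the class seen first (an assumption of this sketch)."""
    return Counter(votes).most_common(1)[0][0]


def bagged_predict(trees, x):
    """Let every tree cast one vote for x, then aggregate the votes."""
    return majority_vote([tree(x) for tree in trees])


# Hypothetical toy data: (coverage in thousands, product class).
rng = random.Random(0)
data = [(100, "Other"), (150, "Other"), (200, "Other"),
        (400, "Unit-linked"), (600, "Unit-linked"), (900, "Unit-linked")]
trees = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(bagged_predict(trees, 120), bagged_predict(trees, 800))
```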
                      The Random Forest in this study was developed using the randomForest
                  command in R. The number of Classification Trees M was set at 2,000, which
                  is double the minimum value of 1,000 (Izenman, 2008). The number of
                  candidate predictors per node, mtry, was 2, computed as 0.5√n, where n is
                  the number of predictors (Izenman, 2008). As with MLR, RF was also run on
                  the final CART predictors to assess whether the predictor relationships
                  align with the final CART model. In this case mtry was set at 2, which is
                  the minimum.
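                      As a small worked example of the mtry rule, the sketch below
                  evaluates 0.5√n and applies a lower bound of 2. The function name
                  mtry_half_sqrt, the rounding-down step, and the predictor counts used
                  in the calls are assumptions made for illustration; the text does not
                  state how the value was rounded or how many predictors were used.

```python
import math


def mtry_half_sqrt(n_predictors: int, min_mtry: int = 2) -> int:
    """Return 0.5 * sqrt(n) rounded down, but never below min_mtry.

    Rounding down is an assumption of this sketch. The default lower
    bound of 2 follows the text's remark that 2 is the minimum.
    """
    if n_predictors < 1:
        raise ValueError("n_predictors must be at least 1")
    return max(min_mtry, math.floor(0.5 * math.sqrt(n_predictors)))


# Hypothetical predictor counts, for illustration only.
for n in (3, 16, 100):
    print(n, mtry_half_sqrt(n))
```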

                  3.  Result
                      Illustration 1 shows the Final Classification Tree after applying    = 30,
                  the pruning method, and the 1-standard error rule. The tree shows that
                  Unit-linked products are often purchased with life insurance coverage
                  higher than P285,000, which suggests that Unit-linked is the preferred
                  class for clients who want large protection. In addition, a plan with life
                  insurance coverage higher than P1,130,000 is purchased by the client for
                  himself only if his income is more than P95,000; otherwise, the plan is
                  purchased for him by his extended or immediate family.

                                                                     349 | I S I   W S C   2 0 1 9