Page 84 - Contributed Paper Session (CPS) - Volume 5

CPS1060 Taik Guan T. et al.
                  b.  Machine Learning
                      In this stage, the SVM is trained on samples collected from the World
                  Bank databank together with samples generated by the GA. This paper
                  proposes LIBSVM as the target SVM to be trained for generalization.
                  There are two steps in the machine learning process:
                      a.  Splitting data for training and testing: The data are divided into
                         different proportions for machine training and testing.
                      b.  Training and Validating SVM: The training datasets are further divided
                         into three different scales of cross-validation to validate the accuracy
                         of the training.
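
The two steps above can be sketched in plain Python. This is a minimal illustration, not the authors' procedure: the function names `train_test_split` and `k_fold_splits`, the 80/20 proportion, and the fold counts 3, 5, and 10 are all assumptions, since the paper does not state the proportions or the cross-validation scales it used.

```python
import random


def train_test_split(samples, test_fraction, seed=0):
    """Shuffle a copy of the samples and return (training, testing) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_splits(samples, k):
    """Yield (training, validation) pairs for k-fold cross-validation."""
    if not 2 <= k <= len(samples):
        raise ValueError("k must be between 2 and the number of samples")
    # Deal the samples into k folds in round-robin order.
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation


if __name__ == "__main__":
    data = list(range(100))  # placeholder sample ids, not real data
    training, testing = train_test_split(data, test_fraction=0.2)
    print(len(training), len(testing))
    for k in (3, 5, 10):  # hypothetical cross-validation scales
        sizes = [len(val) for _, val in k_fold_splits(training, k)]
        print(k, sizes)
```

Each validation fold is held out exactly once, so every training sample contributes to the validation accuracy at a given scale.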
                  c.  Machine Testing
                      The testing data sets are provided to SVMs that have been trained using
                  different kernels, such as linear, polynomial, and RBF. The testing data
                  sets are unseen by the SVMs. Each SVM classifies the data sets, and the
                  accuracy of its classification results is observed. The testing accuracy
                  represents the SVM’s capability to recognize the pattern of unseen data
                  and to classify the data into either class +1 (having socioeconomic
                  potential) or class -1 (not having socioeconomic potential). If an SVM
                  performs accurately on the samples given, it is assumed that the SVM can
                  also perform accurately outside the samples (e.g., on real-life data that
                  is unknown to the SVM).
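
For reference, the three kernels named above and the testing accuracy can be written out as follows. This is a sketch using the standard textbook forms of the kernels. The parameter names `gamma`, `coef0`, and `degree` and their default values are illustrative assumptions, because the paper does not report its kernel settings.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=0.0, degree=3):
    """K(x, y) = (gamma * x . y + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def accuracy(predicted, actual):
    """Fraction of samples whose predicted class (+1 or -1) matches the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

With made-up labels, `accuracy([1, -1, 1, 1], [1, -1, -1, 1])` gives 0.75, because three of the four predictions match.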
                  d.  Machine Application
                      Finally, real-life field data sets from DOSM for the states and federal
                  territories of Malaysia are provided to the SVMs that have been tested. The
                  SVMs classify these field data sets, and the results are observed.

                  3.  Result
                      Twenty PESTEL features (including one response variable) were selected for
                  this research, with complete data for 174 countries extracted from the
                  World Bank’s online World Development Indicators databank. Countries of high
                  income and upper middle income are labeled as having socioeconomic
                  potential (+1); countries of low income and lower middle income are labeled
                  as not having socioeconomic potential (-1). With respect to GNI per capita,
                  GDP per capita has the strongest positive correlation (0.981), followed by
                  broadband penetration. Fixed telephony penetration and life expectancy
                  have relatively moderate correlations, with ratings above 0.500. The birth
                  rate also has a relatively moderate correlation with GNI per capita, but in
                  the reverse direction. The length of tar road, tourism activity, GDP, GNI,
                  and the percentage of the population with electricity access have weaker
                  correlations, with ratings ranging between 0.271 and 0.454.
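
The labeling rule and the ratings against GNI per capita can be illustrated with a short sketch. It assumes the ratings are Pearson correlation coefficients, which the paper does not state explicitly. The helper names `label_country` and `pearson` are hypothetical.

```python
import math

POTENTIAL_GROUPS = {"high income", "upper middle income"}
NO_POTENTIAL_GROUPS = {"low income", "lower middle income"}


def label_country(income_group):
    """Map a World Bank income group to the class label +1 or -1."""
    group = income_group.strip().lower()
    if group in POTENTIAL_GROUPS:
        return 1
    if group in NO_POTENTIAL_GROUPS:
        return -1
    raise ValueError(f"unknown income group: {income_group!r}")


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient near +1 corresponds to a strong relationship in the same direction as GNI per capita, and a negative coefficient corresponds to the reverse direction reported for the birth rate.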

                                                                      73 | ISI WSC 2019