Page 84 - Contributed Paper Session (CPS) - Volume 5
CPS1060 Taik Guan T. et al.
b. Machine Learning
In this stage, the World Bank data and the GA-generated data are used to
train the SVM. This paper proposes LIBSVM as the target SVM to be
trained for generalization. The SVM learns from training samples
collected from the World Bank databank and generated by the GA.
There are two steps in the machine learning process:
a. Splitting data for training and testing: The data is divided in different
proportions for machine training and testing.
b. Training and Validating SVM: The training datasets are further divided
into three different scales of cross-validation to validate the accuracy
of the training.
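
The two steps above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' procedure: the helper names `train_test_split` and `k_fold_indices`, the 20% test fraction, and the choice of 5 folds are all assumptions, since the paper does not state the proportions or cross-validation scales here. The 174 placeholder records stand in for the country samples only; the GA-generated samples are omitted.

```python
import random

def train_test_split(samples, test_fraction, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_samples, k):
    """Yield (train_idx, validation_idx) index lists for k-fold cross-validation."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, validation_idx
        start += size

# Placeholder records (assumption): integer IDs standing in for 174 countries.
samples = list(range(174))
train, test = train_test_split(samples, test_fraction=0.2)  # 0.2 is an assumed proportion
folds = list(k_fold_indices(len(train), k=5))               # k=5 is an assumed scale
```

Each fold holds out a different slice of the training data for validation, so every training sample is validated exactly once, while the test split stays untouched until the testing stage.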
c. Machine Testing
The testing data sets are provided to the SVMs that have been trained using
different kernels, such as linear, polynomial, and RBF. The testing data
sets are unseen by the SVMs. Each SVM classifies the data sets with its
kernel, and the accuracy of the classification results is observed. The
testing accuracy represents the SVM’s capability to recognize the pattern
of unseen data and classify the data into either class +1 (having
socioeconomic potential) or -1 (not having socioeconomic potential). If an
SVM performs accurately on the samples given, it is assumed that the SVM
can also perform accurately outside the samples (e.g., on real-life data that is
unknown to the SVM).
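
To make the kernel choices and the accuracy measure concrete, the sketch below writes the three kernels in the forms documented for LIBSVM (linear, polynomial, and RBF) and computes testing accuracy as the fraction of correctly classified ±1 labels. The function names and the parameter values used in any call are illustrative assumptions; the paper does not report its kernel parameters in this section.

```python
import math

def linear_kernel(x, z):
    """Linear kernel: the dot product x . z."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, gamma=1.0, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * x . z + coef0) ** degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class (+1 or -1) matches the true class."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)
```

For example, `accuracy([1, -1, 1, -1], [1, -1, -1, -1])` counts three matches out of four labels. Comparing this figure across the SVMs is what allows the kernels to be ranked on unseen data.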
d. Machine Application
Finally, real-life field data sets from DOSM for the states and federal territories
in Malaysia are provided to the SVMs that have been tested. The SVMs
classify these field data sets, and the results are observed.
3. Result
A total of 20 PESTEL features (including 1 response) were selected for the
research, with complete data for 174 countries extracted from the World
Bank’s World Development Indicators online databank. Countries of high
income and upper middle income are labelled as having socioeconomic
potential (+1); countries of low income and lower middle income are labelled
as not having socioeconomic potential (-1). Relative to GNI per capita,
GDP per capita has the strongest positive correlation (0.981), followed by
broadband penetration. Fixed telephony penetration and life expectancy
have a relatively moderate correlation, with ratings above 0.500. The birth
rate also has a relatively moderate correlation with GNI per capita, but in
the reverse direction. The length of tar road, tourism activity, GDP, GNI and
the percentage of the population with electricity access have some
correlation, with ratings ranging between 0.271 and 0.454.
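
As an illustration of how a rating of this kind can be computed, the sketch below implements the Pearson correlation coefficient in plain Python. This is an assumption: the paper does not name the statistic behind its ratings, and the function name `pearson_r` and the toy values are made up for illustration and are not World Bank data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)

# Toy values (hypothetical): one feature rising with the target, one falling.
target = [1.0, 2.0, 3.0, 4.0]
rising_feature = [2.0, 4.0, 6.0, 8.0]
falling_feature = [8.0, 6.0, 4.0, 2.0]

r_rising = pearson_r(target, rising_feature)    # same direction, close to +1
r_falling = pearson_r(target, falling_feature)  # reverse direction, close to -1
```

The sign of the coefficient captures the direction of the relationship, as with the birth rate moving opposite to GNI per capita, while its magnitude corresponds to the strength of the rating.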
73 | ISI WSC 2019