Page 87 - Contributed Paper Session (CPS) - Volume 5

CPS1060 Taik Guan T. et al.
            PESTEL features selected in this paper for experiments, 12 features
            demonstrated some impact or high impact on broadband development. These
            features are GDP per capita, fixed broadband penetration, wireless broadband
            penetration, telephony penetration, life expectancy, length of roads, economic
            activity, GDP, GNI, electricity access, population density, and birth rate. This
            observation is in line with findings in past research. However, the observed
            impacts of labour force and secondary education are contrary to the
            literature reviewed. As broadband technology evolves, tertiary education
            might be a better indicator than secondary education. Goldfarb (2006) found
            that  university  education  improved  the  diffusion  of  the  Internet.  The  low
            correlation coefficient for labour force might open up an area of new research
            or further literature review on whether occupation is a better feature than
            labour force. Land size, agricultural land size, population size, rainfall and
            average temperature are found to have minimal impact on broadband
            development. The literature reviewed does not reveal any correlation between
            these five features and broadband development. The research experiment
            shows that the percentage of agricultural land has a higher impact than land
            size or agricultural land size alone.
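                The screening of features by correlation coefficient discussed above can
            be illustrated with a minimal sketch. It assumes the Pearson coefficient is
            the measure of impact, which this section does not state explicitly. The
            feature names, the toy numbers and the helper names pearson and
            rank_features are hypothetical and are not taken from the experiments.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(features, response):
    """Return (name, r) pairs sorted by descending absolute correlation."""
    scores = [(name, pearson(col, response)) for name, col in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Made-up numbers purely for illustration (not data from the paper).
toy_features = {
    "gdp_per_capita": [1.0, 2.0, 3.0, 4.0, 5.0],
    "land_size": [5.0, 1.0, 4.0, 2.0, 3.0],
}
toy_response = [2.1, 3.9, 6.2, 8.1, 9.8]  # hypothetical broadband penetration

ranking = rank_features(toy_features, toy_response)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

                Ranking by absolute value keeps strongly negative correlations near the
            top, so a feature is not discarded merely because its relationship with the
            response is inverse.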
                It is concluded that the machine learning technique is a feasible model for
            use in the telecom industry to classify geographic areas according to their
            socioeconomic  potential.  Training  data  are  available  from  the  World  Bank
            databank and the Department of Statistics Malaysia to initiate the process of
            machine learning. Even though there are shortcomings in the data sets
            regarding feature sets and sample size, the existing data are good enough to
            be used as prototyping data for statistical modeling, which results in the
            formulation of interdependencies (correlation coefficients) among the
            features and the targeted response. The statistical modeling has been
            successful in generalizing the data and screening for important factors to
            establish the optimal product formulation, which is an equation that relates
            the geographical features to the socioeconomic response. By
            applying the equation to a genetic algorithm, a large number of virtual
            samples has been generated for SVM training and testing. The high accuracy
            achieved in cross-validation and testing is good evidence that the SVM has been
            properly trained. Finally, when real-life field data for states in Malaysia are
            provided  to  the  SVM,  the  machine  can  successfully  classify  the  states
            according to their socioeconomic potential.
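                The workflow summarized above (equation, virtual samples, SVM training,
            cross-validation, classification of field data) can be sketched in a few
            lines. This is a simplified illustration under stated assumptions, not the
            implementation used in this research: the coefficients in WEIGHTS and the
            value of THRESHOLD are hypothetical, plain random sampling stands in for the
            genetic algorithm, the two-class labelling is assumed, and the SVM is a
            linear one trained with a Pegasos-style sub-gradient method.

```python
import random

# Hypothetical coefficients standing in for the fitted equation (not from the paper).
WEIGHTS = [0.6, 0.3, -0.1]
THRESHOLD = 0.4  # hypothetical cut-off between low and high potential


def response(features):
    """Hypothetical linear equation relating features to the response."""
    return sum(w * x for w, x in zip(WEIGHTS, features))


def make_virtual_samples(n, rng):
    """Draw n feature vectors in [0, 1] and label them +1 or -1 by the equation.

    Plain random sampling is used here as a stand-in for the genetic algorithm.
    """
    samples = []
    for _ in range(n):
        x = [rng.random() for _ in WEIGHTS]
        samples.append((x, 1 if response(x) >= THRESHOLD else -1))
    return samples


def train_linear_svm(data, rng, lam=0.01, epochs=40):
    """Linear SVM (hinge loss, L2 penalty) trained with Pegasos-style updates."""
    w = [0.0] * (len(data[0][0]) + 1)  # the last weight acts as the bias
    t = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            xb = x + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            w = [wi * (1.0 - eta * lam) for wi in w]  # shrink (regularization)
            if margin < 1.0:  # sample violates the margin: move towards it
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def predict(w, x):
    """Return +1 (high potential) or -1 (low potential)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x + [1.0])) >= 0 else -1


def accuracy(w, data):
    return sum(1 for x, y in data if predict(w, x) == y) / len(data)


def cross_validate(data, k, rng):
    """k-fold cross-validation; returns one accuracy score per fold."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(accuracy(train_linear_svm(train, rng), folds[i]))
    return scores


rng = random.Random(0)
virtual = make_virtual_samples(600, rng)
train_set, test_set = virtual[:480], virtual[480:]

cv_scores = cross_validate(train_set, 5, rng)
model = train_linear_svm(train_set, rng)
test_accuracy = accuracy(model, test_set)

# Made-up feature vector standing in for the field data of one area.
field_area = [0.9, 0.8, 0.1]
label = predict(model, field_area)
print(cv_scores, test_accuracy, label)
```

                Keeping a held-out test set separate from the cross-validation folds
            mirrors the distinction drawn above between cross-validation accuracy and
            testing accuracy.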
                The research results show that land size and population size have a low
            correlation with GNI per capita and fixed broadband penetration;
            thus, this machine learning model can be applied to classify countries, states,
            and urban and rural areas. Using a machine learning technique (MLT) to classify
            the  socioeconomic  potential  of  a  geographic  area  according  to  its
            geographical features is a novelty of this research. The MLT is relatively more

                                                                76 | I S I   W S C   2 0 1 9