Page 360 - Special Topic Session (STS) - Volume 2

STS500 Neo S.K. et al.
                  instance, in the selected classifier, the word ‘Shop’ had a feature importance
                  score of 0.044, more than seven times the average feature importance
                  score of 0.006. The word ‘Shop’ therefore carried far more weight in the
                  classification than a typical feature. Scores of this kind give a summary
                  insight into the classifier’s predictions. Feature words with notable
                  feature importance are highlighted in Table 3.
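The comparison against the average score can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the scores are copied from Table 3 and the average of 0.006 from the text, while the names `feature_importance`, `AVERAGE_IMPORTANCE` and `importance_ratios` are assumptions introduced here, not part of the study's code.

```python
# Scores copied from Table 3; the average (0.006) is the figure quoted in the text.
feature_importance = {
    "Shop": 0.044,
    "Cart": 0.041,
    "Price": 0.027,
    "Facebook": 0.021,
}
AVERAGE_IMPORTANCE = 0.006


def importance_ratios(importances, average):
    """Return each feature's importance as a multiple of the average score."""
    return {word: score / average for word, score in importances.items()}


ratios = importance_ratios(feature_importance, AVERAGE_IMPORTANCE)

# List the words from most to least important relative to the average.
for word, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:<10}{ratio:.1f}x the average")
```

Read this way, ‘Shop’ comes out at a little over seven times the average, which matches the statement in the text, and the other highlighted words also sit well above a typical feature.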
                     Finally, the training dataset was used to train the Random Forest Classifier,
                  and the classifier was applied to the enterprise URLs to classify each into one
                  of the internet usage categories (B1, B2, C1 or C2).

                                Table 2: Results of algorithms explored

                                 Algorithm                    Test Set Accuracy
                                 Random Forest                       79%
                                 Gradient Boosting Machine           77%
                                 Voting Classifier                   77%
                                 Logistic Regression                 72%
                                 Neural Network                      71%
                                 AdaBoost                            70%
                                 Support Vector Machine              68%
                                 Naïve Bayes (Baseline)              57%
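
One way to read Table 2 is as each algorithm's gain over the Naïve Bayes baseline, in percentage points. The sketch below is a minimal illustration using the accuracies from the table; the variable names are assumptions made here and the snippet is not the study's evaluation code.

```python
# Test set accuracies (in percent) copied from Table 2.
test_accuracy = {
    "Random Forest": 79,
    "Gradient Boosting Machine": 77,
    "Voting Classifier": 77,
    "Logistic Regression": 72,
    "Neural Network": 71,
    "AdaBoost": 70,
    "Support Vector Machine": 68,
    "Naïve Bayes (Baseline)": 57,
}
BASELINE = "Naïve Bayes (Baseline)"

# The selected classifier is the one with the highest test set accuracy.
best_algorithm = max(test_accuracy, key=test_accuracy.get)

# Gain over the baseline, in percentage points, for every other algorithm.
gain_over_baseline = {
    name: accuracy - test_accuracy[BASELINE]
    for name, accuracy in test_accuracy.items()
    if name != BASELINE
}

print(f"Best: {best_algorithm} ({test_accuracy[best_algorithm]}%)")
for name, gain in gain_over_baseline.items():
    print(f"{name:<28}+{gain} points over baseline")
```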

                                Table 3: Feature importance of selected words

                                       Feature Words          Feature Importance
                                 Shop                                0.044
                                 Cart                                0.041
                                 Price                               0.027
                                 Facebook                            0.021

                  4.  Results
                     Of the enterprise URLs obtained to date, 14% point to websites which
                  generate income directly online (Figure 2). One caveat is that the
                  enterprises classified under ‘Category C1/C2: Income generated directly
                  online’ might not generate all of their income online; the online
                  platform could be one of several revenue streams.
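
A share such as the one reported above can be obtained by tallying the predicted categories and dividing the C1 and C2 counts by the total. The sketch below shows this under stated assumptions: the function name `online_income_share` is hypothetical and the labels in `toy_predictions` are made up for illustration, so its output says nothing about the study's data.

```python
from collections import Counter

# Categories that the paper groups as "Income generated directly online".
ONLINE_INCOME_CATEGORIES = {"C1", "C2"}


def online_income_share(predicted_categories):
    """Return the fraction of classified URLs that fall under C1 or C2."""
    counts = Counter(predicted_categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified URLs to summarise")
    online = sum(counts[category] for category in ONLINE_INCOME_CATEGORIES)
    return online / total


# Made-up predictions, purely illustrative; not the study's results.
toy_predictions = ["B1", "B2", "C1", "B1", "B2", "C2", "B1", "B1"]
share = online_income_share(toy_predictions)
print(f"{share:.0%} of the toy URLs fall under C1/C2")
```

As the caveat notes, a figure computed this way counts enterprises with an online income channel and does not measure how much of their income comes from it.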






                                                                     349 | ISI WSC 2019