Page 74 - Invited Paper Session (IPS) - Volume 1

IPS57 Gerardo L. et al.
                  necessary for the classification.  This allowed us to identify clusters of tweets
                  based on chains of q-grams of different orders derived from each tweet.  Since
                  we trained the computer with a set of tweets tagged by humans, we were able
                  to use SVM to estimate the multidimensional hyperplanes that best separate
                  tweets that we knew were positive from those that we knew were negative.
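
                  The q-gram representation described above can be illustrated with a
                  minimal sketch. It assumes character-level q-grams and uses orders 2
                  and 3 only as an example; the function names are hypothetical and the
                  actual features used in the project may differ.

```python
from collections import Counter
from typing import Iterable, List


def qgrams(text: str, q: int) -> List[str]:
    """Return all overlapping character q-grams of `text`, in order."""
    if q < 1:
        raise ValueError("q must be a positive integer")
    # A text shorter than q yields an empty list.
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def qgram_features(text: str, orders: Iterable[int] = (2, 3)) -> Counter:
    """Count the q-grams of several orders in a single bag of features.

    The default orders (2, 3) are an illustrative assumption.
    """
    features: Counter = Counter()
    for q in orders:
        features.update(qgrams(text, q))
    return features
```

                  Counts such as these can be turned into numeric vectors, which is the
                  kind of input a linear SVM needs to estimate a separating hyperplane.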
                      Using a set of tweets classified by humans that were not used in the
                  training of the machine, it was possible to establish that this ensemble of
                  classifiers allows the computer to classify correctly 80 out of every 100
                  tweets, a success rate that is particularly high among experiences in the
                  field of sentiment analysis.  Once the computer had been properly trained to
                  classify tweets, INEGI took on the task of exploiting the results and
                  presenting them.  It should be noted that although the information published
                  on Twitter is public, INEGI reports only aggregate data and at no time reports
                  nominative or individual tweets.  Even in the tweet classification system
                  (Pioanálisis), the tweets were anonymized before they were presented to the
                  human taggers.  Thus, every day INEGI reports the tweets from the day before,
                  automatically classified by the computer.  To do this, the computer simply
                  uses the 31 previously estimated SVM hyperplanes and decides whether each
                  tweet is positive or negative according to a majority-vote rule.  The
                  following image summarizes the procedure:

                  [Image: summary of the tweet classification procedure]

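                  The daily classification step can be sketched as follows. This is an
                  illustration under stated assumptions, not INEGI's implementation: each
                  hyperplane is represented as a hypothetical pair (weights, bias), a
                  tweet is already encoded as a numeric vector, and the labels are +1
                  (positive) and -1 (negative). With an odd number of classifiers, such
                  as 31, the vote cannot end in a tie.

```python
from typing import List, Sequence, Tuple

# A linear classifier is stored as (weights w, bias b); this layout is an assumption.
Hyperplane = Tuple[Sequence[float], float]


def svm_vote(hyperplane: Hyperplane, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    w, b = hyperplane
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def majority_vote(hyperplanes: List[Hyperplane], x: Sequence[float]) -> int:
    """Label x by the sign of the summed votes of all classifiers."""
    total = sum(svm_vote(h, x) for h in hyperplanes)
    return 1 if total > 0 else -1


def accuracy(hyperplanes: List[Hyperplane],
             samples: List[Sequence[float]],
             labels: List[int]) -> float:
    """Share of held-out samples whose majority label matches the human tag."""
    hits = sum(majority_vote(hyperplanes, x) == y
               for x, y in zip(samples, labels))
    return hits / len(labels)


# Toy example with three made-up hyperplanes in two dimensions.
demo_hyperplanes: List[Hyperplane] = [
    ([1.0, 0.0], 0.0),
    ([0.0, 1.0], 0.0),
    ([1.0, 1.0], -1.0),
]
```

                  The accuracy function mirrors the evaluation described above: it is
                  meant to be run on human-tagged tweets that were held out of training.
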
                     Besides generating the automatic classification of tweets, an additional
                  challenge was to present the results in an interactive, agile, friendly and
                  attractive setting.  To that end, we presented a tool on the INEGI web page
                  with the following characteristics:




                                                                     63 | ISI WSC 2019