Page 74 - Invited Paper Session (IPS) - Volume 1
IPS57 Gerardo L. et al.
necessary for the classification. This allowed us to identify clusters of tweets
based on chains of q-grams of different orders derived from each tweet. Since
we trained the computer with a set of tweets tagged by humans, we were able
to use SVM to estimate the multidimensional hyperplanes that best separate
tweets that we knew were positive from those that we knew were negative.
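As an illustration of the feature step, the following minimal sketch counts q-grams of several orders for a single tweet. It assumes character-level q-grams, lowercasing, and the orders 2 to 4; the text does not specify any of these, and the function name `qgram_features` is hypothetical.

```python
from collections import Counter


def qgram_features(text, orders=(2, 3, 4)):
    """Count the character q-grams of each requested order in one tweet.

    Assumption: q-grams are taken over characters after lowercasing and
    collapsing whitespace; the orders used here are illustrative only.
    """
    normalized = " ".join(text.lower().split())
    counts = Counter()
    for q in orders:
        # Slide a window of width q over the normalized text.
        for i in range(len(normalized) - q + 1):
            counts[normalized[i:i + q]] += 1
    return counts


# Example: q-gram counts of orders 2 and 3 for a short made-up tweet.
features = qgram_features("Muy feliz hoy", orders=(2, 3))
```

Counts of this kind, computed for every human-tagged tweet, could then serve as the input vectors from which an SVM estimates its separating hyperplane.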
Using a set of human-classified tweets that were not used to train
the machine, it was possible to establish that this ensemble of
classifiers allows the computer to correctly classify 80 out of every 100 tweets,
which is a particularly high success rate in the
field of sentiment analysis. Once the computer had been properly trained to classify
tweets, INEGI set about exploiting the results and presenting
them. It should be noted that although the information published on Twitter
is public, INEGI reports only aggregate data and never reports individual
tweets or tweets attributed by name. Even in the tweet classification system (Pioanálisis),
the tweets were anonymized before they were presented to the human
taggers. Every day, INEGI reports the tweets from the day before,
automatically classified by the computer. To do this, the computer simply applies
the 31 previously estimated SVM hyperplanes and labels each tweet as positive or
negative by majority rule among them. The following
image summarizes the procedure:
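Separately from the figure, the daily classification step described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not INEGI's implementation: each hyperplane is represented as a hypothetical `(weights, bias)` pair, a non-negative score counts as a positive vote, and the toy example uses three hyperplanes rather than 31.

```python
def svm_side(weights, bias, x):
    """Return +1 or -1 according to the side of the hyperplane x falls on."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


def majority_vote(hyperplanes, x):
    """Label x by majority rule over all hyperplanes.

    With an odd number of hyperplanes (such as 31) the vote cannot tie.
    """
    votes = sum(svm_side(w, b, x) for w, b in hyperplanes)
    return "positive" if votes > 0 else "negative"


# Toy example with three made-up hyperplanes in two dimensions.
toy_hyperplanes = [
    ([1.0, 0.0], 0.0),
    ([0.0, 1.0], 0.0),
    ([-1.0, -1.0], 0.0),
]
label = majority_vote(toy_hyperplanes, [2.0, 1.0])
```

Using an odd number of classifiers is what makes a plain majority rule sufficient, since no tie-breaking step is needed.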
Besides generating the automatic classification of tweets, an additional
challenge was to present the results in an interactive, agile, friendly and
attractive setting. To that end, we presented a tool on INEGI's web page with
the following characteristics:
63 | ISI WSC 2019