Page 72 - Invited Paper Session (IPS) - Volume 1

IPS57 Gerardo L. et al.
                  1.  Introduction
                      Official statistics can be seen as a building supported by three pillars:
                  censuses, surveys, and administrative registers. The so-called “data deluge”,
                  a by-product of the digital revolution, has opened the possibility of adding a
                  fourth pillar: so-called “big data”. We do not yet know how important big data
                  will become for the production of official statistics relative to the other
                  pillars, but it is already clear that it is at least very promising. Even its
                  most sceptical critics now accept that big data should play a role in the
                  production of official statistics. However, problems of practical
                  implementation remain; one of them is access to the sources of information,
                  which is frequently mediated by companies whose main purpose is not
                  necessarily the supply or generation of information for the common good.
                      Big data is frequently associated with the three “V”s: Volume, Variety, and
                  Velocity. Two additional “V”s have been added more recently: Veracity and
                  Value. All these “V”s are of course valid elements for characterizing a
                  slippery concept, but we think they tend to put most of the weight on the data
                  itself rather than on the way it is used. Even if big data emerged from a
                  series of technological innovations, its essence lies more in the way data is
                  approached. We can therefore speak of a big data paradigm, according to which
                  big data is a flexible approach to using and re-using the totality of a data
                  set, structured or not, for a diversity of purposes, normally different from
                  those for which the data set was originally generated.
                      Clearly, extensive use of the term “big data” may turn it into a buzzword,
                  which is precisely what Dan Ariely captures in his famous cynical
                  anti-definition: “Big data is like teenage sex: everyone talks about it, nobody
                  really knows how to do it, everyone thinks everyone else is doing it, so
                  everyone claims they are doing it...”. Beyond definitions, however, more and
                  more people around the world are becoming familiar with what big data is, and
                  many more participate in generating it or simply use it. National Statistical
                  Offices are obliged to look for ways to incorporate big data sources into their
                  supply of information for more than one reason: these sources are cheap and
                  normally come with high frequency and granularity. Among the many ways to
                  engage with the possibilities of big data, this paper shows the path followed
                  by the National Institute of Statistics and Geography (INEGI) to exploit
                  Twitter in order to generate a web service reporting the daily mood of Twitter
                  users in México.
                      So INEGI can also claim to be doing it. It has decided to venture into the
                  world of “big data” to explore the usefulness of non-traditional information
                  sources and to link them to the generation of statistical and geographic
                  information. As a first step in this direction, INEGI has undertaken, on an
                  experimental basis, “sentiment analysis” by means of the



                                                                     61 | ISI WSC 2019