Page 72 - Invited Paper Session (IPS) - Volume 1
IPS57 Gerardo L. et al.
1. Introduction
Official statistics can be seen as a building supported by three pillars: censuses,
surveys, and administrative registers. The so-called “data deluge”, a by-product
of the digital revolution, has made it possible to count on
a fourth pillar: the so-called “big data”. We still don’t know how
important big data will become for the production of official statistics, relative
to the other pillars, but we can already see that it is, at the very least, promising. Even
its most sceptical critics now accept that big data should play a role in the
production of official statistics. Problems of practical implementation
remain, though; one of them is access to the sources of information, which
is frequently mediated by enterprises whose main purpose is not
necessarily the supply or generation of information for the common good.
Big data is frequently associated with the three “V”s: Volume, Variety, and
Velocity. Two additional “V”s have been added more recently: Veracity and
Value. All these “V”s are of course valid elements for characterizing
a slippery concept, but we think they tend to put most of the weight on the data
itself rather than on the way it is used. Even if big data emerged from a series
of technological innovations, its essence lies more in the way data
is approached. We can therefore speak of a big data paradigm, according to
which big data is a flexible approach to using and re-using the totality
of a data set, structured or not, for a variety of purposes, normally
different from those for which the data set was originally created.
It is clear that extensive use of the term “big data” may turn it into a
buzzword, which is precisely what Dan Ariely captures in his famous cynical anti-definition:
“Big data is like teenage sex: everyone talks about it, nobody really
knows how to do it, everyone thinks everyone else is doing it, so everyone
claims they are doing it...”. But beyond definitions, the fact is that more and
more people around the world are becoming familiar with what big data is, and
many more are participating in its generation or simply using it. National
Statistical Offices are obliged to look for ways to incorporate big data sources
into their supply of information, for more than one reason: such data are cheap and
normally come at high frequency and fine granularity. Among the many ways to
engage with the possibilities offered by big data, this paper presents the approach
followed by the National Institute of Statistics and Geography (INEGI) to
exploit Twitter in order to generate a web service reporting the mood of
Twitter users in Mexico on a daily basis.
So, INEGI can also claim to be doing it. It has decided to venture into the
world of “big data” to explore the usefulness of non-traditional sources of
information and to link them with the generation of statistical and
geographic information. As a first step in this direction, INEGI has
undertaken, on an experimental basis, “sentiment analysis” by means of the
61 | ISI WSC 2019