of “big data analytics” – broadly referring to the general analysis of large data
sets – and “artificial intelligence” (AI); cf IFC (2019). Modern computing tools
can now be used to collect data, correct them, improve coverage (eg web-scraping),
process textual information (text-mining), match different data sources (eg
fuzzy-matching; a small sketch follows this paragraph), extract relevant
information (eg machine learning) and communicate or display pertinent indicators
(eg interactive dashboards). All these elements can help to address the resource
issues posed by the compilation of official price statistics, especially in
developing economies where statistical systems are still in their infancy and
staff skills are limited. One example is the Billion Prices Project at the
Massachusetts Institute of Technology (MIT), which constructs inflation indices
for countries that lack an official and/or comprehensive index; such indices can
also enhance international comparisons of price indexes across countries and
help address measurement biases and distortions in international relative prices
(see www.thebillionpricesproject.com and Cavallo and Rigobon (2016)). Similarly,
a number of central banks in emerging market economies have compiled quick price
estimates for selected goods and properties by directly scraping the information
displayed on the web, instead of setting up specific surveys that can be quite
time- and resource-intensive.
One notable case concerns large developing economies such as India, where
collecting internet-based data is seen as a potentially useful alternative to
organising large surveys that would have to cover millions of reporters.5 Yet,
as indeed noted by Hill (2018) in the case of the United States, and in contrast
to what is observed in the research and academic community, the use of big data
in more mature statistical systems has been relatively incremental and limited.
It is often targeted at methodological improvements (for instance quality
adjustment) and at reducing reporting lags.
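
As an aside on the fuzzy-matching step mentioned above, the following is a
minimal Python sketch of one common approach: using the standard library's
difflib to link product descriptions across two sources. The item names and the
similarity cutoff are purely illustrative and are not drawn from any of the
projects cited here.

    import difflib

    def best_match(name, candidates, cutoff=0.4):
        """Return the candidate description most similar to `name` after
        simple case normalisation, or None if nothing clears the cutoff."""
        norm = {c.lower(): c for c in candidates}
        hits = difflib.get_close_matches(name.lower(), list(norm),
                                         n=1, cutoff=cutoff)
        return norm[hits[0]] if hits else None

    # Illustrative product descriptions from two hypothetical sources,
    # eg a survey item list and a set of web-scraped labels.
    survey_items  = ["Rice, white, long grain, 1 kg", "Milk, whole, 1 litre"]
    scraped_items = ["White Rice Long-Grain 1kg", "Whole Milk 1L"]

    for item in survey_items:
        print(item, "->", best_match(item, scraped_items))

In practice, compilers would typically add token-level normalisation (units,
punctuation, word order) or dedicated record-linkage tools, but the underlying
principle of thresholded similarity is the same.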
Turning to the measurement challenges posed by rapid innovation, the high
velocity of big data sources can be particularly useful when prices change
rapidly. For instance, direct web-scraping allows retailers’ prices to be
extracted almost in real time from online advertisements. This can support a
timelier publication of official data, by bridging the time lags before official
statistics become available – ie through the compilation of advance estimates or
“nowcasting exercises”. In addition to the lag issue, the information provided
by the wide range of web and electronic devices is often available at a higher
frequency; changes in price developments can thus be tracked more promptly than
with official CPI numbers, which are usually available only on a monthly basis.
This can be particularly useful when analysing early warning indicators and
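
To make the scraping-and-nowcasting idea above concrete, here is a minimal
Python sketch. It is illustrative only, not the method used by the central banks
cited: the URL, the span.price selector and the price format are hypothetical,
and the code assumes the widely used requests and BeautifulSoup libraries.

    import statistics
    import requests
    from bs4 import BeautifulSoup

    # Hypothetical retailer catalogue page; a real exercise would target
    # actual sites (respecting their terms of use) and their real markup.
    URL = "https://example-retailer.test/catalogue"

    def scrape_prices(url):
        """Collect the prices posted on one catalogue page."""
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        prices = []
        for tag in soup.select("span.price"):  # illustrative selector
            text = tag.get_text(strip=True).lstrip("$")
            try:
                prices.append(float(text.replace(",", "")))
            except ValueError:
                pass  # skip malformed price strings
        return prices

    def advance_estimate(daily_price_lists):
        """Naive advance estimate ("nowcast"): the average of the daily
        mean prices scraped so far in the month, available well before
        the official monthly CPI release."""
        daily_means = [statistics.mean(day) for day in daily_price_lists if day]
        return statistics.mean(daily_means) if daily_means else None

Running scrape_prices once a day and feeding the accumulated lists into
advance_estimate would yield a rough within-month price signal; an official
compilation would instead map scraped items to CPI categories and apply the
index formula and expenditure weights.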
5 “… while nearly 15% of the price quotes in the Consumer Price Index are now collected online
(…) the size of the CPI sample has not increased to reflect the lower cost of online data
collection”.