Page 135 - Invited Paper Session (IPS) - Volume 2
IPS188 Bruno Tissot
are often far from representative, so the veracity of the information
collected may not be as good as it seems. Certainly, big data sets usually cover
entire populations, so by construction there is little sampling error to correct
for. But a common misperception is that, because big data sets are extremely
large, they are automatically representative of the true population of interest.
Yet this is not guaranteed; in fact, the composition bias can be significant,
particularly compared with much smaller traditional probability samples
(Meng (2014)). For example, when measuring prices online, one must realise
that not all transactions are conducted on the internet. The resulting
measurement bias can be problematic if online prices differ from those
observed in physical stores, or if the products bought online differ from
those sold in shops.
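To make the composition-bias point concrete, the Python sketch below simulates a hypothetical population of transactions in which online prices are assumed to be about 5% lower than in-store prices. Every parameter (population size, online share, price distributions) is an illustrative assumption, not data. It compares a very large "big data" set that captures every online transaction with a small simple random sample. It also checks the exact identity error = ρ(R, G) × sqrt((1 - f) / f) × σ_G, where R flags inclusion in the data set, f is the share of the population captured and σ_G is the population standard deviation of prices.

```python
import numpy as np

rng = np.random.default_rng(seed=2019)

# Hypothetical population of N transactions (all parameters are assumptions).
N = 1_000_000
online_share = 0.3  # assumed share of transactions conducted online
online = rng.random(N) < online_share
# Assumption: online prices are on average 5% lower than in-store prices.
price = np.where(
    online,
    rng.normal(95.0, 10.0, N),   # assumed online price distribution
    rng.normal(100.0, 10.0, N),  # assumed in-store price distribution
)
true_mean = price.mean()

# "Big data" set: every online transaction is captured, no in-store ones.
big = price[online]
big_error = big.mean() - true_mean

# Traditional survey: a simple random sample of 1,000 transactions.
survey = rng.choice(price, size=1_000, replace=False)
survey_error = survey.mean() - true_mean

# Exact decomposition of the big-data error:
#   error = rho(R, G) * sqrt((1 - f) / f) * sigma_G
# R: inclusion indicator, f: captured share, sigma_G: population std (ddof=0).
R = online.astype(float)
f = R.mean()
rho = np.corrcoef(R, price)[0, 1]
decomposed = rho * np.sqrt((1 - f) / f) * price.std()

print(f"big data: n={big.size:,}  error={big_error:+.3f}")
print(f"survey:   n={survey.size:,}  error={survey_error:+.3f}")
print(f"decomposed big-data error: {decomposed:+.3f}")
```

Under these assumptions the big-data error should come out at roughly -3.5, even though that data set is hundreds of times larger than the survey, whose error reflects only sampling noise. The identity shows why: with f fixed, a non-zero correlation between inclusion and price does not shrink as the data set grows.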
Lastly, there are also governance challenges when using big data sources.
Ideally, statistics based on big data should be subject to the same quality
standards and frameworks that govern official statistics⁶, such as transparency of sources,
methodology, reliability and consistency over time. But in practice the
underlying data can be collected in an opaque way, arguably not in line with
these recognised principles. "Misusing" such information could thus raise
ethical and reputational as well as efficiency issues. In particular, if the confidentiality of the data
analysed is not carefully protected, this could undermine public confidence, in
turn calling into question the authorities’ competence in collecting, processing
and disseminating information derived from big data (Tissot (2019)).
Using big data for anticipating future developments is also challenging.
While related applications such as machine learning algorithms can excel in
terms of predictive performance, they lend themselves more to explaining
what is happening than why it is happening. Indeed, big data analytics
frequently rely on correlation analysis, which can capture coincidental as
well as causal patterns. As a result, these techniques may attract public
criticism when insights gained in this way are used to produce official
statistics and forecasts and/or justify policy decisions.
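As a toy illustration of how correlation can reflect coincidence rather than causality (the classic "nonsense correlation" problem, not an example taken from this paper), the Python sketch below generates pairs of independent random walks. By construction neither series causes the other, yet their levels are frequently strongly correlated, while their period-on-period changes are not. The series, their number and length, and the 0.5 threshold are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical series: pairs of independent random walks.
# By construction, neither series in a pair causes the other.
n_series, length = 500, 200
walks_a = rng.normal(size=(n_series, length)).cumsum(axis=1)
walks_b = rng.normal(size=(n_series, length)).cumsum(axis=1)


def corr(x, y):
    """Pearson correlation of two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]


# Correlation of the trending levels versus their period-on-period changes.
level_corr = np.array([corr(a, b) for a, b in zip(walks_a, walks_b)])
diff_corr = np.array(
    [corr(np.diff(a), np.diff(b)) for a, b in zip(walks_a, walks_b)]
)

# Share of unrelated pairs that nonetheless show |correlation| > 0.5.
share_levels = np.mean(np.abs(level_corr) > 0.5)
share_diffs = np.mean(np.abs(diff_corr) > 0.5)
print(f"share of |corr| > 0.5 in levels:      {share_levels:.1%}")
print(f"share of |corr| > 0.5 in differences: {share_diffs:.1%}")
```

Differencing removes the shared-trend artefact in this simple case, but it is only one safeguard: a strong correlation on transformed data still does not, by itself, establish a causal link.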
4. Conclusion
Big data sources and techniques can provide new and useful insights that
complement “traditional” data sets and facilitate the compilation of price
statistics as well as (short-term) forecasting exercises. They can also provide
new types of signals that can be especially useful for policymakers, for
instance for analysing market liquidity or geographical patterns.
⁶ Cf the Fundamental Principles of Official Statistics adopted by the United Nations in
2014, available at unstats.un.org/unsd/dnss/gp/FP-New-E.pdf. For an overview of the
challenges posed by using big data for official statistics more generally, see C Hammer
et al (2017).
122 | ISI WSC 2019