Page 135 - Invited Paper Session (IPS) - Volume 2

IPS188 Bruno Tissot
            are often far from representative, so the veracity of the information
            collected may not be as good as it seems. Certainly, big data sets usually cover
            entire populations, so by construction there is little sampling error to correct
            for. But a common misperception is that, because big data sets are extremely
            large, they are automatically representative of the true population of interest.
            Yet this is not guaranteed, and in fact the composition bias can be significant,
            in particular as compared with much smaller traditional probabilistic samples
            (Meng (2014)). For example, when measuring prices online, one must realise
            that not all transactions are conducted on the internet. The measurement bias
            can be problematic if online prices differ from the prices observed in
            physical stores, or if the products bought online differ from those sold in
            shops.
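
            The composition bias described above can be illustrated with a small
            simulation. The sketch below is not taken from the paper or from Meng (2014):
            the population size, the 30% online share, the assumed price levels and the
            function name simulate_composition_bias are all hypothetical choices made only
            for illustration. It compares a very large data set that covers online
            transactions only with a small simple random sample of all transactions.

```python
import random
import statistics


def simulate_composition_bias(n_population=200_000, online_share=0.3,
                              sample_size=1_000, seed=42):
    """Compare a large online-only data set with a small random sample.

    All parameters are illustrative assumptions, not empirical values.
    """
    rng = random.Random(seed)

    # Build a synthetic population of (price, is_online) transactions.
    population = []
    for _ in range(n_population):
        is_online = rng.random() < online_share
        # Assumption: online prices average 90, in-store prices average 100.
        mean_price = 90.0 if is_online else 100.0
        population.append((rng.gauss(mean_price, 10.0), is_online))

    all_prices = [price for price, _ in population]
    true_mean = statistics.fmean(all_prices)

    # "Big data" source: every online transaction, but no in-store ones.
    online_prices = [price for price, is_online in population if is_online]
    big_data_mean = statistics.fmean(online_prices)

    # Traditional source: a small simple random sample of all transactions.
    sample_prices = rng.sample(all_prices, sample_size)
    sample_mean = statistics.fmean(sample_prices)

    return {
        "true_mean": true_mean,
        "big_data_size": len(online_prices),
        "big_data_error": big_data_mean - true_mean,
        "sample_size": sample_size,
        "sample_error": sample_mean - true_mean,
    }


if __name__ == "__main__":
    for key, value in simulate_composition_bias().items():
        print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```

            Under these assumed parameters, the online-only source contains far more
            records than the random sample, yet its error does not shrink with size because
            it stems from coverage rather than from sampling.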
                Lastly, there are also challenges when using big data sources. Ideally,
            statistics based on big data should meet the same quality standards and
            frameworks that govern official statistics,⁶ such as transparency of sources,
            methodology, reliability and consistency over time. But in practice big data can be
            collected in an opaque way, arguably not in line with these recognised
            principles. “Misusing” such information could thus raise ethical, reputational
            as well as efficiency issues. In particular, if the confidentiality of the data
            analysed is not carefully protected, this could undermine public confidence, in
            turn calling into question the authorities’ competence in collecting, processing
            and disseminating information derived from big data (Tissot (2019)).
                Using big data for anticipating future developments is also challenging.
            While related applications such as machine learning algorithms can excel in
            terms of predictive performance, they can lend themselves more to explaining
            what is happening rather than why. Indeed, big data analytics frequently rely
            on correlation analysis, which can reflect coincidence as well as causality
            patterns. As such, they may be exposed to public criticism when insights
            gained in this way are used to produce official statistics and forecasts and/or
            justify policy decisions.
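
            To make the distinction between correlation and causality concrete, the
            following sketch simulates two series that do not influence each other but
            share a common driver. It is an illustrative assumption rather than an analysis
            from the paper: the data are synthetic, and the names pearson and
            simulate_common_driver are hypothetical helpers introduced here.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_common_driver(n=5_000, noise_sd=0.5, seed=7):
    """Return (raw, adjusted) correlations between two series.

    Both series depend on a shared driver but not on each other.
    All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    driver = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # x and y are each the driver plus independent noise; neither causes the other.
    x = [d + rng.gauss(0.0, noise_sd) for d in driver]
    y = [d + rng.gauss(0.0, noise_sd) for d in driver]
    raw = pearson(x, y)

    # Remove the driver's contribution (its coefficient is 1 by construction).
    x_resid = [xi - d for xi, d in zip(x, driver)]
    y_resid = [yi - d for yi, d in zip(y, driver)]
    adjusted = pearson(x_resid, y_resid)

    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate_common_driver()
    print(f"raw correlation: {raw:.3f}")
    print(f"correlation after removing the common driver: {adjusted:.3f}")
```

            In this synthetic setting the two series are strongly correlated, yet the
            association largely disappears once the common driver is accounted for. A
            model trained on the raw correlation could still predict well while offering
            no guidance on what would happen if one of the series were changed by policy.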

            4.  Conclusion
                Big data sources and techniques can provide new and useful insights that
            complement “traditional” data sets and facilitate the compilation of price
            statistics as well as (short-term) forecasting exercises. They can also provide
            new sorts of signals that can be especially useful for policy makers, for instance
            for analysing market liquidity as well as geographical patterns.


            ⁶ Cf the Fundamental Principles of Official Statistics adopted by the United Nations in
            2014, available at unstats.un.org/unsd/dnss/gp/FP-New-E.pdf. For an overview of the
            challenges posed by using big data for official statistics more generally, see C Hammer
            et al (2017).
            122 | ISI WSC 2019