Page 134 - Invited Paper Session (IPS) - Volume 2

IPS188 Bruno Tissot
                  information collected, being very granular, can more easily be matched with
                  other datasets, say census survey-based information for similar homes and tax
                  registries.  Furthermore,  the  new  type  of  information  collected  can  provide
                  insights that are not covered by “traditional” statistics, eg to analyse housing
                  market  liquidity  and  tightness  (by assessing  demand  intensity  through the
                  number of clicks on specific ads), discounting practices (by comparing asking
                  and transaction prices, which can differ markedly for instance during turning
                  points),  and  detailed  geographical  factors  –  see  for  instance  Loberto  et  al
                  (2018).
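
                  As a minimal sketch of the two indicators mentioned above, the example
                  below computes an average discount (asking versus transaction price) and
                  a click-based demand intensity. The field names, the sample figures and
                  the exact formulas (relative discount, clicks per day online) are
                  illustrative assumptions, not definitions taken from Loberto et al (2018).

```python
from statistics import mean


def discount_rate(asking_price, transaction_price):
    """Relative gap between asking and transaction price (assumed definition)."""
    return (asking_price - transaction_price) / asking_price


def clicks_per_day(clicks, days_online):
    """Clicks per day online, used here as a hypothetical proxy for demand intensity."""
    return clicks / days_online


# Made-up listings for illustration only.
listings = [
    {"asking": 200_000, "sold": 190_000, "clicks": 300, "days_online": 30},
    {"asking": 400_000, "sold": 360_000, "clicks": 100, "days_online": 50},
]

avg_discount = mean(discount_rate(l["asking"], l["sold"]) for l in listings)
avg_intensity = mean(clicks_per_day(l["clicks"], l["days_online"]) for l in listings)

print(f"average discount: {avg_discount:.1%}")
print(f"average clicks per day: {avg_intensity:.1f}")
```

                  In practice such indicators would presumably be computed by area and by
                  period, so that changes around turning points become visible.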

                  3.  Challenges
                      Despite the various opportunities provided by big data sources, there are
                  important  challenges  in  using  this  information  when  measuring  and
                  forecasting prices. First, there are practical difficulties in collecting the data.
                  This  challenge  can  be  reinforced  by  the  large  variety  of  big  data  formats,
                  especially when the information collected is not well structured. Apart from
                  the technical aspects (eg proper IT equipment, access rights etc), a key issue is
                  data quality. For instance, references displayed on the web can be incorrect,
                  or may not really reflect true transaction prices (eg in case customers benefit
                  from  discounts  for  other  services,  loyalty  programs  etc.);  and  the
                  characteristics of the products may not be standardised properly. As a result,
                  statisticians have to deal with duplicated information, since the same product
                  may be sold in different places but is identified with different characteristics –
                  for instance, a common feature for property markets is that several (different)
                  advertisements  can  be  associated  with  the  same  dwelling.  Alternatively,  a
                  product may still be displayed on a website even though it is no longer available
                  for  sale,  hence  the  risk  of  measuring  outdated  prices.  Dealing  with  these
                  challenges requires significant work when cleaning and processing the data.
                  In addition, the usefulness of the information collected is limited if the data
                  sources and/or their market coverage change over time, and of course if
                  access to it is hindered by privacy laws and/or copyright issues.
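
                  The cleaning steps described above (merging duplicated advertisements
                  and discarding outdated ones) can be sketched as follows. This is only an
                  illustration under stated assumptions: the `Ad` record, the matching rule
                  in `dwelling_key` (normalised address plus rounded floor area) and the
                  30-day staleness threshold are all hypothetical choices, and real-world
                  matching is likely to need fuzzier rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Ad:
    ad_id: str
    address: str
    floor_area_m2: float
    asking_price: float
    last_seen: date  # last date the ad was observed online


def dwelling_key(ad):
    """Hypothetical matching rule: normalised address plus floor area rounded to 1 m2."""
    normalised = " ".join(ad.address.lower().replace(",", " ").split())
    return (normalised, round(ad.floor_area_m2))


def clean_ads(ads, as_of, max_age_days=30):
    """Drop ads not seen recently, then keep the most recent ad per dwelling."""
    cutoff = as_of - timedelta(days=max_age_days)
    latest = {}
    for ad in ads:
        if ad.last_seen < cutoff:
            continue  # possibly no longer for sale: risk of an outdated price
        key = dwelling_key(ad)
        kept = latest.get(key)
        if kept is None or ad.last_seen > kept.last_seen:
            latest[key] = ad
    return sorted(latest.values(), key=lambda a: a.ad_id)


# Made-up advertisements for illustration only.
ads = [
    Ad("a1", "12 Main Street, Rome", 80.2, 250_000, date(2019, 3, 1)),
    Ad("a2", "12 main street  Rome", 79.8, 245_000, date(2019, 3, 10)),
    Ad("a3", "5 Park Lane, Milan", 60.0, 300_000, date(2018, 11, 1)),
]

cleaned = clean_ads(ads, as_of=date(2019, 3, 15))
for ad in cleaned:
    print(ad.ad_id, ad.asking_price)
```

                  Here the first two ads are treated as the same dwelling and only the more
                  recent one is kept, while the third is dropped as stale.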
                      There  are  also  important  methodological  limitations. First,  estimating
                  price indices requires defining a basket of goods that are representative of the
                  spending of the economic agents considered. As regards CPI, for instance, a
                  significant part of the consumption basket is related to goods that are either
                  not traded (eg self-consumption of housing services by homeowners) or that
                  have an administrative nature and are therefore not quoted on the internet.
                  So compiling a CPI index using only web-based information will not be fully
                  representative; one way to go is to complement this approach with other types
                  of (non-web-based) information.
                      Even if one only focusses on the part of the consumption basket that can
                  indeed be traded on the internet, another concern is that big data samples


                                                                     121 | I S I   W S C   2 0 1 9