Page 134 - Invited Paper Session (IPS) - Volume 2
IPS188 Bruno Tissot
information collected, being very granular, can more easily be matched with
other datasets, say census survey-based information for similar homes and tax
registries. Furthermore, the new type of information collected can provide
insights that are not covered by “traditional” statistics, eg to analyse housing
market liquidity and tightness (by assessing demand intensity through the
number of clicks on specific ads), discounting practices (by comparing asking
and transaction prices, which can differ markedly, for instance during turning
points), and detailed geographical factors – see for instance Loberto et al
(2018).
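As a purely illustrative sketch of the two indicators mentioned above, the snippet below computes a discount rate (the gap between asking and transaction price) and a crude demand-intensity measure (clicks per day online) for a handful of listings. The records, field names and figures are hypothetical and are not taken from any actual dataset.

```python
from statistics import median

# Hypothetical listing records; all field names and figures are illustrative.
listings = [
    {"ad_id": "a1", "asking": 200_000, "sold": 190_000, "clicks": 120, "days_online": 30},
    {"ad_id": "a2", "asking": 350_000, "sold": 350_000, "clicks": 45, "days_online": 15},
    {"ad_id": "a3", "asking": 150_000, "sold": 135_000, "clicks": 300, "days_online": 60},
]


def discount(asking, sold):
    """Relative gap between the asking price and the transaction price."""
    return (asking - sold) / asking


def clicks_per_day(clicks, days_online):
    """Simple proxy for demand intensity: average daily clicks on an ad."""
    return clicks / days_online


discounts = [discount(l["asking"], l["sold"]) for l in listings]
intensity = [clicks_per_day(l["clicks"], l["days_online"]) for l in listings]

median_discount = median(discounts)
median_intensity = median(intensity)
print(f"median discount: {median_discount:.1%}, median clicks/day: {median_intensity:.1f}")
```

In practice the transaction price would typically come from a separate source (eg a land or tax registry), so such a calculation presupposes that ads have first been matched to actual sales.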
3. Challenges
Despite the various opportunities provided by big data sources, there are
important challenges in using this information when measuring and
forecasting prices. First, there are practical difficulties in collecting the data.
This challenge can be reinforced by the large variety of big data formats,
especially when the information collected is not well structured. Apart from
the technical aspects (eg proper IT equipment, access rights etc), a key issue is
data quality. For instance, references displayed on the web can be incorrect,
or may not really reflect true transaction prices (eg in case customers benefit
from discounts for other services, loyalty programs etc.); and the
characteristics of the products may not be standardised properly. As a result,
statisticians have to deal with duplicated information, since the same product
may be sold in different places but is identified with different characteristics –
for instance, a common feature for property markets is that several (different)
advertisements can be associated with the same dwelling. Alternatively, a
product may still be displayed on a website even though it is no longer available
for sale, hence the risk of measuring outdated prices. Dealing with these
challenges requires significant work when cleaning and processing the data.
In addition, the usefulness of the information collected is limited if the data
sources and/or their market coverage change over time, and of course if
access to them is hindered by privacy laws and/or copyright issues.
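The cleaning steps described above (removing outdated ads and collapsing several advertisements for the same dwelling) can be sketched as follows. This is a minimal illustration under strong assumptions: the records are invented, the identity key (normalised address plus floor area) and the 30-day staleness threshold are arbitrary choices, and real-world matching would usually require fuzzier rules.

```python
from datetime import date

# Hypothetical scraped ads; field names, values and thresholds are assumptions.
ads = [
    {"ad_id": 1, "address": "12 High St. ", "area_m2": 80, "price": 250_000, "last_seen": date(2019, 6, 1)},
    {"ad_id": 2, "address": "12 high st", "area_m2": 80, "price": 245_000, "last_seen": date(2019, 6, 10)},
    {"ad_id": 3, "address": "5 Park Ave", "area_m2": 55, "price": 180_000, "last_seen": date(2019, 3, 1)},
    {"ad_id": 4, "address": "7 Mill Rd", "area_m2": 120, "price": 400_000, "last_seen": date(2019, 6, 12)},
]


def dwelling_key(ad):
    """Crude identity key: normalised address plus floor area."""
    address = " ".join(ad["address"].lower().replace(".", "").split())
    return (address, ad["area_m2"])


def clean(ads, as_of, max_age_days=30):
    """Drop stale ads, then keep the most recently seen ad per dwelling."""
    latest = {}
    for ad in ads:
        if (as_of - ad["last_seen"]).days > max_age_days:
            continue  # not seen recently: the dwelling may no longer be for sale
        key = dwelling_key(ad)
        if key not in latest or ad["last_seen"] > latest[key]["last_seen"]:
            latest[key] = ad
    return sorted(latest.values(), key=lambda ad: ad["ad_id"])


cleaned = clean(ads, as_of=date(2019, 6, 15))
print([ad["ad_id"] for ad in cleaned])
```

Here ads 1 and 2 are treated as the same dwelling and only the more recent one is kept, while ad 3 is discarded as outdated. Keeping the latest ad is one possible convention; keeping the lowest price or the longest-running ad would be equally defensible depending on the purpose.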
There are also important methodological limitations. First, estimating
price indices requires defining a basket of goods that are representative of the
spending of the economic agents considered. As regards CPI, for instance, a
significant part of the consumption basket is related to goods that are either
not traded (eg self-consumption of housing services by homeowners) or that
have an administrative nature and are therefore not quoted on the internet.
So compiling a CPI index using only web-based information will not be fully
representative; one way forward is to complement this approach with other types
of (non web-based) information.
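To illustrate the representativeness issue, the sketch below compares an index built only from web-observable items with one covering the full basket, using a simple weighted average of price relatives. All weights, price relatives and source labels are hypothetical and chosen only to make the arithmetic visible; they do not reflect any actual CPI.

```python
# Hypothetical basket: weights, price relatives and source labels are illustrative.
basket = [
    {"item": "food", "weight": 0.30, "relative": 1.02, "source": "web"},
    {"item": "clothing", "weight": 0.10, "relative": 0.98, "source": "web"},
    {"item": "owner-occupied housing", "weight": 0.40, "relative": 1.05, "source": "survey"},
    {"item": "administered prices", "weight": 0.20, "relative": 1.00, "source": "administrative"},
]


def weighted_index(items):
    """Weighted average of price relatives (base period = 100), weights renormalised."""
    total = sum(i["weight"] for i in items)
    return 100 * sum(i["weight"] * i["relative"] for i in items) / total


web_only = [i for i in basket if i["source"] == "web"]

web_coverage = sum(i["weight"] for i in web_only)  # share of the basket seen online
index_web_only = weighted_index(web_only)
index_full = weighted_index(basket)

print(f"web coverage: {web_coverage:.0%}")
print(f"web-only index: {index_web_only:.1f}, full-basket index: {index_full:.1f}")
```

With these made-up numbers the web-only index covers 40% of the basket and understates the full-basket index, simply because the items missing from the web happen to have risen faster in price.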
Even if one only focusses on the part of the consumption basket that can
indeed be traded on the internet, another concern is that big data samples
121 | ISI WSC 2019