Page 177 - Special Topic Session (STS) - Volume 3
STS538 Pedro Luis do N. S. et al.
needed for the CPI constitutes a non-trivial task. An automatic classification
system based on machine learning techniques will probably be required, such
as those developed by some countries for use with scanner data [ILO, 2004].
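To make the idea of automatic classification concrete, the following is a minimal sketch of a text classifier that assigns product descriptions to CPI categories. It uses a multinomial naive Bayes model with add-one smoothing, chosen here only for brevity; it is not the method of any particular statistical office. The descriptions, category labels, and all names in the code are invented for illustration.

```python
from collections import Counter, defaultdict
from math import log


def tokenize(text):
    """Lowercase the description and split it on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = log(n / total)  # log prior of the category
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled descriptions (not taken from real e-records).
train = [
    ("arroz branco tipo 1 5kg", "rice"),
    ("arroz integral 1kg", "rice"),
    ("feijao preto 1kg", "beans"),
    ("feijao carioca tipo 1 1kg", "beans"),
]
model = NaiveBayes().fit([t for t, _ in train], [c for _, c in train])
predicted = model.predict("arroz parboilizado 5kg")
```

A production system would need far more training data, richer features, and a review step for low-confidence predictions; the sketch only shows the basic shape of the task.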
As observed with scanner data sets, high product churn and chain drift
[ILO, 2004] might also affect price indices compiled from the e-records
if traditional CPI formulas based on the matched model method are adopted.
To circumvent this problem, unit value formulations or multilateral
methods should be considered. Such methodological changes would affect
the computational routines used to compile the CPI and would
require the CPI staff to acquire knowledge of these methods.
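The two alternatives named above can be sketched in a few lines. The example below computes unit value prices per item and period, then a GEKS index built from bilateral Jevons indices over matched items. GEKS is used here as one well-known example of a multilateral method, and Jevons as the bilateral formula is an assumption made for simplicity, not a recommendation drawn from this paper. The transactions and all names are invented for illustration.

```python
from math import exp, log

# Hypothetical transactions: (period, item_id, quantity, total_value).
transactions = [
    (0, "rice_1kg", 10, 50.0),
    (0, "rice_1kg", 5, 27.5),
    (0, "beans_1kg", 8, 64.0),
    (1, "rice_1kg", 12, 66.0),
    (1, "beans_1kg", 10, 75.0),
    (2, "rice_1kg", 9, 45.0),
    (2, "beans_1kg", 6, 51.0),
]


def unit_values(records):
    """Return {period: {item: total value / total quantity}}."""
    totals = {}
    for period, item, qty, value in records:
        v, q = totals.setdefault(period, {}).get(item, (0.0, 0.0))
        totals[period][item] = (v + value, q + qty)
    return {t: {i: v / q for i, (v, q) in items.items()}
            for t, items in totals.items()}


def jevons(prices, a, b):
    """Bilateral Jevons index from period a to b over items sold in both."""
    matched = sorted(set(prices[a]) & set(prices[b]))
    if not matched:
        raise ValueError("no matched items between the two periods")
    return exp(sum(log(prices[b][i] / prices[a][i]) for i in matched)
               / len(matched))


def geks(prices, base, current):
    """GEKS: geometric mean, over all link periods l, of P(base,l) * P(l,current)."""
    periods = sorted(prices)
    logs = [log(jevons(prices, base, l) * jevons(prices, l, current))
            for l in periods]
    return exp(sum(logs) / len(periods))


prices = unit_values(transactions)
direct = geks(prices, 0, 2)
chained = geks(prices, 0, 1) * geks(prices, 1, 2)
```

In this sketch `direct` and `chained` agree up to floating-point error, because a GEKS index computed over a fixed window is transitive. That property is what makes multilateral methods attractive when chain drift is a concern.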
Care should also be taken to separate the transactions
contained in the e-records into those made by families and those made
by businesses, since the CPIs target families' transactions. Some data
treatment would be required to restrict the transactions considered to those
most likely made by families.
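One possible form of such a data treatment is a simple rule-based filter, sketched below. The record layout, the `buyer_is_business` flag, and the quantity threshold are all hypothetical; the actual structure of the e-records is not described here, and a real rule would have to be calibrated on the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    item_id: str
    quantity: float
    value: float
    buyer_is_business: bool  # hypothetical flag, assumed for illustration


def likely_household(record, max_quantity=20.0):
    """Heuristic: no business buyer and a quantity typical of a family purchase."""
    return (not record.buyer_is_business) and record.quantity <= max_quantity


# Invented example records.
records = [
    Record("rice_1kg", 2, 11.0, False),
    Record("rice_1kg", 500, 2400.0, False),  # bulk purchase, probably not a family
    Record("beans_1kg", 3, 24.0, True),      # buyer identified as a business
    Record("milk_1l", 6, 27.0, False),
]

household_records = [r for r in records if likely_household(r)]
```

The threshold of 20 units is arbitrary; in practice it would likely vary by product and be set from the observed distribution of quantities.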
Finally, the IT infrastructure needed to store and process such huge data
volumes would have to be very robust and scalable. The fiscal authority has
the storage and processing capabilities to satisfy its own needs, but these
capabilities cannot be shared with the IBGE. One potential approach to
address this issue would be for the IBGE to specify sampling procedures to be
applied to the e-records database on a regular basis, and to receive only the
specified sample data to use in the CPI compilation. Such an approach would,
however, depend strongly on gaining some initial access to the e-records in
order to be able to develop proper sampling procedures.
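As an illustration of a sampling procedure that could run on the data holder's side, the sketch below draws a fixed-size uniform sample per stratum in a single pass, using reservoir sampling (Algorithm R). This is one possible design, not a procedure proposed in this paper; the stratum definition, sample size, and record layout are assumptions made for the example.

```python
import random


def stratified_reservoir_sample(records, stratum_of, k, seed=0):
    """One pass over records; keep a uniform sample of up to k records per stratum."""
    rng = random.Random(seed)
    samples = {}  # stratum -> list of kept records
    seen = {}     # stratum -> number of records seen so far
    for record in records:
        s = stratum_of(record)
        seen[s] = seen.get(s, 0) + 1
        bucket = samples.setdefault(s, [])
        if len(bucket) < k:
            bucket.append(record)
        else:
            # Replace a kept record with probability k / seen[s].
            j = rng.randrange(seen[s])
            if j < k:
                bucket[j] = record
    return samples


# Hypothetical e-records stream: (product_group, record_number).
stream = [("food", i) for i in range(1000)] + [("fuel", i) for i in range(3)]
sample = stratified_reservoir_sample(stream, stratum_of=lambda r: r[0], k=5)
```

A single-pass design avoids loading the full database into memory and needs no prior knowledge of stratum sizes, which may matter when only the sampled records can be transferred.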
4. Conclusion
This paper analysed the challenges and opportunities for the adoption of
alternative big data sources for CPI compilation in Brazil. The focus was on the
analysis of web data and tax records. The main constraint on the use of web
data in Brazil and the reasoning behind the initial implementation of web
information at the NSCPI were discussed. The pioneering web scraping project
for airfares was briefly discussed, showing that the use of automatic
collection tools is promising, though care needs to be taken in their
implementation in the monthly routines of the CPIs.
The tax e-records were also considered, with an assessment of their
potential and the main technical problems that need to be overcome for
adoption and implementation at the NSCPI. The impacts that such a source
would have on the expansion of the NSCPI coverage were also considered.
166 | ISI WSC 2019