Page 177 - Special Topic Session (STS) - Volume 3

STS538 Pedro Luis do N. S. et al.
            needed for the CPI constitutes a non-trivial task. The development of an
            automatic classification system based on machine learning techniques will
            probably be required, such as the ones developed for use with scanner data
            by some countries [ILO, 2004].
                As observed with scanner data sets, high product churn and chain drift
            issues [ILO, 2004] might also affect price indices compiled using the e-records
            if traditional CPI formulas based on the matched model method are adopted.
            To circumvent this problem, unit value formulations or the use of multilateral
            methods should be considered. Such methodological changes would affect
            the computational routines used to compile the CPI and would require the
            CPI staff to acquire knowledge of these methods.
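                As a rough illustration of what such routines might involve, the
            following Python sketch computes unit values from transaction records and
            combines bilateral Törnqvist indices into a GEKS index, one example of a
            multilateral method. The record layout (period, product, value, quantity)
            and all function names are assumptions made for this example; they do not
            describe the routines used for the CPI.

```python
import math
from collections import defaultdict


def unit_values(transactions):
    """Aggregate (period, product, value, quantity) records into unit values.

    Returns {period: {product: (unit_value, total_quantity)}}.
    The record layout is an assumption made for this sketch.
    """
    totals = defaultdict(lambda: defaultdict(lambda: [0.0, 0.0]))
    for period, product, value, quantity in transactions:
        totals[period][product][0] += value
        totals[period][product][1] += quantity
    return {
        period: {prod: (v / q, q) for prod, (v, q) in prods.items() if q > 0}
        for period, prods in totals.items()
    }


def tornqvist(base, comp):
    """Bilateral Törnqvist index over products sold in both periods."""
    matched = sorted(set(base) & set(comp))
    if not matched:
        raise ValueError("no matched products between the two periods")
    # Expenditure per product = unit value * quantity.
    exp_b = {k: base[k][0] * base[k][1] for k in matched}
    exp_c = {k: comp[k][0] * comp[k][1] for k in matched}
    tot_b, tot_c = sum(exp_b.values()), sum(exp_c.values())
    log_index = sum(
        0.5 * (exp_b[k] / tot_b + exp_c[k] / tot_c)
        * math.log(comp[k][0] / base[k][0])
        for k in matched
    )
    return math.exp(log_index)


def geks(uv, periods):
    """GEKS index of every period relative to periods[0].

    For each period t, takes the geometric mean over all link periods l
    of P(base, l) * P(l, t), which makes the result transitive.
    """
    base = periods[0]
    n = len(periods)
    result = {}
    for t in periods:
        log_sum = sum(
            math.log(tornqvist(uv[base], uv[l]) * tornqvist(uv[l], uv[t]))
            for l in periods
        )
        result[t] = math.exp(log_sum / n)
    return result
```

                Because every period is compared with every other period in the
            window, the resulting index does not depend on the order in which periods
            are chained, which is the property that addresses chain drift.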
                Care should also be taken to separate the transactions contained in
            the e-records into those made by families and those made by businesses,
            since the CPIs target transactions made by families. Some data treatment
            would be required to restrict the data to the transactions most likely
            made by families.
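                One possible form of such a treatment is a simple rule-based filter,
            sketched below in Python. The field names (buyer_type, product, quantity)
            and the per-product quantity ceilings are purely hypothetical; the actual
            e-record layout and suitable thresholds would have to be established from
            the data.

```python
def likely_household(tx, max_quantity):
    """Heuristic flag: True when a transaction looks like a family purchase.

    tx is a dict with hypothetical keys 'buyer_type', 'product', 'quantity'.
    max_quantity maps a product to the largest quantity still considered
    plausible for a family; products without a ceiling are not restricted.
    """
    if tx.get("buyer_type") == "business":
        return False
    limit = max_quantity.get(tx["product"])
    return limit is None or tx["quantity"] <= limit


def restrict_to_households(transactions, max_quantity):
    """Keep only the transactions flagged as likely family purchases."""
    return [tx for tx in transactions if likely_household(tx, max_quantity)]
```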
                Finally, the IT infrastructure needed to store and process such huge data
            volumes would have to be very robust and scalable. The fiscal authority has
            the storage and processing capabilities to satisfy its own needs, but these
            capabilities cannot be shared with the IBGE. One potential approach to
            address this issue would be for IBGE to specify sampling procedures to be
            applied to the e-records database on a regular basis, and to receive only the
            specified sample data to use in the CPI compilation. Such an approach would,
            however, depend strongly on gaining some initial access to the e-records in
            order to be able to develop proper sampling procedures.
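                To make the idea concrete, the sketch below shows one way a sampling
            procedure could run in a single pass over a large set of records:
            stratified reservoir sampling, which keeps at most k records per stratum
            without loading the full data into memory. The choice of this technique,
            the stratum function and the parameter names are assumptions made for
            illustration, not a procedure specified by IBGE or the fiscal authority.

```python
import random


def stratified_reservoir_sample(records, stratum_of, k, seed=0):
    """Draw up to k records per stratum in one pass (reservoir sampling).

    records    -- any iterable of records (may be a stream)
    stratum_of -- function mapping a record to its stratum key (hypothetical)
    k          -- maximum sample size per stratum
    seed       -- fixed seed so that a monthly run can be reproduced

    Returns (sample, seen): the sampled records per stratum and the number
    of records encountered per stratum.
    """
    rng = random.Random(seed)
    sample = {}
    seen = {}
    for rec in records:
        s = stratum_of(rec)
        n = seen.get(s, 0) + 1
        seen[s] = n
        reservoir = sample.setdefault(s, [])
        if len(reservoir) < k:
            # The first k records of a stratum always enter the reservoir.
            reservoir.append(rec)
        else:
            # The n-th record replaces a random slot with probability k / n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = rec
    return sample, seen
```

                Since the function also returns the number of records seen in each
            stratum, a sampling weight of seen[s] / len(sample[s]) can be attached to
            every sampled record of stratum s before the sample is handed over.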

            4.  Conclusion
                This paper analysed the challenges and opportunities for the adoption of
            alternative big data sources for CPI compilation in Brazil. The focus was on the
            analysis of web data and tax records. The main constraint on the use of web
            data in Brazil and the reasoning for the initial implementation of web
            information at the NSCPI were discussed. The pioneering web scraping
            project for airfares was briefly discussed, showing that the use of automatic
            collection tools is promising, though care needs to be taken in implementing
            them in the monthly routines of the CPIs.
                The tax e-records were also considered, with an assessment of their
            potential and the main technical problems that need to be overcome for
            adoption and implementation at the NSCPI. The impact that such a source
            would have on the expansion of NSCPI coverage was also considered.




                                                               166 |I S I   W S C   2 0 1 9