Page 174 - Special Topic Session (STS) - Volume 3

STS538 Pedro Luis do N. S. et al.
                  manual collection on the web is already the norm, such as for airline ticket
                  prices. The other experiment under way relies on web data to expand the
                  amount of information collected in order to improve the compiled CPIs. A
                  pioneering approach in this area is the collection of web data to support
                  the implementation of hedonic quality adjustment methods in the NSCPI.
                  The use of web data to expand the number of elements in the CPI basket and
                  the compilation of an experimental web-only CPI are also under consideration
                  for a future stage of this project.
                     We briefly discuss only the first case here; for more details see [da
                  Silva et al., 2019]. A pilot project on the use of web data is under way for
                  the collection of airfares. Since airfares are nowadays mostly sold online,
                  the traditional collection consists of having price collectors visit the
                  websites of the main airlines on pre-established dates and record the prices
                  observed for pre-determined routes. This process is time-consuming and
                  subject to human error: data collection takes about 2 to 4 hours per week,
                  per area, and the head office team needs extra time to check whether the
                  prices collected are consistent. The pilot project aims to replace the
                  manual collection with automatic web scraping routines. A home-made web
                  scraper was developed to automate the current collection process, scraping
                  once a week for the same routes on the same airlines.
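The collection design described above (a fixed sample of airlines and routes, visited on pre-established weekly dates, followed by a consistency check) can be sketched as follows. This is only an illustrative outline under stated assumptions, not the scraper used in the pilot: the airline names, route codes, the `fetch_price` callable and the `max_ratio` threshold are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Quote:
    airline: str
    route: str
    collected_on: date
    price: float


# Hypothetical fixed sample: the same airlines and routes every week.
AIRLINES = ["AirlineA", "AirlineB"]
ROUTES = ["AAA-BBB", "CCC-DDD"]


def weekly_dates(start: date, weeks: int) -> List[date]:
    """Pre-established collection dates, one per week."""
    return [start + timedelta(weeks=k) for k in range(weeks)]


def collect(fetch_price: Callable[[str, str, date], Optional[float]],
            on: date) -> List[Quote]:
    """Collect one quote per (airline, route); skip prices that are missing."""
    quotes = []
    for airline in AIRLINES:
        for route in ROUTES:
            price = fetch_price(airline, route, on)  # assumed scraper hook
            if price is not None:
                quotes.append(Quote(airline, route, on, price))
    return quotes


def flag_inconsistent(previous: List[Quote], current: List[Quote],
                      max_ratio: float = 3.0) -> List[Quote]:
    """Flag quotes that are non-positive or that changed by more than a
    factor of max_ratio relative to the previous collection."""
    last = {(q.airline, q.route): q.price for q in previous}
    flagged = []
    for q in current:
        old = last.get((q.airline, q.route))
        if q.price <= 0:
            flagged.append(q)
        elif old is not None and old > 0:
            if q.price / old > max_ratio or old / q.price > max_ratio:
                flagged.append(q)
    return flagged
```

Flagged quotes would still go to the head office team for review; the check only narrows down which prices need a human look.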
                     The pilot has been running for almost a year and the results are
                  promising, as summarized in Figure 1, which compares an experimental weekly
                  index built from the manual and from the automatic price data collections
                  [da Silva et al., 2019]. Figure 1 shows good agreement between the manual
                  and automatic series. However, some extra care is needed before the
                  automated solution is implemented in the monthly production process of the
                  CPIs. The main problems observed in the pilot were: anti-robot policies (one
                  airline blocked the web scraper for a while); website architectures that
                  may change without warning, requiring adjustments to the web scraper to
                  ensure that the correct information is extracted; and network instabilities
                  during some special sales events, which disturbed the robot’s ability to
                  retrieve prices [da Silva et al., 2019].
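Two of the problems listed above, network instabilities and unannounced website changes, are commonly handled with retries and with validation of the extracted value. The sketch below illustrates that general pattern and is not the pilot's implementation: `fetch` stands for any hypothetical function returning the raw price text from a page, the `R$ 1.234,56` price format is an assumption, and the retry parameters are arbitrary. It does not attempt to work around anti-robot policies; a blocked request simply yields a missing price that can fall back to manual collection.

```python
import math
import time
from typing import Callable, Optional


class LayoutChanged(Exception):
    """The page no longer yields a usable price where one is expected."""


def parse_price(raw: Optional[str]) -> float:
    """Validate and convert raw price text, assumed to look like 'R$ 1.234,56'."""
    if raw is None:
        raise LayoutChanged("price element not found")
    cleaned = raw.replace("R$", "").strip().replace(".", "").replace(",", ".")
    try:
        price = float(cleaned)
    except ValueError as exc:
        raise LayoutChanged(f"unparseable price text: {raw!r}") from exc
    if not math.isfinite(price) or price <= 0:
        raise LayoutChanged(f"implausible price: {raw!r}")
    return price


def fetch_with_retry(fetch: Callable[[], Optional[str]],
                     attempts: int = 4,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Retry transient network failures with exponential backoff.

    Returns None if every attempt fails, so the caller can record the price
    as missing. A LayoutChanged error is not retried, because repeating the
    request will not fix a changed page structure.
    """
    for attempt in range(attempts):
        try:
            return parse_price(fetch())
        except (ConnectionError, TimeoutError):
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None
```

Keeping the two failure modes apart matters in practice: a network failure is worth retrying, whereas a layout change calls for someone to adjust the scraper.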


















                                                                     163 | ISI WSC 2019