Page 174 - Special Topic Session (STS) - Volume 3
STS538 Pedro Luis do N. S. et al.
manual collection on the web is already the norm, such as for airline ticket
prices. The other experiment under way uses web data to expand the amount
of information collected, in order to improve the compiled CPIs. A pioneering
approach in this area is the collection of web data to support the
implementation of hedonic quality adjustment methods in the NSCPI.
The use of web data to expand the number of elements in the CPI basket and
the compilation of an experimental web-only CPI are also under consideration
for a future stage of this project.
We briefly discuss only the first case here; for more details, see [da
Silva et al., 2019]. A pilot project on the use of web data is ongoing for the
collection of airfares. Since airfares are now mostly sold online, the
traditional collection consists of having price collectors visit the websites
of the main airlines on pre-established dates and record the prices observed
for pre-determined routes. This process is time-consuming and subject to
human error: data collection takes about 2 to 4 hours per week, per area, and
the head office team needs extra time to check that the prices collected are
consistent. The pilot project aims to replace the manual collection with
automatic web scraping routines. A home-made web scraper was developed to
automate the current collection process: it scrapes once a week, for the same
routes on the same airlines.
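The weekly collection plan described above can be sketched as follows. This is a minimal illustration and not the scraper used in the pilot: the airline names, routes, start date and the `stub_fetch` function are all hypothetical placeholders, and the site-specific scraping logic is deliberately left out and passed in as a callable.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import product
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Quote:
    """One price observation; price is None when retrieval failed."""
    collected_on: date
    airline: str
    route: str
    price: Optional[float]


def weekly_dates(start: date, weeks: int) -> List[date]:
    """Pre-established collection dates, one per week starting at `start`."""
    return [start + timedelta(weeks=k) for k in range(weeks)]


def collect(day: date,
            airlines: Sequence[str],
            routes: Sequence[str],
            fetch_price: Callable[[str, str, date], Optional[float]]
            ) -> List[Quote]:
    """Query every (airline, route) pair of the fixed basket on one date."""
    return [Quote(day, a, r, fetch_price(a, r, day))
            for a, r in product(airlines, routes)]


# Hypothetical basket: the actual airlines and routes are not given in the text.
AIRLINES = ["Airline A", "Airline B"]
ROUTES = ["GIG-GRU", "GIG-BSB"]


def stub_fetch(airline: str, route: str, day: date) -> Optional[float]:
    """Stand-in for the site-specific scraping code (returns a fixed price)."""
    return 350.0


dates = weekly_dates(date(2019, 1, 7), weeks=3)
quotes = [q for d in dates for q in collect(d, AIRLINES, ROUTES, stub_fetch)]
print(len(quotes), "quotes collected over", len(dates), "weeks")
```

Keeping the basket of airlines and routes fixed from week to week mirrors the manual procedure, so that the automatic and manual collections remain comparable.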
The pilot has been running for almost a year and the results are promising,
as summarized in Figure 1, which compares experimental weekly indices built
from the manual and automatic price data collections [da Silva et al., 2019].
Figure 1 shows good agreement between the manual and automatic series.
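To make this kind of comparison concrete, the sketch below builds a weekly index from each collection and measures the largest gap between the two series. The index formula behind Figure 1 is not stated here, so a Jevons index (geometric mean of price relatives over matched items) is assumed purely for illustration; the prices are made up and the function names are hypothetical.

```python
from math import exp, log
from typing import Dict, List

Prices = Dict[str, float]  # key: "airline|route", value: observed fare


def jevons_index(base: Prices, current: Prices) -> float:
    """Geometric mean of price relatives over matched items, base = 100."""
    matched = sorted(set(base) & set(current))
    if not matched:
        raise ValueError("no matched items between the two periods")
    mean_log = sum(log(current[k] / base[k]) for k in matched) / len(matched)
    return 100.0 * exp(mean_log)


def index_series(weeks: List[Prices]) -> List[float]:
    """Index of each week relative to the first week."""
    return [jevons_index(weeks[0], week) for week in weeks]


def max_abs_gap(a: List[float], b: List[float]) -> float:
    """Largest absolute difference, in index points, between two series."""
    if len(a) != len(b):
        raise ValueError("series must cover the same weeks")
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up weekly prices for two items; not data from the pilot.
manual = [
    {"A|GIG-GRU": 300.0, "B|GIG-GRU": 320.0},
    {"A|GIG-GRU": 330.0, "B|GIG-GRU": 320.0},
    {"A|GIG-GRU": 315.0, "B|GIG-GRU": 352.0},
]
automatic = [
    {"A|GIG-GRU": 300.0, "B|GIG-GRU": 320.0},
    {"A|GIG-GRU": 331.0, "B|GIG-GRU": 319.0},
    {"A|GIG-GRU": 315.0, "B|GIG-GRU": 350.0},
]

gap = max_abs_gap(index_series(manual), index_series(automatic))
print(f"largest gap between series: {gap:.2f} index points")
```

Only items present in both periods enter the index, so a price that the scraper failed to retrieve in a given week drops out of that week's comparison rather than distorting it.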
However, before the automated solution is implemented in the monthly
production process of the CPIs, some extra care is needed. The main problems
observed in the pilot were: anti-robot policies (one airline blocked the web
scraper for a while); website architectures that may change without
warning, requiring adjustments to the web scraper to ensure that the
correct information is extracted; and network instabilities during some special
sales events, which disrupted the robot’s ability to retrieve prices [da Silva et al.,
2019].
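Two of these problems lend themselves to simple safeguards, sketched below under stated assumptions: retrying with exponential backoff for transient network failures, and a plausibility check that flags missing or out-of-range values, as might appear after an unannounced layout change. The function names, delays and price bounds are hypothetical and not taken from the pilot. Blocking by an airline is a policy matter that retries do not address.

```python
import time
from typing import Callable, List, Optional


def fetch_with_retry(fetch: Callable[[], float],
                     attempts: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> float:
    """Call `fetch`, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up; the caller can fall back to manual collection
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable")


def is_plausible(price: Optional[float],
                 low: float = 50.0,
                 high: float = 10000.0) -> bool:
    """Flag missing or out-of-range prices (bounds are illustrative only)."""
    return price is not None and low <= price <= high


# Simulated fetch that fails twice before succeeding (hypothetical data).
calls = {"n": 0}


def flaky_fetch() -> float:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network instability")
    return 412.5


delays: List[float] = []  # record the waits instead of actually sleeping
price = fetch_with_retry(flaky_fetch, sleep=delays.append)
print(price, delays, is_plausible(price))
```

Values that fail the plausibility check would be routed to the head office team for review rather than entering the index directly.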
163 | ISI WSC 2019