Page 282 - Special Topic Session (STS) - Volume 2
STS493 Stéphane D. et al.
Inspection Agency will benefit by having a framework on disease predictability
to help monitor potential outbreaks of disease, such as African swine fever.
Exploring web scraping as a new mode of collection
Currently, Statistics Canada’s Annual Survey of Manufacturing and Logging
Industries uses available information to prefill components of the annual
questionnaires to facilitate reporting of the commodities being produced, as
well as their sales amounts. The commodity data are prone to non-response and
reporting errors but are crucial for measuring economic production in Canada.
A pilot project will investigate the use of web scraping technology to collect
website information to get a better picture of the types of commodities
manufactured and to potentially improve sales estimates. A generic web
scraper will be used to collect text data from each company’s website, and
this text will be transformed into a list of commodities. This information will
then be used to prefill the questionnaires described above and to reduce the
high non-response rate. In addition, the data will be used to improve
auto-coding rates and quality.
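As an illustration only, the step of turning scraped page text into a list of commodities could be sketched as follows. This is a minimal sketch and not the scraper used in the pilot project: the class VisibleTextParser, the functions extract_text and find_commodities, and the COMMODITY_KEYWORDS lookup are all hypothetical, and simple keyword matching stands in for whatever text processing the project actually applies.

```python
import re
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes from a page, skipping script and style content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical keyword-to-commodity lookup (assumption for illustration;
# not an official product classification).
COMMODITY_KEYWORDS = {
    "softwood lumber": "Softwood lumber",
    "plywood": "Plywood",
    "wood pellets": "Wood pellets",
}


def find_commodities(html, keywords=COMMODITY_KEYWORDS):
    """Return the sorted commodity labels whose keywords appear in the page text."""
    text = extract_text(html).lower()
    found = []
    for phrase, label in keywords.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.append(label)
    return sorted(found)


sample_page = (
    "<html><head><script>var plywood = 1;</script></head>"
    "<body><h1>Our products</h1>"
    "<p>We make softwood lumber and wood pellets.</p></body></html>"
)
commodities = find_commodities(sample_page)
```

In this sketch the word "plywood" appears only inside a script element, so it is not reported as a commodity. A list produced this way could serve as candidate prefill values for review, not as final coded data.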
Automated web scraping involves important ethical and privacy issues for
national statistical offices. In response, Statistics Canada is developing a
directive on web scraping to establish recommendations about using APIs when
available, consulting the robots.txt of websites, respecting website controls and
protocols, not capturing personal information, being transparent, and
addressing other issues. All of this work will be done in close collaboration with
the Office of the Privacy Commissioner.
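One of the practices listed above, consulting the robots.txt of a website before collecting from it, can be sketched with Python's standard urllib.robotparser module. This is a minimal sketch under stated assumptions: the user agent name "example-stats-bot", the example.com URLs and the sample rules are hypothetical, and the rules are parsed from a string here so that the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption for illustration).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_robots_checker(robots_txt):
    """Parse robots.txt content that has already been retrieved as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


checker = build_robots_checker(SAMPLE_ROBOTS_TXT)
user_agent = "example-stats-bot"  # hypothetical user agent name

# Pages outside the disallowed path may be collected.
products_allowed = checker.can_fetch(
    user_agent, "https://www.example.com/products"
)

# Pages under /private/ must be skipped.
private_allowed = checker.can_fetch(
    user_agent, "https://www.example.com/private/report"
)

# Requested delay in seconds between requests, or None if not specified.
delay_seconds = checker.crawl_delay(user_agent)
```

A scraper that follows this approach would check each URL before requesting it and wait at least the requested delay between requests. The other recommendations, such as not capturing personal information, need separate safeguards that this sketch does not cover.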
Statistics Canada is also exploring web scraping from news platforms to
collect information that can support enterprise profiling, financial variable
coherence analysis, merger and acquisition event detection, and sentiment
indicators for measuring business tendencies based on news analytics.
4. Conclusion
Although declining response rates around the world make data collection a
challenge for national statistical offices, Statistics Canada has
implemented changes to stabilize response rates and adapt to respondents’
preferences in terms of collection or contact mode. In addition, Statistics
Canada feels that the new tools and techniques that are now available show
great promise in supporting its work. These changes and techniques were
summarized in this paper. Statistics Canada is happy to partner with other
national statistical offices interested in experimenting with and developing
such solutions, to improve the future of data acquisition.
271 | ISI WSC 2019