Page 282 - Special Topic Session (STS) - Volume 2

STS493 Stéphane D. et al.
                  Inspection Agency will benefit by having a framework on disease predictability
                  to help monitor potential outbreaks of disease, such as African swine fever.

                  Exploring web scraping as a new mode of collection
                     Currently, Statistics Canada’s Annual Survey of Manufacturing and Logging
                  Industries uses available information to prefill components of the annual
                  questionnaires to facilitate reporting of the commodities being produced, as
                  well as the sales amounts. The commodity data are prone to non-response and
                  reporting errors but are crucial for measuring economic production in Canada.
                  A pilot project will investigate the use of web scraping technology to collect
                  website information to get a better picture of the types of commodities
                  manufactured and to potentially improve sales estimates. A generic web
                  scraper will be used to collect text data from each company’s website, which
                  will be transformed into a list of commodities. This information will then be
                  used to prefill the questionnaires described above and reduce the high
                  non-response rate. In addition, the data will be used to improve auto-coding
                  rates and quality.
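                     The step of turning scraped text into a list of commodities can be
                  illustrated with a minimal Python sketch. It is not the pilot project's
                  method: the keyword table COMMODITY_KEYWORDS, the class VisibleTextParser
                  and the function extract_commodities are hypothetical names introduced
                  here, and simple keyword matching stands in for whatever text
                  classification the pilot may use.

```python
from html.parser import HTMLParser

# Hypothetical keyword-to-commodity lookup (assumption for illustration only);
# a real list would come from a product classification not given in the paper.
COMMODITY_KEYWORDS = {
    "plywood": "Plywood",
    "softwood lumber": "Softwood lumber",
    "wood pellets": "Wood pellets",
}


class VisibleTextParser(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_commodities(html):
    """Return the sorted commodity labels whose keyword occurs in the page text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Lower-case and collapse whitespace so multi-word keywords match reliably.
    text = " ".join(" ".join(parser.chunks).lower().split())
    return sorted(
        {label for keyword, label in COMMODITY_KEYWORDS.items() if keyword in text}
    )
```

                     Given the HTML of a company page, extract_commodities returns the
                  labels found in the visible text; such a list could then feed the
                  questionnaire prefill, subject to review.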
                     Automated web scraping involves important ethical and privacy issues for
                  national statistical offices. In response, Statistics Canada is developing a
                  directive on web scraping to establish recommendations about using APIs when
                  available, consulting the robots.txt of websites, respecting website controls and
                  protocols, not capturing personal information, being transparent, and
                  addressing other issues. All of this work will be done in close collaboration with
                  the Office of the Privacy Commissioner.
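                     As an illustration of the recommendation to consult a website's
                  robots.txt, the following Python sketch uses the standard library's
                  urllib.robotparser. It is an assumption-laden example, not the content of
                  the directive: the user agent string and the helper names
                  build_robots_policy and may_fetch are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent; a transparent scraper would identify its operator.
USER_AGENT = "example-nso-scraper"


def build_robots_policy(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser.

    The text is passed in directly so the sketch runs offline; against a live
    site one would instead call set_url(...) followed by read().
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, url, user_agent=USER_AGENT):
    """Return True if the parsed robots.txt rules allow fetching the URL."""
    return parser.can_fetch(user_agent, url)
```

                     A scraper following this pattern would call may_fetch before each
                  request and skip any URL for which it returns False; crawl_delay on the
                  same parser reports a requested delay between requests, if one is stated.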
                     Statistics Canada is also exploring web scraping from news platforms to
                  collect information that can support enterprise profiling, financial variable
                  coherence analysis, merger and acquisition event detection, and sentiment
                  indicators for measuring business tendencies based on news analytics.

                  4.  Conclusion
                     While it may be a challenge for national statistical offices to collect data
                  with response rates declining around the world, Statistics Canada has
                  implemented changes to stabilize response rates and adapt to respondents’
                  preferences in terms of collection or contact mode. In addition, Statistics
                  Canada believes that the new tools and techniques that are now available show
                  great promise in supporting its work. These changes and techniques were
                  summarized in this paper. Statistics Canada is happy to partner with other
                  national statistical offices interested in experimenting with and developing
                  such solutions to improve the future of data acquisition.






                                                                     271 | ISI WSC 2019