Page 12 - Contributed Paper Session (CPS) - Volume 6

CPS1465 Claude Macchi et al.


                                   Using machine learning technologies for
                                   coding economic activities of Businesses–
                                             The NOGAuto System
                       Claude Macchi, Michel Chételat, Cindia Duc-Sfez, Christophe Joyon
                                  Swiss Federal Statistical Office, Neuchâtel, Switzerland

                  Abstract
                  Classifications are basic elements in the production of statistics. The quality
                  of the coding of the observed units has a direct impact on the entire data
                  production process and on the credibility and quality of the statistical
                  outcome. This is even more important in the context of register and
                  administrative data, which are the starting point of countless statistics and
                  the basis of sampling frames and data analysis.
                  With a view to continuously improving the quality of the coding of units in
                  the Swiss statistical business register (SBER) and to reducing the burden on
                  businesses of their obligation to deliver information to the statistical
                  offices, the Swiss Federal Statistical Office (FSO) launched a project in early
                  2018 to automate the assignment of economic activity codes to businesses.
                  This project is one of the five projects currently being developed in line
                  with the FSO’s data innovation strategy
                  (https://www.bfs.admin.ch/bfs/en/home/news/whats-new.assetdetail.3862240.html)
                  with the goal of augmenting and/or complementing the existing basic
                  official statistical production at the FSO.
                  The coding procedures are currently quite standardised. The coders analyse
                  and interpret information on the businesses’ activities, such as inputs from
                  the businesses themselves and from surveys, as well as descriptions in company
                  registers and various administrative data. On this basis they define keywords,
                  which are compared with a list of keywords and concepts linked to each
                  position of the classification and its explanatory notes, and in this way
                  select the code to be assigned to the observed business.
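
                  The keyword comparison in the manual procedure can be illustrated
                  with a minimal sketch. The codes (CODE_A, CODE_B, CODE_C), the
                  keyword sets and the function names are hypothetical placeholders
                  and are not taken from the FSO keyword list or the classification;
                  scoring by the number of shared keywords is an assumption made
                  only for illustration.

```python
import re

# Hypothetical keyword list: codes and keywords are illustrative placeholders,
# not actual entries of the classification or of the FSO keyword list.
KEYWORDS_BY_CODE = {
    "CODE_A": {"bakery", "bread", "pastry"},
    "CODE_B": {"software", "programming", "consulting"},
    "CODE_C": {"restaurant", "catering", "meals"},
}


def extract_keywords(description):
    """Lower-case the text and return its alphabetic tokens as a set."""
    return set(re.findall(r"[a-z]+", description.lower()))


def rank_codes(description, keywords_by_code=KEYWORDS_BY_CODE):
    """Return (code, overlap) pairs with at least one shared keyword.

    Pairs are sorted by decreasing overlap, then by code for stable ties.
    """
    found = extract_keywords(description)
    scores = [(code, len(found & kws)) for code, kws in keywords_by_code.items()]
    scores = [(code, n) for code, n in scores if n > 0]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(rank_codes("Production of bread and pastry, with a small bakery shop"))
    # prints [('CODE_A', 3)]
```

                  A description that shares no keyword with any code yields an empty
                  list, which in this sketch would correspond to a case left to the
                  human coder.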
                  Using innovative methods, the FSO is building a machine learning system to
                  automate the manual coding procedure. This artificial intelligence takes over
                  the reading and interpretation steps from the coder and automatically assigns
                  a classification code to the business. In addition, it proposes new keywords
                  and concepts. It is currently learning from an existing dataset that has
                  already been checked manually. In a subsequent step, the scope of the system
                  could be extended by searching the web directly for additional sources of
                  information on the business to be coded.
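
                  Learning a code assignment from manually coded descriptions can be
                  sketched as follows. The abstract does not state which algorithm
                  NOGAuto uses; the multinomial naive Bayes classifier below is a
                  stand-in chosen for brevity, and the class name, the training
                  examples and the codes (CODE_A, CODE_B) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCoder:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, descriptions, codes):
        self.doc_counts = Counter(codes)         # training descriptions per code
        self.word_counts = defaultdict(Counter)  # token counts per code
        for text, code in zip(descriptions, codes):
            self.word_counts[code].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        """Return the code with the highest log-posterior for the text."""
        total_docs = sum(self.doc_counts.values())
        best_code, best_score = None, -math.inf
        for code in sorted(self.doc_counts):
            counts = self.word_counts[code]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[code] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical manually coded examples (placeholder codes and descriptions).
TRAINING = [
    ("bakery producing bread and pastry", "CODE_A"),
    ("artisan bread and cakes", "CODE_A"),
    ("software development and IT consulting", "CODE_B"),
    ("programming of web applications", "CODE_B"),
]

texts, labels = zip(*TRAINING)
model = NaiveBayesCoder().fit(texts, labels)

if __name__ == "__main__":
    print(model.predict("fresh bread and pastry shop"))  # prints CODE_A
    print(model.predict("web software consulting"))      # prints CODE_B
```

                  A production system would need far more training data and a
                  measure of confidence, so that uncertain cases can be referred
                  back to a human coder; neither is shown in this sketch.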

                  Keywords
                  Machine Learning; Coding; Classifications; Key Words; Artificial Intelligence


                                                                       1 | ISI WSC 2019