Page 12 - Contributed Paper Session (CPS) - Volume 6
CPS1465 Claude Macchi et al.
Using machine learning technologies for
coding economic activities of businesses –
The NOGAuto System
Claude Macchi, Michel Chételat, Cindia Duc-Sfez, Christophe Joyon
Swiss Federal Statistical Office, Neuchâtel, Switzerland
Abstract
Classifications are basic elements for the production of statistics. The quality
of the coding of the observed units has a direct impact on the entire data
production process, on the credibility and on the quality of the statistical
outcome. This is even more important in the context of register and
administrative data, which are the starting point of countless statistics and the
basis of sampling frames and of data analysis.
With a view to continuously improving the quality of the coding of units in
the Swiss statistical business register (SBER) and to reducing the
burden on businesses of their obligations to deliver information to the
statistical offices, the Swiss Federal Statistical Office (FSO) launched in early 2018
a project to automatise the attribution of the economic activity code to
businesses. This project is one of the five projects currently
being developed in line with the FSO’s data innovation strategy
(https://www.bfs.admin.ch/bfs/en/home/news/whats-new.assetdetail.3862240.html)
with the goal of augmenting and/or complementing the existing basic
official statistical production at the FSO.
The coding procedures are currently quite standardised. The coders analyse
and interpret information on the businesses' activities, such as inputs from the
businesses themselves and from surveys, as well as descriptions in company
registers and various administrative data. Based on this, they define keywords
that are compared with a list of keywords and concepts linked to each of the
positions of the classification and their related explanatory notes, and select
in this way the code to be attributed to the observed business.
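The manual lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the codes, the keyword list and the function names (`KEYWORDS_BY_CODE`, `extract_keywords`, `code_by_keywords`) are hypothetical placeholders and are not taken from the classification or from the FSO's tools.

```python
import re

# Hypothetical keyword list: each classification code maps to a set of keywords.
# Codes and keywords are illustrative placeholders, not real classification entries.
KEYWORDS_BY_CODE = {
    "A01": {"bakery", "bread", "pastry"},
    "B02": {"software", "programming", "consulting"},
    "C03": {"restaurant", "catering", "meals"},
}


def extract_keywords(description):
    """Lowercase the description and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", description.lower()))


def code_by_keywords(description):
    """Return the code whose keyword set overlaps most with the description.

    Returns None when no keyword matches at all, which stands in for the
    cases a human coder would have to examine more closely.
    """
    tokens = extract_keywords(description)
    scores = {code: len(tokens & kws) for code, kws in KEYWORDS_BY_CODE.items()}
    best_code = max(scores, key=scores.get)
    return best_code if scores[best_code] > 0 else None
```

For example, `code_by_keywords("Production of bread and pastry")` selects the placeholder code `"A01"`, whereas a description containing none of the listed keywords yields `None`.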
Using innovative methods, the FSO is building a machine learning system to
automatise the manual coding procedure. This artificial intelligence
takes over the reading and interpretation steps from the coder and
automatically assigns a classification code to the business. In addition, it
proposes new keywords and concepts. It is currently learning from an existing
dataset that has already been checked manually. In a subsequent step, the scope
of the system could be widened by searching the web directly for additional
sources of information on the business to be coded.
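To make the idea of learning from manually checked data concrete, the following is a minimal sketch of a text classifier trained on (description, code) pairs. The abstract does not specify which model the system uses; a multinomial naive Bayes classifier with add-one smoothing is substituted here purely for illustration, and the class name `NaiveBayesCoder`, the codes and the training texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens in order."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCoder:
    """Illustrative multinomial naive Bayes; not the model used by the FSO."""

    def fit(self, descriptions, codes):
        # Count documents per code and word occurrences per code.
        self.code_counts = Counter(codes)
        self.word_counts = defaultdict(Counter)
        for text, code in zip(descriptions, codes):
            self.word_counts[code].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, description):
        # Words never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(description) if t in self.vocabulary]
        total_docs = sum(self.code_counts.values())
        best_code, best_score = None, -math.inf
        for code, n_docs in self.code_counts.items():
            counts = self.word_counts[code]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log prior of the code
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # add-one smoothing
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical, manually checked training examples (placeholder codes).
TRAIN_TEXTS = [
    "bakery producing bread and pastry",
    "bread and cakes shop",
    "software development and programming",
    "it consulting and software services",
    "restaurant serving meals",
    "catering and restaurant services",
]
TRAIN_CODES = ["A01", "A01", "B02", "B02", "C03", "C03"]

model = NaiveBayesCoder().fit(TRAIN_TEXTS, TRAIN_CODES)
```

With this toy training set, `model.predict("artisan bread bakery")` returns the placeholder code `"A01"`. A production system would need far more training data, multilingual text handling and a confidence threshold below which cases are referred to a human coder.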
Keywords
Machine Learning; Coding; Classifications; Keywords; Artificial Intelligence
ISI WSC 2019