Page 13 - Contributed Paper Session (CPS) - Volume 6

CPS1465 Claude Macchi et al.
            1.  Introduction
                Each company and establishment stored in the SBER has a Swiss Economic
            Activity Classification (NOGA) code, which is based on the Statistical
            Classification of Economic Activities in the European Community (NACE).
            Coding is carried out in two main steps: first, a provisional code is
            assigned when the business is entered into the register; second, a few
            months later, this first code is validated.
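            The two-step procedure can be pictured as a status change on each
            register record. This is a minimal illustrative sketch: the class, field
            and status names are hypothetical and do not describe the SBER's actual
            data model, and the code strings are placeholders, not real NOGA
            positions.

```python
from dataclasses import dataclass, field
from enum import Enum


class CodeStatus(Enum):
    PROVISIONAL = "provisional"  # step 1: code taken over when the unit enters the register
    VALIDATED = "validated"      # step 2: code checked by the FSO a few months later


@dataclass
class RegisterUnit:
    """Hypothetical record for one company or establishment in the register."""
    unit_id: str
    noga_code: str
    status: CodeStatus = CodeStatus.PROVISIONAL
    history: list = field(default_factory=list)  # previously held codes

    def validate(self, confirmed_code: str) -> None:
        """Step 2: confirm the provisional code or replace it with a corrected one."""
        if self.status is not CodeStatus.PROVISIONAL:
            raise ValueError("only provisional codes can be validated")
        self.history.append(self.noga_code)
        self.noga_code = confirmed_code
        self.status = CodeStatus.VALIDATED


unit = RegisterUnit("U1", "NOGA_X")  # placeholder identifiers
unit.validate("NOGA_Y")              # validation replaces the provisional code
```

            Keeping the provisional code in the history makes it possible to see
            how often validation changes the code supplied by the source.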
                During the first coding step, the FSO uses the economic activity code
            supplied by the source that provides the SBER with information on the new
            company. Depending on the data source, this code was assigned either by the
            source itself (mainly in the case of administrative data or registers
            external to the FSO) or by third-party companies (in the case of
            announcements from commercial registers). The FSO then validates this first
            code in a dedicated survey of all new companies registered in the SBER.
            Working from the descriptions of economic activities provided by the
            companies themselves, coders identify relevant terms or concepts and
            compare them with a list of keywords linked to the individual NOGA codes;
            the list currently contains more than 11,000 items and concepts in four
            languages (German, French, Italian and English). The matching keywords
            allow coders to select a code and assign it to the company concerned.
            After this phase, the assigned codes may be updated or corrected at any
            time on the basis of input from surveys carried out for statistical
            production, administrative sources, external registers, the companies
            themselves and information found on the Internet. The codes describing the
            economic activity of companies are all assigned on the basis of oral or
            written information provided by the companies themselves. Coding therefore
            consists mainly of reading, understanding and interpreting a text, then
            identifying terms or concepts and comparing them with a list of keywords
            linked to the classification codes.
                With NOGAuto, the FSO aims to build a machine learning system that
            automates the assignment of economic activity codes to SBER companies.
            This will make it possible to
                •  minimise the coders' own interpretation of the texts describing
                   companies' economic activities,
                •  harmonise and standardise the assignment of codes, and
                •  reduce the time spent on coding.
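            The text does not specify which learning algorithm NOGAuto uses. Purely
            to illustrate how codes could be proposed automatically, the following
            sketch trains a multinomial Naive Bayes classifier (a technique chosen
            here for brevity, not taken from the source) on descriptions whose codes
            are already known and returns a ranked list of candidate codes. The class
            and method names, the smoothing constant and the training examples are
            hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case a description and split it into alphabetic words."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayesCoder:
    """Multinomial Naive Bayes over activity descriptions (illustrative only)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.code_counts = Counter()             # number of training texts per code
        self.word_counts = defaultdict(Counter)  # word frequencies per code
        self.vocab = set()

    def fit(self, descriptions, codes):
        """Learn word statistics from descriptions with already assigned codes."""
        for text, code in zip(descriptions, codes):
            words = tokenize(text)
            self.code_counts[code] += 1
            self.word_counts[code].update(words)
            self.vocab.update(words)
        return self

    def log_scores(self, text: str) -> dict:
        """Unnormalised log-posterior of each code given the description."""
        total_docs = sum(self.code_counts.values())
        scores = {}
        for code, n_docs in self.code_counts.items():
            counts = self.word_counts[code]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)      # log prior
            for w in tokenize(text):
                if w in self.vocab:                    # skip words unseen in training
                    score += math.log((counts[w] + self.alpha) / denom)
            scores[code] = score
        return scores

    def propose(self, text: str, k: int = 3) -> list:
        """Return up to k codes, most probable first, for a coder to accept or reject."""
        scores = self.log_scores(text)
        return sorted(scores, key=scores.get, reverse=True)[:k]

    def disagrees(self, text: str, current_code: str) -> bool:
        """Flag a unit whose current code is not the model's first proposal."""
        return self.propose(text, k=1) != [current_code]


model = NaiveBayesCoder().fit(
    ["bakery bread pastries", "bread and cakes bakery",
     "software development programming", "programming web software"],
    ["CODE_BAKERY", "CODE_BAKERY", "CODE_SOFTWARE", "CODE_SOFTWARE"],
)
```

            With this toy training set, model.propose("fresh bread from our bakery")
            puts CODE_BAKERY first. The disagrees() check shows how the same scores
            could be used to flag already registered units whose current code differs
            from the model's first choice, leaving the decision to a coder.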
            NOGAuto is not built in a single step but, like an onion, in successive
            layers. The core of the onion, the first construction phase, makes it
            possible to validate the codes currently assigned to the units already
            registered in the SBER. The next layer will assist coders by proposing
            codes for the activities of companies still to be coded. The interaction
            of coders, who will accept or reject the codes proposed by the system, as
            well as the continuous

                                                                 2 | ISI WSC 2019