Page 16 - Contributed Paper Session (CPS) - Volume 6

CPS1465 Claude Macchi et al.
                  be linked with the different activity codes to be assigned to companies. This
                  model will be continuously enriched with elements from the descriptions of
                  the activities of the new companies to be codified or additional information
                  from the SBER. Here too, feedback to the previous phase of the
                  process is essential, so that continuous correction and improvement of the
                  model can be achieved.
                      The last step in this phase is the evaluation of the codes that the system
                  proposes to the coders. The coders validate them, integrate them into the
                  SBER or reject them. These decisions generate feedback to earlier stages of
                  the process, and the cycle repeats until the defined success criteria are met.
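
As an illustration of the review step described above, the following is a minimal sketch, not the NOGAuto implementation. All names (`Proposal`, `ReviewOutcome`, `review`, `success_criteria_met`), the acceptance-rate criterion and the example codes are assumptions made for this example; the paper does not specify how the success criteria are defined.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Proposal:
    """A code proposed by the system for one company (hypothetical structure)."""
    company_id: str
    description: str
    proposed_code: str


@dataclass
class ReviewOutcome:
    # Stand-in for the register (SBER): company id -> accepted code.
    register: Dict[str, str] = field(default_factory=dict)
    # Rejected proposals, to be fed back to earlier stages of the process.
    feedback: List[Proposal] = field(default_factory=list)


def review(proposals: Iterable[Proposal],
           coder_accepts: Callable[[Proposal], bool]) -> ReviewOutcome:
    """Let a coder accept or reject each proposal and collect the results."""
    outcome = ReviewOutcome()
    for proposal in proposals:
        if coder_accepts(proposal):
            outcome.register[proposal.company_id] = proposal.proposed_code
        else:
            outcome.feedback.append(proposal)
    return outcome


def success_criteria_met(outcome: ReviewOutcome, min_acceptance_rate: float) -> bool:
    """Assumed criterion: share of accepted proposals reaches a threshold."""
    total = len(outcome.register) + len(outcome.feedback)
    return total > 0 and len(outcome.register) / total >= min_acceptance_rate


# Made-up placeholder data: the codes a coder would assign, and the proposals.
expected = {"A": "62", "B": "10", "C": "70"}
proposals = [
    Proposal("A", "software development", "62"),
    Proposal("B", "bakery", "47"),
    Proposal("C", "management consulting", "70"),
]
outcome = review(proposals, lambda p: expected[p.company_id] == p.proposed_code)
```

In this sketch the rejected proposal for company "B" ends up in `outcome.feedback`, which is where a real system would hand the case back to the preparation or modelling stage.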

                  3.  Result
                      The NOGAuto project was launched in early 2018 and is still under
                  development. Having built and tested the "Preparation" phase, we are
                  currently building the "Modelling" part. Full process testing, based on an
                  existing dataset that has already been tested manually and including the
                  evaluation of the codes that the system proposes at the aggregate NOGA
                  2-digit level, is scheduled for spring 2019. Implementation in production is
                  planned in stages. In a first stage, until the quality level defined for the
                  most detailed NOGA code level is reached (expected by mid-2020), the
                  system will only be used as a support tool for coders.
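
The evaluation at an aggregate code level can be sketched as follows. This is a hedged example, not the project's evaluation procedure: it assumes that codes are digit strings whose leading two digits identify the aggregate level, that quality is measured as simple agreement with manually assigned codes, and that a single threshold decides the operating mode. The function names, the threshold and the example codes are all made up for illustration.

```python
from typing import Sequence


def truncate(code: str, digits: int = 2) -> str:
    """Keep the first `digits` digits of a code, ignoring separators such as dots."""
    only_digits = "".join(ch for ch in code if ch.isdigit())
    return only_digits[:digits]


def agreement_rate(proposed: Sequence[str], reference: Sequence[str],
                   digits: int = 2) -> float:
    """Share of proposals that match the reference code at the given level."""
    if len(proposed) != len(reference):
        raise ValueError("proposed and reference must have the same length")
    if not proposed:
        raise ValueError("at least one code is required")
    hits = sum(truncate(p, digits) == truncate(r, digits)
               for p, r in zip(proposed, reference))
    return hits / len(proposed)


def operating_mode(rate: float, required_quality: float) -> str:
    """Assumed rule: below the required quality, the system only assists coders."""
    return "automatic" if rate >= required_quality else "support tool for coders"


# Made-up placeholder codes: system proposals and manually assigned references.
proposed_codes = ["620100", "471100", "561001", "011100"]
reference_codes = ["620200", "471100", "551001", "011300"]

rate_2_digit = agreement_rate(proposed_codes, reference_codes, digits=2)
rate_6_digit = agreement_rate(proposed_codes, reference_codes, digits=6)
```

With these placeholder codes, three of four proposals agree at the 2-digit level but only one agrees at the most detailed level, which illustrates why a system may be usable at an aggregate level well before it reaches the quality required for detailed codes.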
                      NOGAuto is not only a tool for codifying the economic activities of
                  businesses; it can also be adapted to the needs of other classifications.
                  The more structured, standardised and targeted the information to be
                  codified, the easier it is to propose an automatic codification. Initial
                  discussions on adapting and implementing the system for the classifications
                  of occupations and of diseases and health problems have already begun.
                      A central issue that has accompanied this machine learning project from
                  the beginning is the question of acceptance. The word "automation" was
                  quickly linked to "work reduction" and "job loss", which caused considerable
                  opposition to the project, principally among the staff responsible for
                  coding. Especially at the beginning of the work, cooperation with the
                  people who were supposed to provide the initial input on the codification
                  processes was quite complex. An effort of communication, explanation and
                  clarification was necessary to gain the trust and collaboration of the staff. For
                  the ISI 2019 conference, it is planned to present the complete system as well
                  as the results of the tests performed with data at NOGA 2-digit level.

                  4.  Discussion and Conclusion
                      Thanks to machine learning, the NOGAuto system will take over the
                  reading and interpretation tasks of coders, propose new keywords and
