Page 14 - Contributed Paper Session (CPS) - Volume 6
CPS1465 Claude Macchi et al.
input of new keywords proposed thanks to artificial intelligence, will allow the
system, with the construction of additional layers, to continuously improve the
quality of the proposed codes and to reach a stage where the codes can, in
most cases, be assigned automatically, without any human intervention.
2. Methodology
The process workflow
Figure 1 below shows the processes of the system, the purpose of which
is to support and facilitate the coding of business activity at the FSO.
Figure 1: The NOGAuto process workflow
The « Preparation » phase
Texts describing the economic activities of companies registered in the
SBER, texts from electronic surveys and metadata from the FSO's central
metadata system are imported into the NOGAuto system and subjected to
Natural Language Processing (NLP) operations. The texts first undergo
language detection, followed by stemming, which normalises the elements of
the texts (verbs, desinences, etc.), and by cleaning, which eliminates stop
words that cannot be directly related to an economic activity (articles,
acronyms, prepositions, etc.).
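The three preparation steps can be illustrated with a minimal sketch. It is
not the NOGAuto implementation: the stop-word lists, the suffix lists and the
function names (tokenize, detect_language, stem, prepare) are hypothetical,
language detection is approximated by counting stop-word hits, and stemming by
naive suffix stripping. A production system would rely on dedicated language
identification and stemming libraries.

```python
import re

# Hypothetical, deliberately tiny stop-word lists (assumption, not FSO data).
STOP_WORDS = {
    "de": {"der", "die", "das", "und", "von", "mit", "für", "im", "in"},
    "fr": {"le", "la", "les", "et", "de", "des", "du", "en", "pour"},
    "it": {"il", "la", "le", "e", "di", "dei", "con", "per", "in"},
    "en": {"the", "and", "of", "for", "with", "in", "a", "an", "to"},
}

# Hypothetical suffix lists for a naive stemmer, longest suffix first.
SUFFIXES = {
    "de": ["ungen", "ung", "en", "er", "e"],
    "fr": ["ations", "ation", "es", "s", "e"],
    "it": ["zioni", "zione", "i", "e", "a", "o"],
    "en": ["ing", "es", "s"],
}


def tokenize(text):
    """Lower-case the text and keep runs of letters only."""
    return re.findall(r"[^\W\d_]+", text.lower())


def detect_language(tokens):
    """Pick the language whose stop-word list matches the most tokens.

    With no match at all, the first language in STOP_WORDS is returned.
    """
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOP_WORDS.items()
    }
    return max(scores, key=scores.get)


def stem(token, lang):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES[lang]:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def prepare(text):
    """Language detection, stop-word removal, then stemming."""
    tokens = tokenize(text)
    lang = detect_language(tokens)
    kept = [t for t in tokens if t not in STOP_WORDS[lang]]
    return lang, [stem(t, lang) for t in kept]


if __name__ == "__main__":
    print(prepare("Handel mit Fahrzeugen und Reparatur von Motoren"))
    print(prepare("Repair and sale of motor vehicles"))
```

Stop words are removed before stemming in this sketch so that the stop-word
lists can hold unstemmed forms; the order used in the actual system is not
stated in the text.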
The output of the NLP operations is then analysed and the various pieces
of information are cross-referenced. Particular attention was paid to the
complex classification positions that are more difficult to assign and
where more coding errors occur.
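One way to identify the positions where coding errors concentrate is to
compare assigned codes with verified codes and rank the positions by error
rate. The sketch below is only an illustration under stated assumptions: the
function name error_rate_by_position, the record layout (assigned code,
verified code) and the code strings in the example are invented and do not
come from FSO data.

```python
from collections import defaultdict


def error_rate_by_position(records, digits=2):
    """Rank classification positions by their share of coding errors.

    records: iterable of (assigned_code, verified_code) string pairs.
    digits:  number of leading characters that define a position.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for assigned, verified in records:
        position = verified[:digits]
        totals[position] += 1
        if assigned != verified:
            errors[position] += 1
    rates = {p: errors[p] / totals[p] for p in totals}
    # Highest error rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented example records, for illustration only.
    sample = [
        ("451101", "451101"),
        ("452001", "451101"),
        ("471100", "471100"),
        ("471100", "471100"),
    ]
    for position, rate in error_rate_by_position(sample):
        print(position, rate)
```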
Figure 2 shows, for example, an extract from the analysis of NOGA
positions 45 "Wholesale and retail trade and repair of motor vehicles and
3 | ISI WSC 2019