Page 15 - Contributed Paper Session (CPS) - Volume 6
CPS1465 Claude Macchi et al.
motorcycles", 46 "Wholesale trade, except of motor vehicles and motorcycles"
and 47 "Retail trade, except of motor vehicles and motorcycles". We can thus
see how the keywords "workshop", "car body" and "automobile", which
should identify only position 45, have also been linked with companies
codified in positions 46 and 47 of the classification.
Figure 2: Cross-reference keywords NOGA 45, 46 and 47
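The cross-reference shown in Figure 2 can be illustrated with a minimal sketch. The toy records, the function names `cross_reference` and `ambiguous_keywords`, and the counting rule (one count per company and keyword) are assumptions made for illustration; they are not taken from NOGAuto itself.

```python
from collections import defaultdict

# Hypothetical toy data: (NOGA division, keywords extracted from the
# company's activity description). Not taken from the paper.
companies = [
    ("45", ["workshop", "automobile", "repair"]),
    ("45", ["car body", "automobile"]),
    ("46", ["wholesale", "automobile", "parts"]),
    ("47", ["retail", "workshop", "bicycle"]),
    ("47", ["retail", "clothing"]),
]


def cross_reference(companies):
    """Count, for every keyword, how many companies use it per division."""
    table = defaultdict(lambda: defaultdict(int))
    for division, keywords in companies:
        for keyword in set(keywords):  # count each company once per keyword
            table[keyword][division] += 1
    return table


def ambiguous_keywords(table):
    """Return the keywords linked to more than one division."""
    return {
        keyword: dict(sorted(counts.items()))
        for keyword, counts in sorted(table.items())
        if len(counts) > 1
    }


table = cross_reference(companies)
ambiguous = ambiguous_keywords(table)
for keyword, counts in ambiguous.items():
    print(keyword, counts)
```

In this toy data, "automobile" and "workshop" are flagged because they occur in more than one division, which is the kind of overlap the figure makes visible.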
These cross-references also make it possible to discover elements that were
not completely cleaned during the NLP operations (words in other
languages, isolated letters, etc.). Such findings trigger a new round of
language detection and stemming to correct the errors in the keywords. This
loop is repeated systematically until the keywords are of a quality
considered good enough to launch the next phase of the process.
This continuous feedback to the previous phase of the process is a key
element of NOGAuto's philosophy, which is based on correcting and
continuously improving the outcomes of the previous steps.
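The feedback loop described above can be sketched as a "clean, measure, repeat" cycle. Everything in this sketch is an assumption for illustration: the stop list `FOREIGN_WORDS`, the two cleaning passes, the residue measure and the threshold stand in for the language detection and stemming steps, which the paper does not specify in detail.

```python
# Hypothetical stop list standing in for a real language-detection step.
FOREIGN_WORDS = {"und", "les", "della"}


def is_suspicious(token):
    """Flag residue: isolated letters and known foreign words."""
    return len(token) < 2 or token in FOREIGN_WORDS


def residue_share(keywords):
    """Share of keywords that still look like uncleaned residue."""
    if not keywords:
        return 0.0
    return sum(is_suspicious(k) for k in keywords) / len(keywords)


def drop_single_letters(keywords):
    return [k for k in keywords if len(k) > 1]


def drop_foreign(keywords):
    return [k for k in keywords if k not in FOREIGN_WORDS]


# Assumed cleaning passes, applied one per round in turn.
CLEANING_PASSES = [drop_single_letters, drop_foreign]


def clean_until_good(keywords, threshold=0.0, max_rounds=10):
    """Repeat cleaning passes until the residue share is acceptable."""
    rounds = 0
    while residue_share(keywords) > threshold and rounds < max_rounds:
        step = CLEANING_PASSES[rounds % len(CLEANING_PASSES)]
        keywords = step(keywords)
        rounds += 1
    return keywords, rounds


raw = ["workshop", "a", "und", "automobile", "x", "car body"]
cleaned, rounds = clean_until_good(raw)
print(cleaned, rounds)
```

The `max_rounds` guard is a design choice of this sketch: it keeps the loop from running indefinitely if the available passes cannot bring the residue share below the threshold.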
The « Modelling » phase
The companies to be codified are linked in a single matrix to (1) the
keywords extracted from the description of the economic activity in
the “Preparation” phase and (2) variables from the SBER (address, jobs,
turnover, legal form, etc.), which are intended to provide additional input
for determining the NOGA code to be assigned. From this matrix, a
prediction model will be generated that defines the keywords and concepts to
4 | ISI WSC 2019