Page 148 - Invited Paper Session (IPS) - Volume 1
IPS122 Elise C. et al.
c. Transforming this event into an experiment
Since the Hackathon, a small team (including Census department
members, SSP Lab members, IT members and some Hackathon
participants) has been working on the development of a prototype,
implementing some ideas that emerged during the Hackathon and adding
some new functionalities. This is still a work in progress.
4.2. Detecting wages/paid hours anomalies in employer payroll declaration
statistical databases
Sponsor: INSEE Social Studies Directorate (Employment and Professional
Income unit);
Team: SSP Lab (1 member), Statistical Methods Unit (1 member), Employment
and Professional Income unit (2 members);
Schedule: from January to December 2018;
Expected deliverables: experimentation report, guidelines for
implementation, and methodological and academic contributions.
a. A major change in the employer payroll and social contribution
declaration format offering new opportunities
The Annual Declaration of Social Data (“déclaration annuelle de données
sociales”, DADS) was a mandatory declaration filed each year by every
employer, through which individual wage-earner information was transmitted
to fiscal and social services for payroll and tax purposes as well as for
calculating wage-earners' social security rights (e.g., pensions). Since 2016
it has been replaced by a monthly Nominative Social Declaration. This change
of source completely modifies the national statistical information service on
employment and wages that relies on it, but it also provides the opportunity
to rethink the automatic anomaly detection process implemented in the
statistical production line, as the latter is being deeply modified to
integrate these new data. An adapted automatic detection of such anomalies
would lead to productivity gains in the subsequent editing procedure.
b. Machine learning contributions to anomaly detection in wages/paid
hours data
The experimental project carried out with the Employment and Professional
Income unit of the Social Studies Directorate tests different machine
learning-based algorithms for detecting anomalies in net and gross wages
and the related paid hours. The project has so far investigated unsupervised
algorithms, such as fuzzy association rules, isolation forests and local outlier
factors, on a small scale, with the aim of providing, for each observation, a
score reflecting the probability that it is an outlier. The next steps will be to
build a sample of observations labelled as anomalous or not, in order to
evaluate the performance of the methods
137 | ISI WSC 2019