Page 148 - Invited Paper Session (IPS) - Volume 1

IPS122 Elise C. et al.
                     c. Transforming this event into an experiment
                      Since the Hackathon, a small team (including Census department
                  members, SSP Lab members, IT members and some participants in the
                  Hackathon) has been working on the development of a prototype,
                  implementing some ideas that emerged during the Hackathon and adding
                  some new functionalities. This is still a work in progress.

                 4.2.  Detecting  wages/paid  hours  anomalies  in  employer  payroll  declaration
                      statistical databases
                  Sponsor:  INSEE  Social  Studies  Directorate  (Employment  and  Professional
                  Income unit);
                  Team: SSP Lab (1 member), Statistical Methods Unit (1 member), Employment
                  and professional income unit (2 members);
                  Schedule: from January to December 2018;
                  Expected deliverables: experimentation report, guidelines for
                  implementation, and methodological and academic contributions.
                      a.  A  major  change  in  the  employer  payroll  and  social  contribution
                         declaration format offering new opportunities
                      The Annual Declaration of Social Data (“déclaration annuelle de données
                  sociales”, DADS) was a mandatory declaration filed each year by every
                  employer, through which individual wage-earner information was
                  transmitted to fiscal and social services for payroll and tax purposes as
                  well as for calculating wage-earners' social security rights (e.g.,
                  pensions). Since 2016, it has been replaced by a monthly Nominative Social
                  Declaration. This change of source completely modifies the national
                  statistical information system on employment and wages, which relies on it,
                  but also provides the opportunity to rethink the automatic anomaly
                  detection process implemented in the statistical production line, as the
                  latter is being deeply modified to integrate these new data. An adapted
                  automatic detection of such anomalies would lead to productivity gains in
                  the subsequent editing procedure.
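
As an illustration of what such automatic detection could look like, the following is a minimal sketch of an isolation forest, one of the unsupervised methods investigated in the project, written in plain Python. Everything in it is an assumption made for illustration and not the INSEE implementation: the records are synthetic (gross wage, paid hours) pairs, and the two injected errors, the log transform, the function names and the parameters (100 trees, subsamples of at most 256 points) are hypothetical choices. The resulting score lies between 0 and 1 and ranks observations by how easily they are isolated; it is not a calibrated probability.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    if not left or not right:
        return {"size": len(points)}
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    depth = 0
    while "split" in node:
        side = "left" if point[node["feature"]] < node["split"] else "right"
        node = node[side]
        depth += 1
    return depth + avg_path_length(node["size"])


def anomaly_scores(points, n_trees=100, seed=0):
    """Isolation forest score in (0, 1); higher means easier to isolate."""
    if len(points) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    sample_size = min(256, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = avg_path_length(sample_size)
    return [
        2.0 ** (-(sum(path_length(p, t) for t in trees) / n_trees) / norm)
        for p in points
    ]


# Synthetic monthly records (gross wage, paid hours): illustrative data only.
data_rng = random.Random(42)
records = []
for _ in range(200):
    hours = data_rng.uniform(140.0, 160.0)
    hourly_rate = data_rng.uniform(11.0, 30.0)
    records.append((hours * hourly_rate, hours))
records.append((2500.0, 1.5))      # hypothetical error: hours keyed in the wrong unit
records.append((250000.0, 151.0))  # hypothetical error: wage inflated by a factor of 100

# Work on logs so that multiplicative errors show up as large shifts.
features = [(math.log(wage), math.log(hours)) for wage, hours in records]
scores = anomaly_scores(features)
flagged = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)[:2]
for i in flagged:
    print(i, records[i], round(scores[i], 3))
```

In an editing workflow, the records with the highest scores would be sent to review first; the threshold above which a record is flagged would have to be tuned, for instance on a labelled sample.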

                      b.  Machine learning contributions to anomaly detection of wages/paid
                         hours data
                      The experimental project, carried out with the Employment and
                  Professional Income department of the Social Studies Directorate, tests
                  different machine learning-based algorithms for detecting anomalies in net
                  and gross wages and the related paid hours. The project has so far
                  investigated unsupervised algorithms, such as fuzzy association rules,
                  isolation forests and local outlier factors, on a small scale, with the
                  intention of providing a score reflecting the probability that an
                  observation is an outlier. The next steps will be to build a sample of
                  observations labelled as anomalies or not, in order to evaluate the
                  performance of the methods



                                                                    137 | ISI WSC 2019