Page 147 - Invited Paper Session (IPS) - Volume 1
IPS122 Elise C. et al.
3. Future plans
The short-term plans of the SSP Lab are, first, to conduct the new
experimental projects, raise its visibility within the official statistical service
and disseminate its outputs more widely through appropriate media (blog,
intranet, experimental page on the Internet). A second important issue is to
develop appropriate contractual frameworks for hosting external
researchers, postdoctoral fellows and PhD students in the SSP Lab.
4. Some examples of experimental projects
This section details three examples of ongoing experiments. The entire list
of activities planned for 2018 is available in the appendix.
4.1. Employer identification in census survey
Sponsor: INSEE Social Studies Directorate (Census unit)
Team: SSP Lab (3 members), Census unit (2 members), IT (4 members), other
units (2 members)
Schedule: January 2018 (hackathon) and then from June to December 2018
(experimentation)
Expected deliverables: training (hackathon), an experimental prototype and
an experimentation report
a. What opportunities do Big Data techniques offer?
Currently, Census respondents report the name of their employer,
the activity of the legal unit and the address of their workplace. These response
boxes are filled out in a non-standardised way, which frequently results in
incorrect answers (spelling mistakes, imprecision and confusion between
fields). In order to obtain a relevant industry code for each job, employers
are currently coded automatically, but this succeeds for only 45% of
respondents. The remaining responses are coded manually, requiring the work of
around 70 INSEE agents for five months each year. Big Data techniques seem
to offer great opportunities to improve this process.
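To make the coding problem concrete, the following is a minimal sketch of how a free-text employer name might be matched against a business register using approximate string matching. It is an illustration under stated assumptions, not the method used at INSEE: the register entries, the industry codes, the function names `normalise` and `code_employer`, and the 0.8 threshold are all hypothetical.

```python
import difflib
import unicodedata

# Hypothetical toy register of (employer name, industry code) pairs.
# Illustrative only; these entries are not taken from SIRENE.
REGISTER = [
    ("boulangerie dupont", "C10"),
    ("garage martin", "G45"),
    ("transports durand", "H49"),
]


def normalise(text):
    """Lower-case the text, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def code_employer(answer, threshold=0.8):
    """Return (code, score) for the most similar register entry.

    The code is None when the best similarity falls below the
    (assumed) threshold, meaning the answer would go to manual coding.
    """
    cleaned = normalise(answer)
    best_code, best_score = None, 0.0
    for name, code in REGISTER:
        score = difflib.SequenceMatcher(None, cleaned, name).ratio()
        if score > best_score:
            best_code, best_score = code, score
    if best_score >= threshold:
        return best_code, best_score
    return None, best_score
```

In this sketch, a misspelled answer such as "Boulangerie  Dupond" would still be assigned the code of the closest register entry, while an answer with no close match would be left for manual coding. A realistic system would also need to use the activity and address fields and to scale to a register with millions of units.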
b. Organisation of a Hackathon
The SSP Lab, in collaboration with the IT department, organised INSEE's first
Hackathon on this subject on 18 and 19 January 2018. It gathered
more than 60 people from across the SSP (INSEE and the Ministerial Statistical
Services) and its partners (the Health Insurance Institute-Cnam, the Central
Bank, the Employment Agency-Pôle Emploi, etc.). Two days of training were
organised before the Hackathon to present the subject and the approach.
Different speakers presented the data involved (the Census and the business
register, called SIRENE) and some techniques that could be useful during the
Hackathon (web scraping, text mining, geocoding, etc.). This preparation
phase was well received by the participants and the organising team received
positive feedback.
136 | ISI WSC 2019