Invited Paper Session (IPS) - Volume 1

IPS122 Elise C. et al.
            3.  Future plans
                The short-term plans of the SSP Lab are, first, to conduct the new
            experimental projects, increase its visibility within the official statistical
            service and broaden the dissemination of its outputs through appropriate
            media (blog, intranet, experimental page on the Internet). A second important
            issue is to develop appropriate contractual frameworks for hosting external
            researchers, postdoctoral fellows and PhD students in the SSP Lab.

            4.  Some examples of experimental projects
                This section details three examples of ongoing experiments. The full list
            of activities planned for 2018⁶ is available in the appendix.

            4.1.  Employer identification in the Census survey
            Sponsor: INSEE Social Studies Directorate (Census unit);
            Team: SSP Lab (3 members), Census unit (2 members), IT (4 members), other
            units (2 members);
            Schedule: January 2018 (hackathon), then June to December 2018
            (experimentation);
            Expected deliverables: training (hackathon), an experimental prototype and
            an experimentation report.

                a. What opportunities do Big Data techniques offer?
                Currently, Census respondents report the name of their employer, the
            activity of the legal unit and the address of their workplace. These response
            boxes are filled in a non-standardised way and frequently contain incorrect
            answers (spelling mistakes, imprecision and confusion between fields). To
            obtain a relevant industry code for each job, employers are currently coded
            automatically, but this succeeds for only 45% of respondents; the remaining
            responses are coded manually, which requires around 70 INSEE agents working
            for five months each year. Big Data techniques appear to offer significant
            opportunities to improve this process.
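            One way to picture the automatic coding step is as linking a free-text
            employer name to a business-register record whose activity code is then
            reused. The sketch below is not INSEE's method; it is a minimal illustration
            assuming a simple character-trigram (Jaccard) similarity on normalised names
            only (ignoring the declared activity and address), a hypothetical
            `RegisterEntry` record loosely modelled on SIRENE, and an arbitrary 0.5
            threshold below which a response falls back to manual coding.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterEntry:
    """Hypothetical business-register record (fields loosely modelled on SIRENE)."""
    siret: str
    name: str
    activity_code: str


def normalise(text: str) -> str:
    """Strip accents and punctuation, lower-case, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def trigrams(text: str) -> set:
    """Character trigrams of a padded string; tolerant of small spelling errors."""
    padded = f"  {text} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity between the trigram sets of two normalised names."""
    na, nb = normalise(a), normalise(b)
    if not na or not nb:
        return 0.0
    ta, tb = trigrams(na), trigrams(nb)
    return len(ta & tb) / len(ta | tb)


def code_employer(declared_name: str, register: list, threshold: float = 0.5):
    """Return the best-matching register entry, or None to route to manual coding.

    The 0.5 threshold is an illustrative assumption, not a tuned value.
    """
    best = max(register, key=lambda e: similarity(declared_name, e.name), default=None)
    if best is None or similarity(declared_name, best.name) < threshold:
        return None
    return best
```

            With a toy register (made-up identifiers) containing "Société Générale",
            a misspelled declaration such as "societe generalle" still matches that
            entry, whereas an unrelated string returns None and would go to manual
            coding. Trigram overlap is chosen here only because it tolerates typos
            without any external library; a real system would also exploit the declared
            activity and workplace address.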

                b. Organisation of a Hackathon
                In collaboration with the IT department, the SSP Lab organised INSEE's
            first Hackathon on this subject on 18 and 19 January 2018. It gathered more
            than 60 people from the whole SSP (INSEE and the Ministerial Statistical
            Services) and its partners (the Health Insurance Institute, Cnam; the Central
            Bank; the Employment Agency, Pôle Emploi; etc.). Two days of training were
            organised before the Hackathon to present the subject and the approach.
            Different speakers presented the data involved (the Census and the business
            register, called SIRENE) and some techniques likely to be useful during the
            Hackathon (web scraping, text mining, geocoding, etc.). This preparation
            phase was well received, and the organising team got positive feedback from
            participants.

                                                              136 | ISI WSC 2019