Page 466 - Invited Paper Session (IPS) - Volume 1

IPS177 F. Ricciato et al.



                              A reflection on privacy and data confidentiality in
                                               Official Statistics
                      F. Ricciato, A. Bujnowska, A. Wirthmann, M. Hahn, E. Barredo-Capelot
                        EUROSTAT – Directorate B: Methodology; Dissemination; Cooperation in the ESS

                  Abstract
                  The availability of new digital data sources represents an opportunity for
                  Statistical Offices (SOs) to complement traditional statistics and/or deliver novel
                  statistics with improved timeliness and relevance. Nowadays SOs are part of a
                  larger “data ecosystem” where different organizations, including public
                  institutions and private companies, engage in the collection and processing of
                  different kinds of (new) data about citizens, companies, goods etc. In this
                  multi-actor scenario it is often desirable to let one organization extract some
                  output statistics (i.e., aggregate information) from input data that are held by
                  other organization(s) in different administrative domain(s). We refer to this
                  problem as cross-domain statistical processing. To achieve this goal, the most
                  intuitive approach (but not the only one) is to exchange raw input data
                  across administrative domains (organizations). However, this strategy is not
                  always viable when personal input data are involved, due to regulatory
                  constraints (including the lack of an explicit legal basis for data sharing),
                  business confidentiality, privacy requirements, or a combination of the above.
                  Furthermore, new data sources often embed a much more pervasive view
                  of individuals than traditional survey and/or administrative data, an aspect
                  that amplifies the potential risks of data concentration. In such cases,
                  performing cross-domain statistical processing requires technologies to elicit
                  only the agreed-upon output information (exactly or approximately) without
                  revealing the input data. This entails addressing two distinct but
                  complementary problems. First, we need to compute the desired output
                  statistics without seeing the raw input data. Second, we need to control the
                  amount of information that might be inferred about individual data subjects
                  in the input dataset from the output. In the field of privacy engineering the
                  notions of “input privacy” and “output privacy” are used to refer respectively
                  to these two problems. We remark that these problems are separable, i.e., they
                  can be addressed with distinct tools and methods that are then combined,
                  overlaid or juxtaposed. In this contribution we review recent advances in both
                  fields and briefly discuss their complementary roles. As for input privacy, we
                  provide a brief introduction to the fundamental principles of Secure Multi-
                  Party Computation (SMPC). As for output privacy, we review recent advances
                  in the field of Statistical Disclosure Control (SDC). Finally, we discuss possible




                                                                     455 | ISI WSC 2019