Invited Paper Session (IPS) - Volume 1
IPS177 F. Ricciato et al.
A reflection on privacy and data confidentiality in
Official Statistics
F. Ricciato, A. Bujnowska, A. Wirthmann, M. Hahn, E. Barredo-Capelot
EUROSTAT – Directorate B: Methodology; Dissemination; Cooperation in the ESS
Abstract
The availability of new digital data sources represents an opportunity for
Statistical Offices (SOs) to complement traditional statistics and/or deliver novel
statistics with improved timeliness and relevance. Nowadays, SOs are part of a
larger “data ecosystem” in which different organizations, including public
institutions and private companies, collect and process
different kinds of (new) data about citizens, companies, goods, etc. In this
multi-actor scenario, it is often desirable to let one organization extract some
output statistics (i.e., aggregate information) from input data that are held by
other organization(s) in different administrative domain(s). We refer to this
problem as cross-domain statistical processing. To achieve this goal, the most
intuitive approach, though not the only one, is to exchange raw input data
across administrative domains (organizations). However, this strategy is not
always viable when personal input data are involved, due to regulatory
constraints (including the lack of an explicit legal basis for data sharing),
business confidentiality, privacy requirements, or a combination of these.
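The abstract does not detail how an aggregate can be computed without exchanging raw data. The following is a minimal illustrative sketch, not the authors' method: it uses additive secret sharing, one elementary SMPC building block, so that several organizations obtain a joint total while none of them sees another's raw input. The modulus `PRIME` and the functions `share`, `reconstruct` and `secure_sum` are hypothetical names chosen for this example. The sketch assumes honest-but-curious parties that do not collude, and non-negative integer inputs whose total is below the modulus. It also simulates message exchange inside one process.

```python
import secrets

# Hypothetical modulus: the Mersenne prime 2**61 - 1. Any prime larger than
# the largest possible total would do.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n_parties additive shares modulo PRIME.

    Any n_parties - 1 of the shares are uniformly random and reveal nothing
    about `value`; only all shares together reconstruct it.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine additive shares into the secret value."""
    return sum(shares) % PRIME


def secure_sum(private_inputs: list[int]) -> int:
    """Simulate n organizations jointly computing the sum of their inputs.

    Organization i splits its input and (conceptually) sends share j to
    organization j. Each organization adds up only the shares it received,
    which look random, and publishes that partial sum. Combining the partial
    sums yields the total, not the individual inputs.
    """
    n = len(private_inputs)
    all_shares = [share(x, n) for x in private_inputs]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)


if __name__ == "__main__":
    # Three hypothetical organizations, each holding a private count.
    print(secure_sum([120, 45, 300]))  # prints 465
```

This covers only the computation step. The released total can still disclose information about individual data subjects, which is the separate problem discussed in the rest of this abstract.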
Furthermore, new data sources often provide a much more pervasive view of
individuals than traditional survey and/or administrative data, an aspect
that amplifies the potential risks of data concentration. In such cases,
cross-domain statistical processing requires technologies that elicit
only the agreed-upon output information (exactly or approximately) without
revealing the input data. This entails addressing two distinct but
complementary problems. First, we need to compute the desired output
statistics without seeing the raw input data. Second, we need to control the
amount of information about individual data subjects in the input dataset
that might be inferred from the output. In privacy engineering, the
notions of “input privacy” and “output privacy” refer to these two problems,
respectively. We remark that these problems are separable, i.e., they
can be addressed with distinct tools and methods that can then be combined,
overlaid, or juxtaposed. In this contribution, we review recent advances in both
fields and briefly discuss their complementary roles. For input privacy, we
provide a brief introduction to the fundamental principles of Secure
Multi-Party Computation (SMPC). For output privacy, we review recent advances
in the field of Statistical Disclosure Control (SDC). Finally, we discuss possible
ISI WSC 2019