Page 467 - Invited Paper Session (IPS) - Volume 1

IPS177 F. Ricciato et al.
            scenarios for SMPC and SDC integration in the future “confidentiality
            engineering” setup of modern official statistics.

            Keywords
            Privacy; Confidentiality; Security; Statistical Disclosure Control; Secure
            Multiparty Computation

            1.  Introduction and motivations
                Modern society is undergoing a process of massive datafication [1].
            The availability of new digital data sources represents an opportunity for
            Statistical Offices (SO) to complement traditional statistics and/or deliver novel
            statistical products with improved timeliness and relevance, so as to meet the
            increasing demands of users. However, such opportunities come with
            important challenges in almost every respect: methodology, business
            models, data governance, regulation, organization and others. The new
            scenario calls for an evolution of the modus operandi adopted by SO, also with
            respect to privacy and data confidentiality. We propose here a discussion
            framework focused on the prospective combination of advanced (dynamic)
            Statistical Disclosure Control (SDC) methods with Secure Multi-Party
            Computation (SMPC) techniques.
                For decades, the data business has been a natural monopoly centered
            around SO: no other entity had the technical and legal capability to collect and
            process large-scale data across individuals and organizations. In the traditional
            operation model, illustrated in Fig. 1, the SO ingests internally all source
            (micro-)data that were collected either directly from the data subjects, via
            surveys and censuses, or indirectly through administrative registers. The input
            source data collected in the back-end are then processed centrally to deliver
            two types of front-end output data: (i) official statistics for the general
            public; and (ii) more detailed data for further processing by expert users and
            researchers downstream in the data flow.
                The legal mandate of SO includes two important obligations that can be
            summarized as ‘closed input and open output’. On the input side (back-end),
            SO must preserve the confidentiality of the (micro-)data in order to protect
            the privacy of data subjects. On the output side (front-end), SO are committed
            to openly publishing the processed statistics (and in general any output data),
            so as to ensure that all potential users get the same information at the
            same time. The motivations and implications of both obligations are intimately
            connected to the democratic role of official statistics in modern society.
            However, in real-world applications, there is an unavoidable conflict
            between these two goals, since by definition the output data carry non-zero
            information about the input data (otherwise they would be useless), i.e., they
            always reveal something about the input. On the front-end, SO must

                                                               456 | I S I   W S C   2 0 1 9