Page 471 - Invited Paper Session (IPS) - Volume 1

IPS177 F. Ricciato et al.
            computation program to the IP does not imply loss of control by the OP over
            what program is executed.
                The input privacy problem is more challenging when the required output
            results are based on the contribution of several input data sets held by
            multiple IPs that cannot disclose their data. In some cases, the computation
            program can be factorized into separate computation instances that are run
            independently by the multiple IPs, either sequentially or in parallel. However,
            very often the desired output results do not allow for computation
            factorization. For example, this is the case when output results must be
            computed on the intersection records between different IP data sets, or when
            a regression must be run over variables that are held by different IPs. In these
            cases, we may resort to Secure Multi-Party Computation (SMC).
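
                As a minimal sketch of the factorizable case described above, consider an
            overall mean computed from data held by two IPs. Each IP runs a local
            computation instance and returns only a partial aggregate, which is then
            combined. The function names and the sample values are hypothetical, and the
            sketch assumes that sharing the partial aggregates (sum and count) is itself
            acceptable to each IP.

```python
from typing import Iterable, List, Tuple


def local_summary(values: Iterable[float]) -> Tuple[float, int]:
    """Run independently by each IP on its own data.

    Only the partial aggregate (sum, count) leaves the IP,
    never the individual records.
    """
    vals = list(values)
    return sum(vals), len(vals)


def combine(summaries: List[Tuple[float, int]]) -> float:
    """Combine the partial aggregates into the overall mean."""
    total = sum(s for s, _ in summaries)
    n = sum(c for _, c in summaries)
    if n == 0:
        raise ValueError("no records contributed by any IP")
    return total / n


# Hypothetical data sets held by two different IPs.
ip_a = [10.0, 20.0, 30.0]
ip_b = [40.0, 50.0]

overall_mean = combine([local_summary(ip_a), local_summary(ip_b)])
```

            A statistic defined on the intersection of the two data sets cannot be split
            in this way, because neither IP can decide alone which of its records belong
            to the intersection.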

            3.  Solution approaches to the Output Privacy problem
                The output privacy problem is traditionally addressed by so-called
            Statistical Disclosure Control (SDC) techniques, possibly in combination with
            Access Control (AC). SDC aims at restricting what is disclosed, while AC
            restricts to whom it is disclosed. Generally speaking, there is a
            trade-off between the two: the weaker the SDC, the stronger the AC that is
            required, and vice versa.
                AC methods rely on a combination of requirements concerning the nature
            of the potential users, their experience with holding confidential data and
            their legitimate use of the data. Potential users must provide evidence of
            fulfilling the AC requirements, which is scrutinized by the data owners. Users
            deemed trustworthy are entrusted with more detailed data and better access
            facilities. SDC methods rely on a combination of suppression, perturbation,
            randomization and aggregation of data.
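
                To illustrate how aggregation and suppression can work together, the
            following minimal sketch aggregates records into a frequency table and
            suppresses cells whose count falls below a minimum. The threshold value, the
            function name and the sample records are assumptions made for illustration;
            they are not taken from any specific SO practice.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional

# Assumed minimum cell count; real thresholds are set by each organisation.
MIN_CELL_COUNT = 3


def frequency_table(
    records: Iterable[Hashable], threshold: int = MIN_CELL_COUNT
) -> Dict[Hashable, Optional[int]]:
    """Aggregate records into counts and suppress small cells.

    A suppressed cell is reported as None instead of its true count.
    """
    counts = Counter(records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# Hypothetical microdata: one region label per respondent.
regions = ["north"] * 5 + ["south"] * 2 + ["east"] * 4

table = frequency_table(regions)
```

            Here the "south" cell, with only two contributors, is suppressed. If the
            grand total were published as well, the suppressed count could be recovered
            by subtraction, so additional cells would have to be suppressed too.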
                Historically, SDC was performed manually by dedicated experts, following
            practices and criteria that were developed through the years in the official
            statistics community. SO managed output control successfully because the
            statistics released were pre-defined, so SO could consistently apply
            suppressions to primary and secondary confidential cells. The current trend is
            towards “on-the-fly SDC”. Nowadays many users want to calculate their own
            tailor-made statistics on the basis of the detailed data sources. In response to
            these needs, SO make available dynamic data querying systems that
            implement modern SDC approaches, addressing in particular the problem of
            differential disclosure, where confidential values are revealed by comparing
            several released statistics. These new SDC approaches require that the output is
            always safe, even in combination with any other statistics based on the same
            source. The random noise protection method developed by the Australian
            Bureau of Statistics (ABS) is an example of a modern SDC approach [11, 12]. The
            ABS method consists of applying small perturbations (controlled noise) to the
            data according to a predefined probability distribution. A specific pseudo-random

                                                               460 | ISI WSC 2019