Page 330 - Invited Paper Session (IPS) - Volume 1
IPS155 Laura B.
other hand, when this initial check is passed, BIRD runs the program and a
dedicated Banca d’Italia employee examines its output to verify that the
submitted computations do not breach confidentiality (see footnote 4). If the
output does not violate any confidentiality restriction, i.e. it does not
disclose information on any single firm or restricted group of firms (or banks,
in the future), it is released to the researcher, who receives an email with
the cleared output. If, on the contrary, the manual check finds a
confidentiality violation, the researcher receives an email explaining the
reason for the rejection. It may also happen that the program contains errors
and BIRD terminates with an error; in that case too, the researcher receives an
email reporting the error in the program. To reduce the number of such
occurrences and speed up the process, since 2016 a test dataset in
semicolon-delimited ASCII format has been available on Banca d’Italia’s
website. It replicates the internal structure of the original data from the
Survey of Industrial and Service Firms but contains randomly generated figures,
so that researchers can test their code before submitting it to BIRD.
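As an illustration of such a dry run, the sketch below reads a small
semicolon-delimited ASCII sample and computes a grouped mean. It is written in
Python purely for illustration; the text does not say which languages BIRD
accepts. The column names (`firm_id`, `sector`, `employees`), the inline sample
and the `mean_by_group` helper are hypothetical and do not reproduce the actual
layout of the test dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the test dataset: semicolon-delimited ASCII
# with made-up figures. The column names are assumptions.
SAMPLE = """firm_id;sector;employees
1;manufacturing;120
2;manufacturing;80
3;services;45
4;services;55
"""

def mean_by_group(text, group_col, value_col):
    """Return {group: mean of value_col} from semicolon-delimited text."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    for row in reader:
        totals[row[group_col]] += float(row[value_col])
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in totals}

result = mean_by_group(SAMPLE, "sector", "employees")
print(result)
```

Running a script of this kind locally against the test file would reveal syntax
errors and wrong variable names before submission, since only aggregate
statistics (here, means by sector) are produced.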
To prevent any violation of confidentiality restrictions, there are several
safeguards ("firewalls") at three different levels: user (users are
identified, qualified and registered; registered mailboxes are whitelisted;
outputs are monitored and archived; a code of conduct, privacy law and specific
penalties apply); data (identifying variables are removed from the datasets
used for remote processing; extreme values are censored; stratification
variables are collapsed); and processing (displaying individual data is
forbidden; a keyword parser with a blacklist and a greylist is implemented;
particularly long and/or complex programs are always reviewed manually; all
submissions are reviewed manually).
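A minimal sketch of how a keyword parser with a blacklist and a greylist might
screen a submitted program, written in Python for illustration. The keyword
lists, the `screen_program` function and the three outcome labels are
hypothetical; the text does not disclose BIRD's actual lists or parsing rules.
The sketch assumes that a blacklisted keyword leads to automatic rejection,
while a greylisted keyword only flags the job for closer manual review.

```python
import re

# Hypothetical keyword lists; the real lists are not given in the text.
BLACKLIST = {"list", "print", "browse"}  # assumed: commands that display individual records
GREYLIST = {"merge", "export"}           # assumed: commands flagged for closer review

def screen_program(source):
    """Classify a program as 'reject', 'flag' or 'pass' based on its keywords."""
    # Split the program into lowercase word tokens and match whole words only.
    tokens = set(re.findall(r"[a-z_]+", source.lower()))
    if tokens & BLACKLIST:
        return "reject"
    if tokens & GREYLIST:
        return "flag"
    return "pass"
```

For example, `screen_program("list firm_id sales")` returns `"reject"`, whereas
`screen_program("regress sales employees")` returns `"pass"`. Under the process
described above, a job that passes such a screen would still be reviewed
manually before its output is released.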
Each institution that allows remote processing of granular data provides
a similar set of controls. Remote execution platforms are therefore considered
reasonably safe and useful, and they remain an important tool for the
dissemination of granular data for many data providers around the globe.
4. Households’ survey data
The Survey on Household Income and Wealth (SHIW) was launched in the
1960s to gather data on the incomes and savings of Italian households. From
the beginning to the publication of Banca d’Italia’s Internet web site, dataset
4 There is a clear trade-off between the length of the list of commands that are automatically
forbidden and the flexibility allowed to researchers in running computations and regressions.
The greater the flexibility granted to the researcher, the greater the role played by the Banca
d’Italia employee in manually checking the output of the submitted job. For the moment, the list
of forbidden keywords is very limited, but should the number of users and submitted jobs increase
significantly and the dedicated employees be unable to absorb the rising burden, the list of
forbidden keywords could be extended.
319 | ISI WSC 2019