Page 36 - Invited Paper Session (IPS) - Volume 1
IPS35 Dinov I.D. et al.
structure as well as the bulk of the total data energy of the original data in
terms of conserving the overall distribution of the original data features.
Simultaneously, the method obfuscates the individual cases sufficiently to
protect against the risks of subject re-identification. The DataSifter technique
includes several user-controlled parameters that give the data governor the
flexibility to control the level of obfuscation, trading off privacy protection against
preservation of signal energy (Marino, Zhou et al. 2018). Figure 3 shows a
schematic of the DataSifting protocol.
Figure 3: Summary of the DataSifter protocol.
Figure 4 illustrates the validation results of applying the DataSifter to a specific
clinical case-study. In this case, we obfuscated a large Autism Brain Imaging
Data Exchange (ABIDE) dataset including 1,098 volunteers and 2,400 features
(http://fcon_1000.projects.nitrc.org/indi/abide) (Di Martino, Yan et al. 2014,
Torgerson, Quinn et al. 2015). The results include the Percent of Identical
Feature Values (PIFV), vertical axis, for different DataSifter obfuscation levels.
Each box represents all subjects in the ABIDE sub-cohort and random forest
prediction of a specific binary clinical outcome, autism spectrum disorder
(ASD) status (ASD vs. control).
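To make the PIFV metric concrete, the following minimal Python sketch computes, for each subject, the percentage of feature values that are identical in the original and obfuscated records, and collects these per-subject scores for several obfuscation levels (the values one box per level would summarize). The function names pifv, toy_obfuscate and pifv_by_level are hypothetical, and toy_obfuscate is only an illustrative stand-in that perturbs each feature with a given probability; it is not the DataSifter algorithm, and the data are synthetic rather than ABIDE.

```python
import random


def pifv(original_row, sifted_row):
    """Percent of Identical Feature Values between one original and one obfuscated record."""
    if len(original_row) != len(sifted_row):
        raise ValueError("records must have the same number of features")
    if not original_row:
        raise ValueError("records must contain at least one feature")
    identical = sum(1 for a, b in zip(original_row, sifted_row) if a == b)
    return 100.0 * identical / len(original_row)


def toy_obfuscate(row, level, rng):
    """Hypothetical stand-in for obfuscation (NOT the DataSifter algorithm).

    Each feature is shifted by a random offset in [1, 2) with probability `level`,
    so `level` plays the role of an obfuscation strength between 0 and 1.
    """
    return [x + 1.0 + rng.random() if rng.random() < level else x for x in row]


def pifv_by_level(data, levels, seed=0):
    """Return {level: [per-subject PIFV, ...]} for each obfuscation level."""
    rng = random.Random(seed)
    return {
        level: [pifv(row, toy_obfuscate(row, level, rng)) for row in data]
        for level in levels
    }


if __name__ == "__main__":
    # Synthetic data: 20 subjects with 50 numeric features each (assumed sizes).
    gen = random.Random(1)
    data = [[gen.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]
    for level, scores in pifv_by_level(data, [0.0, 0.25, 0.5, 1.0]).items():
        print(f"level={level:.2f}  mean PIFV={sum(scores) / len(scores):.1f}%")
```

In this toy setting, a higher obfuscation level lowers the PIFV, which mirrors the privacy versus signal-preservation trade-off described above; the actual relationship for DataSifter depends on its own parameters.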
25 | I S I W S C 2 0 1 9