Page 36 - Invited Paper Session (IPS) - Volume 1

IPS35 Dinov I.D. et al.
                  structure as well as the bulk of the total data energy of the original data in
                  terms of conserving the overall distribution of the original data features.
                  Simultaneously, the method obfuscates the individual cases sufficiently to
                  protect against the risks of subject re-identification. The DataSifter technique
                  includes several user-controlled parameters that give the data governor the
                  flexibility to control the level of obfuscation, trading off privacy protection
                  against preservation of signal energy (Marino, Zhou et al. 2018). Figure 3 shows a
                  schematic of the DataSifter protocol.
























                                 Figure 3: Summary of the DataSifter protocol.

                  Figure 4 illustrates the validation results of applying the DataSifter to a specific
                  clinical case study. In this case we obfuscated a large Autism Brain Imaging
                  Data Exchange (ABIDE) dataset including 1,098 volunteers and 2,400 features
                  (http://fcon_1000.projects.nitrc.org/indi/abide) (Di Martino, Yan et al. 2014,
                  Torgerson, Quinn et al. 2015). The results include the Percent of Identical
                  Feature Values (PIFV), shown on the vertical axis, for different DataSifter
                  obfuscation levels. Each box represents all subjects in the ABIDE sub-cohort and
                  a random forest prediction of a specific binary clinical outcome: autism spectrum
                  disorder (ASD) status (ASD vs. control).
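
                  The PIFV metric can be illustrated with a minimal sketch. The code below is
                  not the DataSifter algorithm: the function names pifv and toy_obfuscate, the
                  value-swapping scheme, and the synthetic data are all assumptions made here
                  for illustration. It computes, for each subject, the percentage of feature
                  values that remain identical after a toy obfuscation step, which is
                  presumably the quantity summarized by each box in the validation results.

```python
import random


def pifv(original_row, sifted_row):
    """Percent of Identical Feature Values for one subject.

    Counts the features whose value is unchanged after obfuscation and
    expresses that count as a percentage of all features.
    """
    if len(original_row) != len(sifted_row):
        raise ValueError("rows must have the same number of features")
    if not original_row:
        raise ValueError("rows must contain at least one feature")
    identical = sum(1 for a, b in zip(original_row, sifted_row) if a == b)
    return 100.0 * identical / len(original_row)


def toy_obfuscate(data, fraction, rng):
    """Hypothetical stand-in for an obfuscation step (NOT DataSifter).

    For every subject, a given fraction of the features is overwritten
    with the value of the same feature taken from another, randomly
    chosen subject in the original data.
    """
    n_subjects = len(data)
    n_features = len(data[0])
    n_swap = round(fraction * n_features)
    sifted = []
    for i, row in enumerate(data):
        new_row = list(row)
        for j in rng.sample(range(n_features), n_swap):
            donor = rng.choice([k for k in range(n_subjects) if k != i])
            new_row[j] = data[donor][j]
        sifted.append(new_row)
    return sifted


if __name__ == "__main__":
    # Synthetic data: 5 subjects, 10 features, all values distinct.
    data = [[100 * i + j for j in range(10)] for i in range(5)]
    rng = random.Random(0)
    for level in (0.0, 0.3, 0.6, 1.0):
        sifted = toy_obfuscate(data, level, rng)
        scores = [pifv(o, s) for o, s in zip(data, sifted)]
        print(level, scores)
```

                  In this toy setting a higher obfuscation fraction lowers the PIFV of every
                  subject. With real data, a swapped-in value may coincide with the original
                  one, so the PIFV can stay above the nominal share of untouched features.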

















                                                                     25 | ISI WSC 2019