Page 32 - Invited Paper Session (IPS) - Volume 1

IPS35 Dinov I.D. et al.





                                Predictive Analytics of Big Neuroscience Data
                      Ivo D. Dinov, Nina Zhou, Syed Husain, Alexandr Kalinin, Yi Zhao, and
                                               Simeone Marino
                     Statistics Online Computational Resource (SOCR), Departments of Health Behaviour and
                  Biological Sciences (HBBS) and Computational Medicine and Bioinformatics (DCMB), Michigan
                      Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, MI 48109, USA

                  Abstract
                  This work presents some of the Big neuroscience data research and education
                  challenges and opportunities. Specifically, we identify the core characteristics of
                  complex neuroscience data, discuss strategies for data harmonization and
                  aggregation, and show case-studies using large datasets of normal and
                  pathological cohorts. Examples of the demonstrated techniques include
                  DataSifter, which enables secure sharing of sensitive data; compressive big
                  data analytics, which facilitates inference on multi-source heterogeneous
                  datasets; and model-free prediction, which provides forecasting of clinical features
                  or derived computed phenotypes. Simulated data as well as clinical data (e.g.,
                  UK Biobank (UKBB), Alzheimer’s Disease Neuroimaging Initiative (ADNI), and
                  amyotrophic lateral sclerosis (ALS) case-studies) are used for testing and
                  validation of the techniques. In support of open-science, result reproducibility,
                  and methodological improvements, all datasets, statistical methods,
                  computational algorithms, and software tools are freely available online.

                  Keywords
                  Big Data; Model-based analytics; Model-free inference; Neurodegenerative
                  disorders; Data science; Open-science

                  1. Introduction
                      This paper aims to present some of the contemporary Big neuroscience
                  data challenges, provide examples of solutions for specific problems, and
                  identify research, computational, and educational opportunities. We will begin
                  by defining data science and predictive analytics and examining the common
                  characteristics of Big datasets. Focusing on several driving biomedical and
                  health challenges, we will pinpoint some concrete barriers to data sharing. We
                  will briefly review two complementary strategies that enable computing on
                  sensitive information: ε-differential privacy (Dwork 2009) and homomorphic
                  encryption (Gentry 2009). Then, we will describe a recently introduced
                  technique for statistical obfuscation of sensitive data (DataSifter) and
                  demonstrate its approach to balancing data security and data utility (Marino,
                  Zhou et al. 2018). We will conclude by examining three biomedical and health

                                                                     21 | I S I   W S C   2 0 1 9