Page 32 - Invited Paper Session (IPS) - Volume 1
IPS35 Dinov I.D. et al.
Predictive Analytics of Big Neuroscience Data
Ivo D. Dinov, Nina Zhou, Syed Husain, Alexandr Kalinin, Yi Zhao, and
Simeone Marino
Statistics Online Computational Resource (SOCR), Departments of Health Behaviour and
Biological Sciences (HBBS) and Computational Medicine and Bioinformatics (DCMB), Michigan
Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, MI 48109, USA
Abstract
This work presents some of the Big neuroscience data research and education
challenges and opportunities. Specifically, we identify the core characteristics of
complex neuroscience data, discuss strategies for data harmonization and
aggregation, and present case studies using large datasets from normal and
pathological cohorts. Examples of the demonstrated techniques include
DataSifter, which enables secure sharing of sensitive data; compressive big
data analytics, which facilitates inference on multi-source heterogeneous
datasets; and model-free prediction, which forecasts clinical features
or derived computed phenotypes. Simulated data as well as clinical data (e.g.,
UK Biobank (UKBB), Alzheimer’s Disease Neuroimaging Initiative (ADNI), and
amyotrophic lateral sclerosis (ALS) case-studies) are used for testing and
validation of the techniques. In support of open-science, result reproducibility,
and methodological improvements, all datasets, statistical methods,
computational algorithms, and software tools are freely available online.
Keywords
Big Data; Model-based analytics; Model-free inference; Neurodegenerative
disorders; Data science; Open-science
1. Introduction
This paper aims to present some of the contemporary Big neuroscience
data challenges, provide examples of solutions for specific problems, and
identify research, computational, and educational opportunities. We will begin
by defining data science and predictive analytics and examining the common
characteristics of Big datasets. Focusing on several driving biomedical and
health challenges, we will pinpoint some concrete barriers to data sharing. We
will briefly review two complementary strategies to enable data computing on
sensitive information: differential privacy (Dwork 2009) and homomorphic
encryption (Gentry 2009). Then, we will describe a recently introduced
technique for statistical obfuscation of sensitive data (DataSifter) and
demonstrate its approach to balancing data security and data-utility (Marino,
Zhou et al. 2018). We will conclude by examining three biomedical and health
21 | ISI WSC 2019