Page 223 - Special Topic Session (STS) - Volume 4

STS582 Júlia M. P. S.
            biologically relevant molecular signatures and their analysis can suggest novel
            biological hypotheses.
                Another class of N-integration techniques is based on a flexible regression
            framework. Under an unsupervised approach, probabilistic graphical models
            (PGMs) can be used for learning relations among multiple variables
            (Meinshausen et al., 2006). Tenenhaus et al. (2014) proposed a generalized
            canonical correlation analysis for N-integration with loads in the optimization
            problem defined in terms of the connections in a PGM. In addition, supervised
            N-integration can be performed by incorporating varying coefficients (Hastie
            and Tibshirani, 1993) into the regression model, with the multi-omics
            integration oriented toward the prediction of clinical outcomes. In this
            context, Ni et al. (2018) proposed a Bayesian hierarchical varying-sparsity
            regression model and applied it to the integration of genomic and proteomic
            data as a prognostic tool for patient survival time.
                Further, the P-integration of independent data sets measured on a common
            set of variables (omics data) is a useful opportunity to increase sample size
            and gain statistical power. The main challenge in this case is to protect the
            analysis from systematic heterogeneity arising from different sources of
            variation, such as differing protocols. For instance, batch and multi-center
            effects are sources of unwanted variation that often act as strong
            confounders in the P-integration analysis. Such effects may lead to spurious
            conclusions if they are not accounted for in the statistical model.
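                As a minimal illustration of the kind of adjustment involved, the sketch
            below removes an additive batch shift by centering one variable within each
            batch. This is an assumption made for illustration only, not a method
            proposed in this work; the function name center_by_batch and the toy data
            are hypothetical.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Subtract each batch's own mean from its observations (one variable)."""
    groups = {}
    for value, batch in zip(values, batches):
        groups.setdefault(batch, []).append(value)
    batch_means = {batch: mean(vals) for batch, vals in groups.items()}
    return [value - batch_means[batch] for value, batch in zip(values, batches)]


# Hypothetical toy data: one omics variable measured in two studies,
# where study "B" carries a constant protocol-related shift of +5.
values = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
batches = ["A", "A", "A", "B", "B", "B"]

centered = center_by_batch(values, batches)
```

            Within-batch centering only addresses additive shifts, and it would also
            remove genuine biological differences whenever these are aligned with the
            batch, which is why such effects are usually handled inside the statistical
            model itself.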
                Despite the recent progress in multi-omics integration, the methods
            assume independent observations (unrelated individuals). If family structure
            is present but ignored in the analysis, such substructure may induce
            artefactual results in the data integration. For instance, in the context of
            uni-omics data, specifically considering large pedigrees and high-dimensional
            SNP-genotype data, de Andrade et al. (2015) obtained valid principal
            component estimators and showed that the latent variables that take the
            family structure into account are more informative than those that ignore
            it. Ribeiro and Soler (2018), who proposed a probabilistic graphical
            model for learning relationships among multiple variables from family data,
            also considered the impact of clustered observations on the analysis. The outline
            of this work is as follows. First, we will review and discuss unsupervised and
            supervised multi-omics data integration methods, under the assumption of
            unrelated samples. Subsequently, we will consider family-based designs,
            incorporate dependence among related individuals and explore how the
            covariance matrix among variables is decomposed into genetic and
            environmental components. Finally, we will discuss the advancement of data
            integration methods to take into account family structure present in the data.
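                To make the decomposition into genetic and environmental components
            concrete, the sketch below builds the covariance among family members for a
            single variable under the standard polygenic mixed model,
            Cov(y) = 2 * Phi * sigma2_g + I * sigma2_e, where Phi is the kinship matrix.
            The choice of this model, the function name polygenic_covariance, the
            nuclear family and the variance values are all assumptions made for
            illustration; the source does not specify them.

```python
def polygenic_covariance(kinship, sigma2_g, sigma2_e):
    """Covariance among individuals: 2 * Phi * sigma2_g + I * sigma2_e."""
    n = len(kinship)
    return [
        [
            2.0 * kinship[i][j] * sigma2_g + (sigma2_e if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical nuclear family: two unrelated parents (rows 0, 1)
# and two full siblings (rows 2, 3). Kinship coefficients are
# 0.5 with oneself, 0.25 for parent-offspring and full-sib pairs.
kinship = [
    [0.50, 0.00, 0.25, 0.25],
    [0.00, 0.50, 0.25, 0.25],
    [0.25, 0.25, 0.50, 0.25],
    [0.25, 0.25, 0.25, 0.50],
]

# Assumed variance components (heritability 0.6 on a unit-variance scale).
cov = polygenic_covariance(kinship, sigma2_g=0.6, sigma2_e=0.4)
```

            Only the genetic component links relatives: the two unrelated parents have
            zero covariance, while first-degree relatives share half of the genetic
            variance. Treating the four individuals as independent would ignore these
            off-diagonal terms.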




                                                               212 | ISI WSC 2019