Page 224 - Special Topic Session (STS) - Volume 4

STS582 Júlia M. P. S.
                  2.  Methodology
                      A detailed review of multi-omics integration is presented by Huang et al.
                  (2017).  All  efforts  are  dedicated  to  fully  account  for  the  uncertainties  and
                  heterogeneities in the datasets. Figure 1 shows a schematic representation of
                  the dataset structure involved in omics studies. Based on matrix factorization
                  approaches, unsupervised and supervised analyses have been used. In R, the
                  package mixOmics (Lê Cao et al., 2009; Rohart et al., 2017) is a powerful
                  resource for integration of multi-omics datasets. In this case, multivariate
                  projection-based methods are proposed to summarise datasets, $X$ ($n \times p$), by
                  latent components or scores $T$ ($n \times m$) and loadings $P$ ($p \times m$), such that
                  $X \approx TP'$, $m \le \min(n, p)$. To properly do data reduction, different optimization problems
                  are formulated, each with its own objective function. For unsupervised uni-omics
                  analysis, principal components, or their improved version via independent
                  components, are used, and for unsupervised multi-omics analysis, generalized
                  canonical correlation can be a useful strategy. In supervised
                  contexts, discriminant analysis combined with partial least squares has been
                  proposed. In all cases, regularised and sparse solutions are required.
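
The factorization of a data matrix into scores and loadings can be illustrated with a minimal, dependency-free Python sketch. It uses the classical NIPALS iteration for principal components on simulated data; this is an assumption made for illustration, not the mixOmics implementation, and it omits the regularisation and sparsity mentioned above. The function names (`center_columns`, `nipals`, `reconstruction_error`) are hypothetical.

```python
import random


def center_columns(X):
    """Subtract each column mean so components describe variation, not location."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def nipals(X, m, n_iter=500, tol=1e-12):
    """Return scores T (n x m) and loadings P (p x m) such that X ~ T P'."""
    n, p = len(X), len(X[0])
    if not 1 <= m <= min(n, p):
        raise ValueError("m must satisfy 1 <= m <= min(n, p)")
    E = [row[:] for row in X]  # residual matrix, deflated after each component
    T = [[0.0] * m for _ in range(n)]
    P = [[0.0] * m for _ in range(p)]
    for h in range(m):
        # Start from the residual column with the largest sum of squares.
        col_ss = [sum(E[i][j] ** 2 for i in range(n)) for j in range(p)]
        j0 = max(range(p), key=col_ss.__getitem__)
        if col_ss[j0] == 0.0:
            break  # nothing left to explain; remaining columns stay zero
        t = [E[i][j0] for i in range(n)]
        for _ in range(n_iter):
            tt = sum(v * v for v in t)
            # Loading: regress the residual columns on t, then scale to unit length.
            load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
            norm = sum(v * v for v in load) ** 0.5
            load = [v / norm for v in load]
            # Score: project the residual rows onto the loading.
            t_new = [sum(E[i][j] * load[j] for j in range(p)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        for i in range(n):
            T[i][h] = t[i]
            for j in range(p):
                E[i][j] -= t[i] * load[j]  # deflation: remove the fitted rank-1 part
        for j in range(p):
            P[j][h] = load[j]
    return T, P


def reconstruction_error(X, T, P):
    """Sum of squared entries of X - T P'."""
    m = len(P[0])
    return sum(
        (X[i][j] - sum(T[i][h] * P[j][h] for h in range(m))) ** 2
        for i in range(len(X))
        for j in range(len(X[0]))
    )


# Simulated data only: n = 8 samples, p = 4 features.
rng = random.Random(1)
X = center_columns([[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)])
for m in (1, 2, 4):
    T, P = nipals(X, m)
    print(m, reconstruction_error(X, T, P))
```

Increasing the number of components m can only reduce the reconstruction error, and the error vanishes (up to rounding) once m reaches the rank of the centred matrix.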

















                          Figure 1. Schematic representation of datasets integration in Omics studies.

                      Regression models are powerful tools for supervised multi-omics
                  integration. Ni et al. (2018) proposed a varying-coefficient
                  regression model that allows integration of multi-omics datasets aimed at
                  prediction of target outcomes. The model is flexible enough to accommodate
                  subject-specific coefficient estimation, i.e., at the patient level. Under the regression
                  formulation, regulatory axes given by proteomic ($X_1$) and genomic ($X_2$) data
                  are connected to build clinically relevant prognostics through
                  $Y \approx \sum_j X_{1j}\,\beta_j(X_2)$, where the varying coefficients $\beta_j(X_2)$ define
                  gene-protein interactions by adopting smooth functions of $X_2$.
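
To make the idea of a varying coefficient concrete, the following Python sketch fits a deliberately simplified version of such a model by ordinary least squares. It is not the Bayesian procedure of Ni et al. (2018). The assumptions are: a single scalar genomic modifier `u` per subject, coefficients that are linear in `u` (the simplest smooth function), no intercept, and simulated data. All function names are hypothetical. Under these assumptions, each protein contributes two ordinary regression columns, its level and its level multiplied by `u`.

```python
import random


def design_row(x1, u):
    """Basis-expanded row: for each protein j, the columns x1[j] and x1[j] * u."""
    row = []
    for xj in x1:
        row.extend([xj, xj * u])
    return row


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting.

    Assumes A is square and non-singular.
    """
    k = len(b)
    M = [A[i][:] + [b[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for cc in range(c, k + 1):
                M[r][cc] -= f * M[c][cc]
    z = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(M[r][cc] * z[cc] for cc in range(r + 1, k))
        z[r] = (M[r][k] - tail) / M[r][r]
    return z


def fit_varying_coefficients(X1, U, y):
    """Least-squares estimate of (b_j0, b_j1) for beta_j(u) = b_j0 + b_j1 * u."""
    D = [design_row(x1, u) for x1, u in zip(X1, U)]
    k = len(D[0])
    DtD = [[sum(d[a] * d[b] for d in D) for b in range(k)] for a in range(k)]
    Dty = [sum(d[a] * yi for d, yi in zip(D, y)) for a in range(k)]
    return solve(DtD, Dty)  # normal equations D'D z = D'y


def beta(coefs, j, u):
    """Subject-specific coefficient of protein j at modifier value u."""
    return coefs[2 * j] + coefs[2 * j + 1] * u


def predict(coefs, x1, u):
    """Outcome prediction: sum over proteins of level times varying coefficient."""
    return sum(xj * beta(coefs, j, u) for j, xj in enumerate(x1))


# Simulated data only: 50 subjects, 2 proteins, one scalar modifier per subject.
rng = random.Random(0)
true_coefs = [1.0, 0.5, -2.0, 0.3]
X1 = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(50)]
U = [rng.uniform(-1, 1) for _ in range(50)]
y = [predict(true_coefs, x1, u) + rng.gauss(0, 0.1) for x1, u in zip(X1, U)]
estimate = fit_varying_coefficients(X1, U, y)
print([round(v, 2) for v in estimate])
print(beta(estimate, 0, -1.0), beta(estimate, 0, 1.0))  # protein 0 effect at two u values
```

The last line shows why the coefficients are subject-specific: the same protein has a different estimated effect for subjects with different values of the modifier. Richer smooth functions, such as splines, would replace the two-column basis with more columns per protein.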
                      All of these methods assume independent observations and are not
                  applicable to family-based data, which are very common in genomic studies.
                  Family data are mainly analysed using mixed model approaches that allow
                  familial dependences among observations to be included. For family-based

                                                                     213 | I S I   W S C   2 0 1 9