Page 224 - Special Topic Session (STS) - Volume 4
STS582 Júlia M. P. S.
2. Methodology
A detailed review of multi-omics integration is presented by Huang et al.
(2017). These efforts aim to account fully for the uncertainties and
heterogeneities in the datasets. Figure 1 shows a schematic representation of
the structure of the datasets involved in omics studies. Based on matrix
factorization approaches, both unsupervised and supervised analyses have been
used. The R package mixOmics (Lê Cao et al., 2009; Rohart et al., 2017) is a
powerful resource for the integration of multi-omics datasets. In this case, multivariate
projection-based methods are proposed to summarise datasets, $X_{n \times p}$, by
latent components or scores ($T_{n \times m}$) and loadings ($V_{p \times m}$), such that $X \approx TV'$,
$m \le \min(n, p)$. To achieve data reduction, different optimization problems,
each with its own objective function, are formulated. For unsupervised uni-omics
analysis, principal component analysis or its improved version via independent
component analysis is used, and for unsupervised multi-omics analysis, generalized
canonical correlation analysis can be a useful strategy. In supervised
contexts, discriminant analysis combined with partial least squares has been
proposed. In all cases, regularised and sparse solutions are required.
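As an illustration of the factorization into scores and loadings, the following is a minimal sketch of the simplest unsupervised case, ordinary principal component analysis computed by a truncated singular value decomposition. It is not the mixOmics implementation and omits the regularised and sparse variants mentioned above. The function name `pca_scores_loadings`, the variable names `T` and `V`, and the simulated toy data are assumptions made only for this example.

```python
import numpy as np


def pca_scores_loadings(X, m):
    """Rank-m factorisation Xc ~ T V' of the column-centred data via SVD."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if not 1 <= m <= min(n, p):
        raise ValueError("m must satisfy 1 <= m <= min(n, p)")
    Xc = X - X.mean(axis=0)            # centre each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :m] * s[:m]               # scores, n x m
    V = Vt[:m].T                       # loadings, p x m, orthonormal columns
    return T, V, Xc


# Hypothetical toy data (assumption): n = 20 samples, p = 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

T, V, Xc = pca_scores_loadings(X, m=2)
# Relative error of the rank-2 approximation Xc ~ T V'.
rel_error = np.linalg.norm(Xc - T @ V.T) / np.linalg.norm(Xc)
```

With m equal to min(n, p) the centred data are reproduced exactly; smaller values of m trade reconstruction error for a more compact summary.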
Figure 1. Schematic representation of datasets integration in Omics studies.
Regression models are powerful tools for supervised multi-omics
integration. Ni et al. (2018) proposed an interesting varying-coefficient
regression model, which allows multi-omics datasets to be integrated for the
prediction of target outcomes. The model is flexible enough to accommodate
subject-specific coefficient estimation, i.e., at the patient level. Under the regression
formulation, regulatory axes given by proteomic ($X_1$) and genomic ($X_2$) data
are connected to build a clinically relevant prognostic, $Y$, through
$Y \approx \sum X_1 \beta(X_2)$, where the varying coefficients $\beta(X_2)$ define gene-protein
interactions by adopting smooth functions of $X_2$.
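The idea of a coefficient that varies smoothly with a second covariate can be sketched as follows. This is not the Bayesian model of Ni et al. (2018): it is a simplified least-squares fit with a single proteomic covariate and a single genomic covariate, in which the smooth coefficient function is assumed to be a low-degree polynomial. The names `fit_varying_coefficient`, `beta`, `x1`, `x2`, and the simulated noise-free data are assumptions made only for this example.

```python
import numpy as np


def fit_varying_coefficient(x1, x2, y, degree=2):
    """Least-squares fit of y ~ x1 * beta(x2), with beta a polynomial in x2."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    # Column k of the design matrix is x1 * x2**k, so that
    # design @ coef = x1 * (coef[0] + coef[1] * x2 + coef[2] * x2**2 + ...).
    design = np.column_stack([x1 * x2**k for k in range(degree + 1)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def beta(coef, x2):
    """Evaluate the coefficient function beta(x2) for given polynomial coef."""
    return np.polynomial.polynomial.polyval(np.asarray(x2, dtype=float), coef)


# Hypothetical simulated data (assumption): one protein, one gene, no noise.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)           # proteomic covariate
x2 = rng.uniform(-1.0, 1.0, n)    # genomic covariate
true_coef = np.array([0.5, 1.0, -2.0])
y = x1 * beta(true_coef, x2)      # outcome driven by a gene-dependent effect

coef_hat = fit_varying_coefficient(x1, x2, y)
```

Because the effect of `x1` on the outcome is `beta(x2)`, each subject has its own coefficient determined by that subject's genomic value, which is the sense in which the estimation is subject-specific. In practice a spline basis and a sum over several proteins would replace the single polynomial term used here.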
All of these methods assume independent observations and are therefore not
applicable to family-based data, which are very common in genomic studies.
Family data are mainly analysed using mixed-model approaches, which allow
familial dependence among observations to be included. For family-based
213 | ISI WSC 2019