Page 227 - Special Topic Session (STS) - Volume 4
STS582 Júlia M. P. S.
The main expected result of dataset integration is a representation of the
observations in a reduced dimension, obtained by optimizing an objective
function that establishes relations among the datasets. Such relations can
be based on covariance matrices or prediction functions, corresponding to
unsupervised or supervised proposals, respectively. Here we focused mainly
on methods derived from matrix factorization and regression models. Most
of the available analyses assume independent observations, but several
multi-omics studies are based on family data, which impose familial
dependence among observations.
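As an illustration of a covariance-based (unsupervised) objective of the kind
described above, consider two column-centred blocks X_1 (n x p_1) and
X_2 (n x p_2) measured on the same n observations. The symbols X_1, X_2, u, v,
t_1 and t_2 are introduced here only for the example, and this is one common
choice of objective, not necessarily the one adopted by the author:

```latex
\[
  (u^\ast, v^\ast) = \arg\max_{u,\, v}\; u^\top X_1^\top X_2\, v
  \quad \text{subject to} \quad \|u\|_2 = \|v\|_2 = 1,
\]
\[
  t_1 = X_1 u^\ast, \qquad t_2 = X_2 v^\ast .
\]
```

The maximizers are the leading left and right singular vectors of the
cross-product matrix X_1^T X_2, and the scores t_1 and t_2 give a
one-dimensional representation of the observations in each block. Further
singular vector pairs extend this to a representation of higher rank.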
For multi-omics integration in family data we are considering strategies
that decompose the problem into the integration of polygenic components
and the integration of environmental components. This is a direct extension
of the need to include random effects when analysing data with dependencies.
Each data block is decomposed into two covariance matrices modelling
different types of variation: one due to the polygenic random effect, which
is shared among members of the same family and represents among-family
variation, and another due to the error (environmental) random effect, which
represents within-family variation. A low-rank approximation of the polygenic
variation is then computed across the blocks, together with low-rank
approximations of the environmental variation components. The rationale of
our approach has been used in other contexts. Feng et al. (2018), addressing
the matrix decomposition problem in dataset integration, proposed the
angle-based joint and individual variation explained method, which allows
block scores, block loadings, global loadings and global scores to be
computed. We are working on the computational implementation of our methods
using R package facilities.
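The split of a data block into among-family and within-family variation can
be sketched on toy data as follows. This is a minimal illustration under
strong assumptions, not the author's implementation: family means stand in
for the polygenic random effect (a real analysis would fit a mixed model with
a kinship matrix), the matrices are unnormalised cross-product (scatter)
matrices, and the rank-1 summary uses plain power iteration. The function
names, the toy values and the use of Python rather than R are all choices
made for this example.

```python
from collections import defaultdict


def scatter(rows):
    """Unnormalised p x p cross-product matrix: the sum of r r^T over rows."""
    p = len(rows[0])
    out = [[0.0] * p for _ in range(p)]
    for r in rows:
        for a in range(p):
            for b in range(p):
                out[a][b] += r[a] * r[b]
    return out


def split_block(block, family):
    """Split one block into among-family and within-family parts.

    block  : list of n rows (individuals), each a list of p values
    family : list of n family labels
    Returns (among, within) such that, for every individual i,
    block[i] - grand_mean == among[i] + within[i].
    """
    n, p = len(block), len(block[0])
    grand = [sum(row[j] for row in block) / n for j in range(p)]

    # Group rows by family and average within each family.
    members = defaultdict(list)
    for row, fam in zip(block, family):
        members[fam].append(row)
    fam_mean = {
        fam: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        for fam, rows in members.items()
    }

    # Assumption: family mean minus grand mean plays the role of the
    # shared (among-family) component; the residual is within-family.
    among = [[fam_mean[f][j] - grand[j] for j in range(p)] for f in family]
    within = [[row[j] - fam_mean[f][j] for j in range(p)]
              for row, f in zip(block, family)]
    return among, within


def leading_direction(matrix, n_iter=200):
    """Leading eigenvector of a symmetric PSD matrix by power iteration.

    Starts from a constant vector, so it can fail if that vector is
    orthogonal to the leading eigenvector.
    """
    p = len(matrix)
    v = [p ** -0.5] * p
    for _ in range(n_iter):
        w = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return v
        v = [x / norm for x in w]
    return v


# Toy block: four individuals, two variables, two families (made-up values).
block = [[1.0, 2.0], [3.0, 2.0], [7.0, 8.0], [9.0, 12.0]]
family = ["F1", "F1", "F2", "F2"]

among, within = split_block(block, family)
S_among = scatter(among)     # among-family variation
S_within = scatter(within)   # within-family variation
direction = leading_direction(S_among)  # rank-1 summary of among-family part
```

In this sketch the two scatter matrices add up to the total scatter of the
centred block, because the cross terms between family means and within-family
deviations cancel. With several blocks, one way to proceed would be to
concatenate the among-family parts column-wise before extracting leading
directions, and to do the same separately for the within-family parts.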
References
1. Chen, C et al. (2011). Removing Batch Effects in Analysis of Expression
Microarray Data: An Evaluation of Six Batch Adjustment Methods. PLoS
ONE 6(2): e17238.
2. Clough, T et al. (2012). Statistical protein quantification and significance
analysis in label-free LC-MS experiments with complex designs. BMC
Bioinformatics 13(Suppl 16): S6.
3. de Andrade, M et al. (2015). Global Individual Ancestry Using PCs for
Family Data. Human Heredity 80: 1-11.
4. Egan, KJ et al. (2016). Cohort profile: the Baependi Heart Study—a family-
based, highly admixed cohort study in a rural Brazilian town. BMJ Open 6:
1:8.
5. Feng, Q et al. (2018). Angle-based joint and individual variation explained.
Journal of Multivariate Analysis 166: 241-265.
6. Hastie, T.; Tibshirani, R. (1993). Varying-coefficient models. Journal of the
Royal Statistical Society, Series B (Methodological): 757-796.
216 | ISI WSC 2019