Page 227 - Special Topic Session (STS) - Volume 4

STS582 Júlia M. P. S.
                The main expected result of dataset integration is a representation of
            the observations in a reduced dimension, chosen to optimize an objective
            function that establishes relations among the datasets. Such relations can
            be based on covariance matrices or prediction functions, corresponding to
            unsupervised or supervised proposals, respectively. Here we focus mainly on
            methods derived from matrix factorization and regression models. Most of the
            available analyses assume independent observations, but several multi-omics
            studies are based on family data, which impose familial dependences among
            observations.
                For multi-omics integration in family data, we are considering strategies
            that decompose the problem into the integration of polygenic components and
            the integration of environmental components. This is a direct extension of
            the need to include random effects when analysing data with dependencies.
            Each data block is decomposed into two covariance matrices modelling
            different types of variation: one due to the polygenic random effect, which
            is shared among members of the same family and represents among-family
            variation, and another due to the error (environmental) random effect, which
            represents within-family variation. A low-rank approximation of the polygenic
            variation across the blocks is then computed, together with low-rank
            approximations of the environmental variation components. The rationale of
            our approach has been used in other contexts. Feng et al. (2018), addressing
            the matrix decomposition problem in dataset integration, proposed the
            angle-based joint and individual variation explained method, which allows
            the computation of block scores, block loadings, global loadings and global
            scores. We are working on the computational implementation of our methods
            using R package facilities.
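
The two-step idea described above (split each block's variation into an among-family and a within-family covariance matrix, then take low-rank approximations) can be illustrated with a minimal sketch. It is not the authors' implementation, which is being developed in R. It assumes balanced families and uses one-way random-effects moment estimators in place of a full polygenic mixed model with a kinship matrix. The function names `variance_components` and `low_rank`, and the simulated blocks `X1` and `X2`, are hypothetical.

```python
import numpy as np


def variance_components(X, family):
    """Moment estimates of among- and within-family covariance matrices.

    X is an (n, p) data matrix and family holds n family labels.
    Assumption: balanced design (every family has the same size m).
    """
    X = np.asarray(X, dtype=float)
    family = np.asarray(family)
    labels = np.unique(family)
    n, p = X.shape
    q = len(labels)
    m = n // q  # common family size under the balanced assumption
    grand_mean = X.mean(axis=0)

    B = np.zeros((p, p))  # between-family sums of squares and products
    W = np.zeros((p, p))  # within-family sums of squares and products
    for f in labels:
        Xf = X[family == f]
        d = Xf.mean(axis=0) - grand_mean
        B += m * np.outer(d, d)
        R = Xf - Xf.mean(axis=0)
        W += R.T @ R

    msb = B / (q - 1)
    msw = W / (n - q)
    # E[msb] = S_within + m * S_among, so solve for S_among.
    return (msb - msw) / m, msw


def low_rank(S, rank):
    """Best rank-`rank` PSD approximation of a symmetric matrix S."""
    vals, vecs = np.linalg.eigh(S)
    top = np.argsort(vals)[::-1][:rank]
    vals = np.clip(vals[top], 0.0, None)  # moment estimates may be indefinite
    vecs = vecs[:, top]
    return (vecs * vals) @ vecs.T, vecs


# Simulated example: two blocks sharing one family-level latent factor.
rng = np.random.default_rng(0)
q, m = 50, 4
family = np.repeat(np.arange(q), m)
g = rng.normal(size=(q, 1))[family]  # same value for all members of a family
X1 = g @ rng.normal(size=(1, 3)) + 0.5 * rng.normal(size=(q * m, 3))
X2 = g @ rng.normal(size=(1, 2)) + 0.5 * rng.normal(size=(q * m, 2))

# Concatenate the blocks so the among-family covariance spans both of them.
X = np.hstack([X1, X2])
S_among, S_within = variance_components(X, family)
G_hat, loadings = low_rank(S_among, rank=1)   # among-family (polygenic-like) part
E_hat, _ = low_rank(S_within, rank=2)         # within-family (environmental) part
```

Clipping negative eigenvalues is a design choice of this sketch: the moment estimate of the among-family covariance is not guaranteed to be positive semidefinite, whereas a likelihood-based mixed-model fit would typically enforce that constraint directly.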

            References
            1.  Chen, C et al. (2011). Removing batch effects in analysis of expression
                microarray data: an evaluation of six batch adjustment methods. PLoS
                ONE 6(2): e17238.
            2.  Clough, T et al. (2012). Statistical protein quantification and significance
                analysis in label-free LC-MS experiments with complex designs. BMC
                Bioinformatics 13(Suppl 16): S6.
            3.  de Andrade, M et al. (2015). Global Individual Ancestry Using PCs for
                Family Data. Human Heredity 80: 1-11.
            4.  Egan, KJ et al. (2016). Cohort profile: the Baependi Heart Study, a
                family-based, highly admixed cohort study in a rural Brazilian town.
                BMJ Open 6: 1-8.
            5.  Feng, Q et al. (2018). Angle-based joint and individual variation explained.
                Journal of Multivariate Analysis 166: 241-265.
            6.  Hastie, T; Tibshirani, R (1993). Varying-coefficient models. Journal of the
                Royal Statistical Society, Series B (Methodological) 55(4): 757-796.

                                                               216 | ISI WSC 2019