Page 303 - Contributed Paper Session (CPS) - Volume 4

CPS2233 Sharon Lee
            is the more challenging task of matching cell populations across the different
            datasets in the batch. An example is shown in Figure 1, where large variations in
            the size, shape, and location of the clusters of points can be observed across
            the datasets. These variations present significant difficulties for automated
            methods. Intuitive approaches, such as fitting each dataset separately or
            pooling all datasets into one aggregate dataset, fail to account for the
            inter-data relationship. Ad hoc approaches, such as normalizing the data in a
            pre-processing step (Hahne et al., 2010) or matching the clusters in a
            post-hoc manner (Pyne et al., 2009), do not utilize all available and useful
            information. In particular, there is no information sharing between the
            datasets during the clustering step.
                 This paper presents Hcyto (Hierarchical model for cytometry data), a direct
            and automated approach for analysing batch cytometry data that inherently
            takes into account the variations between and within the datasets. We adopt a
            hierarchical approach to handle inter-data variations, where each dataset is
            conceptualized as an instance of a template mixture model. Under this
            framework, each dataset is modelled by an individual mixture model that is an
            affine transformation of the template model. An appealing advantage of this
            approach is that the components of the individual mixture models are
            automatically aligned across the datasets. Another advantage is computational
            efficiency, as clustering and aligning are performed at the same time without
            the need for additional pre-processing or post-clustering steps. Furthermore, by
            adopting skew component densities, our approach can directly accommodate
            the non-normal features of the data. This avoids the need to search for a
            suitable transformation for each dataset or to determine how to merge
            components. To illustrate our approach, we apply Hcyto to real cytometry
            datasets, demonstrating favourable performance against other methods.
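
The template idea can be illustrated with a minimal sketch. It is not the Hcyto implementation: it assumes normal rather than skew components, a single affine map y = A x + b per dataset, and unchanged mixing weights. The names `Component`, `transform_template`, `a_k` and `b_k` are hypothetical. The sketch shows only why alignment comes for free: component h of every dataset-specific model is the image of component h of the template.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Matrix-vector product a @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix-matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


@dataclass
class Component:
    weight: float  # mixing proportion
    mean: Vector   # location of the cluster
    cov: Matrix    # scale/shape of the cluster (normal case, an assumption)


def transform_template(template: List[Component], a: Matrix, b: Vector) -> List[Component]:
    """Push every template component through the affine map y = a x + b.

    For a normal component the mean becomes a @ mean + b and the
    covariance becomes a @ cov @ a^T; the weights are kept (assumption).
    """
    out = []
    for comp in template:
        mean = [m + s for m, s in zip(matvec(a, comp.mean), b)]
        cov = matmul(matmul(a, comp.cov), transpose(a))
        out.append(Component(comp.weight, mean, cov))
    return out


# Illustrative two-component template in p = 2 dimensions (made-up values).
template = [
    Component(0.6, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
    Component(0.4, [4.0, 2.0], [[1.0, 0.5], [0.5, 2.0]]),
]

# Hypothetical dataset-specific scaling and shift.
a_k = [[2.0, 0.0], [0.0, 0.5]]
b_k = [1.0, -1.0]

dataset_model = transform_template(template, a_k, b_k)
```

Because the output list preserves the order of the template components, `dataset_model[h]` in any dataset corresponds to `template[h]`, so no post-hoc cluster matching is needed in this simplified setting.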

            2.  Methodology
                The Hcyto model consists of two levels: the upper level for between-data
            variations and the lower level for within-data variations. The former is a mixed
            effects model, whereas the latter is a finite mixture of skew distributions. The
            upper-level model intrinsically links the lower-level models to a batch template
            model (another finite mixture of skew distributions) that describes the
            overall characteristics of the batch. To facilitate discussion, we now introduce
            some notation. Let $y_{jk}$ be a $p$-dimensional vector consisting of the
            measurements of $p$ markers on the $j$th cell of dataset $k$, where
            $j = 1, \ldots, n_k$ and $k = 1, \ldots, K$. Here $n_k$ denotes the total number
            of cells in dataset $k$, and $K$ is the total number of datasets in the batch.
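
As a small illustration of this notation, a batch can be held as a list of datasets, each a list of cells, each cell a list of marker measurements. The sketch below writes K for the number of datasets, n_k for the number of cells in dataset k, and p for the number of markers; the names `batch`, `n` and `y` are hypothetical and the numbers are made up.

```python
from typing import List

Cell = List[float]     # p marker measurements for one cell
Dataset = List[Cell]   # the n_k cells of one dataset

# Made-up batch with K = 2 datasets and p = 3 markers.
batch: List[Dataset] = [
    [[1.2, 0.4, 3.1], [0.9, 0.5, 2.8]],                   # k = 1, n_1 = 2
    [[1.5, 0.3, 3.0], [1.1, 0.6, 2.9], [1.0, 0.2, 3.3]],  # k = 2, n_2 = 3
]

K = len(batch)                          # number of datasets in the batch
n = [len(dataset) for dataset in batch] # n[k - 1] is n_k
p = len(batch[0][0])                    # number of markers


def y(j: int, k: int) -> Cell:
    """Measurement vector of the j-th cell of dataset k (1-based indices)."""
    return batch[k - 1][j - 1]
```

Note that the datasets may have different numbers of cells, which is why `n` is a list rather than a single number.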




                                                               292 | I S I   W S C   2 0 1 9