Page 303 - Contributed Paper Session (CPS) - Volume 4
CPS2233 Sharon Lee
is the more challenging task of matching cell populations across the different
data in the batch. An example is shown in Figure 1, where large variations in
the size, shape, and location of the clusters of points can be observed across
the data. This presents significant difficulties to automated methods as there
can be large variations between the data. Intuitive approaches such as fitting each
dataset separately or pooling all data into an aggregate dataset fail to account
for the inter-data relationship. Ad hoc approaches such as normalizing the
data in a pre-processing step (Hahne et al., 2010) or matching the clusters in a
post-hoc manner (Pyne et al., 2009) do not utilize all available and useful
information. In particular, there is no information sharing between the data
during the clustering step.
This paper presents Hcyto (Hierarchical model for cytometry data), a direct
and automated approach for analysing batch cytometry data that inherently
takes into account variations between and within the data. We adopt a
hierarchical approach to handle inter-data variations, where each dataset is
conceptualized as an instance of a template mixture model. Under this
framework, each dataset is modelled by an individual mixture model that is an
affine transformation of the template model. An appealing advantage of this
approach is that components of the individual mixture models are
automatically aligned across the data. Another advantage is computational
efficiency, as clustering and aligning are performed at the same time without
the need for additional pre-processing or post-clustering steps. Furthermore, by
adopting skew component densities, our approach can directly accommodate
the non-normal features of the data. This avoids the need to search for a
suitable transformation for each dataset or to determine how to merge
components. To illustrate our approach, we apply Hcyto to real cytometry
datasets, demonstrating favourable performance against other methods.
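
The idea that each individual mixture is an affine transformation of a template can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Hcyto implementation: Gaussian components stand in for the skew densities, the transformation is a fixed shift `a` and matrix `B` rather than a fitted random effect, and the names `Component` and `affine_instance` are hypothetical. Under the map y = a + Bx, a component with mean mu and covariance Sigma becomes one with mean a + B mu and covariance B Sigma B^T.

```python
# Sketch only: Gaussian components replace the skew densities used by Hcyto,
# and Component / affine_instance are hypothetical names for illustration.
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(B: Matrix, x: Vector) -> Vector:
    """Matrix-vector product B x."""
    return [sum(b * xi for b, xi in zip(row, x)) for row in B]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Matrix-matrix product A B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


@dataclass
class Component:
    weight: float
    mean: Vector
    cov: Matrix


def affine_instance(template: List[Component], a: Vector, B: Matrix) -> List[Component]:
    """Map each template component N(mu, Sigma) to N(a + B mu, B Sigma B^T).

    Component order and weights are preserved, so component i of the result
    corresponds to component i of the template.
    """
    out = []
    for comp in template:
        mean = [ai + mi for ai, mi in zip(a, matvec(B, comp.mean))]
        cov = matmul(matmul(B, comp.cov), transpose(B))
        out.append(Component(comp.weight, mean, cov))
    return out


# Made-up two-component template in two dimensions.
template = [
    Component(0.6, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
    Component(0.4, [4.0, 2.0], [[0.5, 0.1], [0.1, 0.5]]),
]

# Hypothetical shift and scaling for one dataset in the batch.
sample = affine_instance(template, a=[1.0, -1.0], B=[[2.0, 0.0], [0.0, 0.5]])
```

Because component i of every transformed mixture derives from component i of the template, cluster labels agree across datasets without a separate matching step.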
2. Methodology
The Hcyto model consists of two levels: the upper level for between-data
variations and the lower level for within-data variations. The former is a mixed
effects model whereas the latter is a finite mixture of skew distributions. The
upper level model intrinsically links the lower level models to a batch template
model (another finite mixture of skew distributions) that describes the
overall characteristics of the batch. To facilitate discussion, we now introduce
some notation. Let $y_{jk}$ be a $p$-dimensional vector consisting of the
measurements of $p$ markers on the $j$th cell of dataset $k$, where $j = 1, \ldots, n_k$ and
$k = 1, \ldots, K$. Here $n_k$ denotes the total number of cells in dataset $k$, and $K$ is
the total number of datasets in the batch.
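
As a concrete picture of this notation, a batch can be held as a list of datasets, each a list of p-dimensional cell vectors, with the number of cells allowed to differ between datasets. The sketch below is illustrative only: the variable names (`batch`, `K`, `n`, `p`) and all numeric values are assumptions introduced here, not taken from the paper.

```python
# Illustrative layout only: the values and variable names are made up.
from typing import List

Cell = List[float]      # one p-dimensional vector of marker measurements
Dataset = List[Cell]    # all cells of one dataset in the batch

batch: List[Dataset] = [
    [[1.2, 0.4, 3.1], [0.9, 0.5, 2.8]],                   # dataset 1: 2 cells
    [[1.5, 0.3, 3.0], [1.1, 0.6, 2.7], [1.0, 0.2, 3.3]],  # dataset 2: 3 cells
]

K = len(batch)                          # number of datasets in the batch
n = [len(dataset) for dataset in batch] # number of cells in each dataset
p = len(batch[0][0])                    # number of markers per cell

# Every cell in every dataset must carry the same p markers.
assert all(len(cell) == p for dataset in batch for cell in dataset)
```

Keeping the datasets separate, rather than concatenating them, preserves the index of the dataset each cell came from, which the upper level of the model relies on.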
292 | ISI WSC 2019