Page 302 - Contributed Paper Session (CPS) - Volume 4
CPS2233 Sharon Lee
it is also limited by the non-reproducibility of results, the non-scalability to
massive datasets, and the difficulty in detecting high-dimensional
relationships from low-dimensional projected spaces. As a result, recent
efforts have turned to machine learning, computer science, and statistics
for computational tools to analyse these data. Some reviews and
comparisons of these methods can be found in Aghaeepour et al. (2013, 2016)
and Weber et al. (2016).
Many of these methods employ a mixture model-based approach,
whether explicitly or implicitly. This is because mixture models provide a
convenient framework to characterize the heterogeneous populations within
the data. The task of cell segmentation then translates to the traditional
problem of model-based clustering. However, it is well known that cytometry
data typically exhibit non-normal features, including skewness and
long-tailedness. Hence traditional mixture models with normal components
struggle to handle these non-normal cluster shapes. To mitigate this, some
methods attempt to normalize or transform the data (Lo et al., 2009), while
others merge multiple components from an overfitted model to allow for
asymmetric clusters (Aghaeepour et al., 2011; Mosmann et al., 2014). While
these approaches may alleviate the problem, it is preferable to have a flexible
model that can directly handle non-normal clusters. We thus adopt skew mixture
models for this task in this paper.
Figure 1. Eight examples from the batch of 16 DLBCL data sets. Large variations in the shape,
size, and location of clusters can be observed across the different data sets.
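As a rough illustration of model-based clustering with skewed components, the sketch below assigns observations to clusters by maximum posterior probability under a two-component mixture. It makes several assumptions that do not come from the paper: the components are univariate Azzalini skew-normal densities (the skew family actually adopted may differ and is multivariate in practice), the parameter values in WEIGHTS and PARAMS are hypothetical, and the function names are chosen for illustration only. No fitting step (such as EM) is shown; the parameters are taken as given.

```python
import math

def skew_normal_pdf(y, xi, omega, alpha):
    """Azzalini skew-normal density: (2/omega) * phi(z) * Phi(alpha * z)."""
    z = (y - xi) / omega
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf at z
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))  # standard normal cdf at alpha*z
    return 2.0 / omega * phi * Phi

def posterior_probs(y, weights, params):
    """Posterior probability that y belongs to each component (Bayes' rule)."""
    joint = [w * skew_normal_pdf(y, *p) for w, p in zip(weights, params)]
    total = sum(joint)
    return [j / total for j in joint]

def assign_cluster(y, weights, params):
    """Maximum a posteriori (MAP) cluster label for observation y."""
    probs = posterior_probs(y, weights, params)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical two-component model; each tuple is
# (location xi, scale omega, skewness alpha).
WEIGHTS = [0.6, 0.4]
PARAMS = [(0.0, 1.0, 5.0),   # right-skewed cluster
          (6.0, 1.0, 0.0)]   # alpha = 0 reduces to an ordinary normal cluster

if __name__ == "__main__":
    for y in (0.5, 2.5, 6.0):
        print(y, assign_cluster(y, WEIGHTS, PARAMS))
```

Setting alpha to zero recovers the normal density, so a normal mixture is a special case of this sketch. A single skewed component can cover an asymmetric cluster that a normal mixture would otherwise have to approximate with several merged components.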
When analysing batch cytometry data, that is, a cohort of cytometry data
with similar characteristics (for example, data from patients diagnosed with a
certain disease or from the same individual across different time points), there
291 | ISI WSC 2019