Page 302 - Contributed Paper Session (CPS) - Volume 4

CPS2233 Sharon Lee
                  it is also limited by the non-reproducibility of results, poor scalability to
                  massive datasets, and the difficulty of detecting high-dimensional
                  relationships from low-dimensional projections. For these reasons, recent
                  efforts have turned to machine learning, computer science, and statistics to
                  provide computational tools for analysing these data. Reviews and
                  comparisons of these methods can be found in Aghaeepour et al. (2013, 2016)
                  and Weber et al. (2016).
                       Figure 1. Eight examples from the batch of 16 DLBCL datasets. Large variations
                              in the shape, size, and location of clusters can be observed across the
                              different datasets.

                       Many of these methods employ a mixture model-based approach, whether
                  explicitly or implicitly, because mixture models provide a convenient
                  framework for characterizing the heterogeneous populations within the data.
                  The task of cell segmentation then translates into the traditional problem of
                  model-based clustering. However, it is well known that cytometry data
                  typically exhibit non-normal features, including skewness and
                  long-tailedness. Hence, traditional normal mixture models struggle to capture
                  these non-normal cluster shapes. To mitigate this, some methods normalize or
                  transform the data (Lo et al., 2009), while others merge multiple components
                  of an overfitted model to allow for asymmetric clusters (Aghaeepour et al.,
                  2011; Mosmann et al., 2014). While these approaches may alleviate the
                  problem, a flexible model that can handle non-normal clusters directly is
                  preferable. In this paper, we therefore adopt skew mixture models for this
                  task.
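To make the idea of a skew mixture model concrete, the following is a minimal Python sketch of a univariate mixture with skew-normal components. It uses the Azzalini skew-normal density f(x) = (2/ω) φ((x − ξ)/ω) Φ(α(x − ξ)/ω) purely as a stand-in; the skew distributions used in this paper may differ (for example, they may be multivariate or heavier-tailed). All function names and parameter values are hypothetical. Setting the shape parameter α to zero recovers a normal component, and each observation is assigned to the component with the highest posterior probability, which is how model-based clustering turns a fitted mixture into a segmentation. Parameter estimation (for example, by the EM algorithm) is not shown.

```python
import math


def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, loc, scale, shape):
    """Azzalini skew-normal density (2/scale) * phi(z) * Phi(shape * z), z = (x - loc) / scale.

    shape == 0 gives the normal density; shape > 0 skews right, shape < 0 skews left.
    """
    z = (x - loc) / scale
    return 2.0 / scale * normal_pdf(z) * normal_cdf(shape * z)


def mixture_pdf(x, weights, params):
    """Finite mixture density: sum over k of weight_k * skew-normal(x; params_k)."""
    return sum(w * skew_normal_pdf(x, *p) for w, p in zip(weights, params))


def posterior_probs(x, weights, params):
    """Posterior probability that x belongs to each component (Bayes' rule).

    Sketch only: far in the tails every density can underflow to zero, so a real
    implementation would work on the log scale.
    """
    dens = [w * skew_normal_pdf(x, *p) for w, p in zip(weights, params)]
    total = sum(dens)
    return [d / total for d in dens]


def assign_cluster(x, weights, params):
    """Model-based clustering: index of the component with the highest posterior."""
    tau = posterior_probs(x, weights, params)
    return max(range(len(tau)), key=tau.__getitem__)


# Hypothetical two-cluster example; each tuple is (loc, scale, shape).
weights = [0.6, 0.4]
params = [(0.0, 1.0, 4.0), (5.0, 1.5, -3.0)]
print(assign_cluster(0.5, weights, params))  # 0
print(assign_cluster(4.5, weights, params))  # 1
```

Because each component carries its own shape parameter, a single skewed component can cover an asymmetric cluster that a normal mixture would otherwise need several components to approximate.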
                       When analysing batch cytometry data, that is, a cohort of cytometry data
                  with similar characteristics (for example, data from patients diagnosed with a
                  certain disease or from the same individual across different time points), there

                                                                     291 | I S I   W S C   2 0 1 9