Page 306 - Contributed Paper Session (CPS) - Volume 4

CPS2233 Sharon Lee
                  3.  Result
                      For demonstration, we consider the diffuse large B-cell lymphoma (DLBCL)
                  dataset provided by the FlowCAP-I contest (Aghaeepour et al., 2013). It
                  contains a collection of 30 data sets sampled from patients diagnosed with DLBCL.
                  These data sets were gated manually by experts to provide a benchmark for
                  evaluating the performance of computational methods. In this batch, 16
                  data sets were determined to have three major cell populations, so we
                  focus on these 16. As can be observed from Figure 1, there are
                  substantial differences between the data sets. In particular, the location of the
                  clusters varies considerably; for example, the upper right cluster in data 2
                  appears to have shifted vertically downward in data 7. Another interesting
                  observation from Figure 1 is that the changes in the abundance of the clusters
                  are even more remarkable. The lower cluster in data 5 appears to be lightly
                  populated, whereas the same cluster in data 7 is densely populated. If we
                  model each data set individually, the fit is likely to miss such very small
                  clusters. On the other hand, if we pool all the data together and
                  fit a single model to them, the large variations in cluster locations will adversely
                  affect the accuracy of the model, leading to high error rates in cell
                  segmentation. This is also manifested in a contour plot (not shown), where the
                  components have large contours in order to accommodate a wider range of
                  data points. This is a scenario where Hcyto can provide more reasonable and
                  accurate results.
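
                  The effect of pooling described above can be illustrated with a minimal
                  sketch. It assumes a one-dimensional toy marker and hypothetical cluster
                  centres (0, 3 and 6) that are not taken from the DLBCL data, and it uses
                  plain means and variances rather than Hcyto or any mixture-model fit. It
                  only shows why a single pooled component must be wider than the
                  per-sample components when the cluster location shifts between samples.

```python
import random
import statistics

random.seed(0)


def simulate_cluster(center, n, sd=1.0):
    """Draw n one-dimensional points around a centre (toy stand-in for one marker)."""
    return [random.gauss(center, sd) for _ in range(n)]


# Hypothetical toy values: the same cell population sits at a different
# location in each of three samples (inter-sample shift).
centers = [0.0, 3.0, 6.0]
samples = [simulate_cluster(c, 500) for c in centers]

# Per-sample fit: each sample gets its own mean and variance.
per_sample_var = [statistics.pvariance(s) for s in samples]

# Pooled fit: one mean and one variance for all samples combined.
pooled = [x for s in samples for x in s]
pooled_var = statistics.pvariance(pooled)

# With equal sample sizes, the pooled variance splits into the average
# within-sample variance plus the variance of the sample means.
mean_within = statistics.fmean(per_sample_var)
between = statistics.pvariance([statistics.fmean(s) for s in samples])

print("per-sample variances:", per_sample_var)
print("pooled variance:     ", pooled_var)
print("within + between:    ", mean_within + between)
```

                  The between-sample term is what inflates the pooled component; a
                  per-sample model with a shared template avoids paying it.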
                       Upon applying Hcyto to this batch, we obtain a parametric model for each
                  data set as well as an overall parametric template for the batch of 16 data sets. The
                  three components of these mixture models are automatically matched across
                  the data sets. It can be observed from Figure 2 that Hcyto handles the
                  inter-data variations quite well, as is evident from the closely fitted contours. Although
                  there are very few observations/cells in the red cluster of data 5, Hcyto was
                  able to identify and model this cluster. Another observation from Figure 2 is that
                  the cluster shapes differ between the data sets. For example, the blue cluster
                  ranges from fairly spherical (data 8) to elongated (data 5) to asymmetrical (data
                  1). It is also of interest to note that Hcyto correctly matches all clusters across the
                  data sets.
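
                  To make concrete what matched components mean, the following is a
                  generic sketch only. In Hcyto the matching is described above as
                  automatic; the brute-force relabelling below is not the paper's
                  procedure. The function name match_to_template and all coordinates are
                  hypothetical. It relabels the three component means of one sample so that
                  they line up with the template components, by choosing the permutation
                  with the smallest total squared distance.

```python
from itertools import permutations


def match_to_template(sample_means, template_means):
    """Return a list `order` where sample component order[j] is matched to
    template component j, minimising the total squared distance.
    Brute force over all permutations; adequate for three components."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    k = len(template_means)
    best = min(
        permutations(range(k)),
        key=lambda perm: sum(
            sq_dist(sample_means[perm[j]], template_means[j]) for j in range(k)
        ),
    )
    return list(best)


# Hypothetical 2-D component means (not taken from the DLBCL data).
template = [(1.0, 1.0), (5.0, 1.0), (5.0, 6.0)]
# Same three populations, listed in a different order and shifted in location.
sample = [(5.2, 4.5), (0.8, 1.3), (4.7, 0.9)]

order = match_to_template(sample, template)
relabelled = [sample[i] for i in order]
print(order)
print(relabelled)
```

                  After relabelling, component j of the sample and component j of the
                  template refer to the same cell population, which is what allows
                  clusters to be compared across the batch.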

















                                                                     295 | I S I   W S C   2 0 1 9