Page 306 - Contributed Paper Session (CPS) - Volume 4
CPS2233 Sharon Lee
3. Result
For demonstration, we consider the diffuse large B-cell lymphoma (DLBCL)
dataset provided by the flowCAP I contest (Aghaeepour et al., 2013). It
contains 30 samples taken from patients diagnosed with DLBCL.
These samples were gated manually by experts to provide a benchmark for
evaluating the performance of computational methods. In this batch, 16 samples
were determined to have three major cell populations, so we focus on these
16 samples. As can be observed from Figure 1, there are
substantial differences between the samples. In particular, the location of the
clusters varies considerably. For example, the upper right cluster in data 2
appears to have shifted vertically downward in data 7. Another interesting
observation from Figure 1 is that the changes in the abundance of the clusters
are even more pronounced. The lower cluster in data 5 appears to be sparsely
populated, whereas the same cluster in data 7 is densely populated. If we
model each sample individually, the fitted model is likely to miss these very small
clusters. On the other hand, if we pool all the samples together and
fit a single model to them, the large variations in cluster locations will adversely
affect the accuracy of the model, leading to high error rates in cell
segmentation. This is also manifested in a contour plot (not shown), where the
components have large contours in order to accommodate a wider range of
data points. This is precisely the scenario in which Hcyto can provide more reasonable and
accurate results.
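The inflation of the pooled contours can be illustrated with the law of total variance. When the same cluster sits at different locations in different samples, the variance of the pooled data equals the average within-sample variance plus the variance of the cluster means across samples. The following minimal Python sketch is not part of Hcyto. The one-dimensional values are hypothetical, chosen only to mimic a cluster that shifts vertically downward between two samples, and the helper `pooled_variance` is an illustrative name of our own.

```python
from statistics import fmean, pvariance


def pooled_variance(means, variances, weights):
    """Split the variance of pooled data into within- and between-sample parts.

    Law of total variance: Var(pooled) = E[within-sample variance]
    + Var(sample means), with samples weighted by their sizes.
    """
    total_weight = sum(weights)
    w = [wi / total_weight for wi in weights]
    grand_mean = sum(wi * m for wi, m in zip(w, means))
    within = sum(wi * v for wi, v in zip(w, variances))
    between = sum(wi * (m - grand_mean) ** 2 for wi, m in zip(w, means))
    return within, between, within + between


# Hypothetical vertical (y) coordinates of one cluster in two samples;
# the cluster is shifted down by 3 units in the second sample.
sample_a = [4.0, 6.0]  # mean 5, variance 1
sample_b = [1.0, 3.0]  # mean 2, variance 1

within, between, total = pooled_variance(
    means=[fmean(sample_a), fmean(sample_b)],
    variances=[pvariance(sample_a), pvariance(sample_b)],
    weights=[len(sample_a), len(sample_b)],
)
print(within, between, total)          # 1.0 2.25 3.25
print(pvariance(sample_a + sample_b))  # 3.25, the same value computed directly
```

Although the cluster has variance 1 within each sample, a single component fitted to the pooled data must stretch to variance 3.25. The between-sample shift, not the cluster itself, accounts for most of the spread.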
Upon applying Hcyto to this batch, we obtain a parametric model for each
sample as well as an overall parametric template for the batch of 16 samples. The
three components of these mixture models are automatically matched across
the samples. It can be observed from Figure 2 that Hcyto handles the
inter-sample variations quite well, as is evident from the close-fitting contours. Although
there are very few observations/cells in the red cluster of data 5, Hcyto was
able to identify and model this cluster. Another observation from Figure 2 is that
the cluster shapes differ between the samples. For example, the blue cluster
ranges from fairly spherical (data 8) through elongated (data 5) to asymmetrical
(data 1). It is also worth noting that Hcyto correctly matches all clusters across
the samples.
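The sketch below makes the idea of matching components across samples concrete. It relabels one sample's fitted components against a template by choosing the permutation that minimises the summed squared distance between matched component means. This simple post hoc assignment is swapped in for illustration only. It is not the mechanism Hcyto uses, since Hcyto obtains the matching automatically through its hierarchical template. The function `match_to_template` and all coordinates are hypothetical.

```python
from itertools import permutations
from math import dist


def match_to_template(template_means, sample_means):
    """Return (perm, cost): sample component perm[j] is matched to template
    component j, minimising the summed squared Euclidean distance between
    matched means (exhaustive search over all permutations)."""
    k = len(template_means)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(k)):
        cost = sum(dist(template_means[j], sample_means[perm[j]]) ** 2
                   for j in range(k))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Hypothetical 2-D component means (e.g. two fluorescence markers).
template = [(2.0, 8.0), (7.0, 7.0), (5.0, 2.0)]
# A sample whose fitted components come out in a different order, with the
# second template cluster shifted vertically down.
sample = [(5.2, 2.3), (1.8, 8.1), (7.1, 5.0)]

perm, cost = match_to_template(template, sample)
print(perm)  # (1, 2, 0)
```

Exhaustive search is adequate for three components (3! = 6 candidate labellings). With many components, an assignment solver such as the Hungarian algorithm would be the usual choice.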
295 | ISI WSC 2019