Page 236 - Special Topic Session (STS) - Volume 1

STS426 Tanuka C.
                     the mixture distributions along with Bayesian Feature Selection to extract
                     the directions. For an extensive discussion with mathematical details and
                     applications to both real and simulated datasets, please refer to ?.
                        The mclust software provides a function, MclustDR(), which implements
                     the method discussed in the paper mentioned above. MclustDR() takes an
                     argument lambda, a tuning parameter in the range [0, 1] as described
                     in ?. This argument can be tuned to recover the directions that best
                     separate the estimated clusters. The package is maintained by the author
                     of the method himself and thus offers a dependable way to apply the
                     discussed method.
                        We use this method in the present work only to assess how distinct our
                     clusters are, which helps us to validate the obtained results.

                  4.  Analyses and Interpretations of the Data
                      The dataset that we have prepared contains many desirable characteristics
                  of nearby galaxies. We proceed with the analysis in the following steps:
                  (i)  We first perform Gaussian-Mixture-Model-Based Clustering (GMMBC) on
                      the selected variables. This step selects the optimal number of clusters
                      together with the membership of each observation in the clusters/groups.
                  (ii)  Next, we perform the Bayesian LASSO within each selected cluster using
                      all of the variables that we have used for prediction. This step performs
                      variable selection within the clusters created in the previous
                      step in order to find    values.
                  (iii) Next, we perform full Bayesian Regression on these clusters/groups with
                      the selected variables from the previous step.
                  (iv) Finally, we interpret the results.
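
                      The membership assignment in step (i) can be illustrated with a minimal
                  sketch. It assumes a mixture that has already been fitted and, for brevity,
                  univariate Gaussian components; the weights, means and standard deviations
                  below are hypothetical values chosen for illustration and are not taken from
                  our data. Each observation is assigned to the component with the largest
                  posterior probability.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def memberships(x, weights, means, sds):
    """Posterior probability that x belongs to each mixture component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def assign_cluster(x, weights, means, sds):
    """Index of the component with the largest posterior probability."""
    post = memberships(x, weights, means, sds)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical two-component mixture (illustration only, not the paper's fit).
weights = [0.6, 0.4]
means = [0.0, 5.0]
sds = [1.0, 1.0]

labels = [assign_cluster(x, weights, means, sds) for x in (-0.3, 4.8, 2.0)]
```

                      In the multivariate case the same rule applies, with the univariate
                  density replaced by the multivariate normal density of each component.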
                      Figure 1 shows the Bayesian Information Criterion (BIC) plotted against
                  the number of mixture components, i.e., the number of groups. We take as
                  optimal the number of mixture components (the number of clusters) for which
                  the BIC is maximum. By this criterion (Figure 1), the optimal number of
                  clusters was 6. After performing the Bayesian LASSO, we observed that
                  Groups 4 and 5 had the same set of selected variables, and that the LASSO
                  selected this same set after the two clusters were merged. Also, GMMBC with
                  6 clusters and GMMBC with 5 clusters differed by only 94.1 (1.2%) in BIC.
                  Noting these points, and for better interpretability of the galaxy clusters,
                  we continue our work with 5 clusters (viz. G1 - G5), each having a unique set
                  of variables explaining its characteristics.
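
                      The BIC-based choice described above can be sketched as follows. The
                  sketch assumes the "larger is better" form of the BIC implied by the text
                  (twice the log-likelihood minus a complexity penalty). The helper names, the
                  tolerance rule, the way the relative gap is normalised and the BIC values
                  are all hypothetical and serve only as an illustration; in our analysis the
                  decision to use 5 clusters also relied on the Bayesian LASSO results, not on
                  a BIC tolerance alone.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' form: 2*logL - n_params*log(n_obs)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


def best_by_bic(bic_by_k):
    """Number of components whose BIC is maximal."""
    return max(bic_by_k, key=bic_by_k.get)


def relative_gap(bic_a, bic_b):
    """Absolute BIC difference relative to the larger-magnitude value.

    The normalisation is an assumption made for this sketch.
    """
    return abs(bic_a - bic_b) / max(abs(bic_a), abs(bic_b))


def choose_parsimonious(bic_by_k, tolerance):
    """Smallest k whose BIC lies within `tolerance` (relative) of the maximum."""
    top = bic_by_k[best_by_bic(bic_by_k)]
    close = [k for k, v in bic_by_k.items() if relative_gap(v, top) <= tolerance]
    return min(close)


# Hypothetical BIC values per number of clusters (not the values in Figure 1).
bic_by_k = {4: -1100.0, 5: -1012.0, 6: -1000.0, 7: -1030.0}

k_max = best_by_bic(bic_by_k)                    # model with maximum BIC
k_chosen = choose_parsimonious(bic_by_k, 0.02)   # simpler model if nearly as good
```

                      With these illustrative values the maximum BIC is attained at 6
                  components, while the 5-component model lies within the 2% tolerance and is
                  therefore preferred by the parsimony rule.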
                      The methods described in Section 3.4 are applied to our data
                  accordingly. The two directions describing the maximum amount of separation
                  or uniqueness between the clusters/groups are extracted from the data set
                  using the methods described in the paper ?.

                                                                     225 | ISI WSC 2019