Page 234 - Special Topic Session (STS) - Volume 1

STS426 Tanuka C.
and P(τ, μ, Σ, θ) is a prior distribution on the parameters τ, μ, Σ, which includes other parameters denoted by θ. Under this setup, we try to find a posterior mode or MAP (maximum a posteriori) estimate rather than an ML estimate for the mixture parameters. For further discussion on regularization, we refer to ?. This method has been implemented in the software mclust (https://cran.r-project.org/web/packages/mclust/index.html), built under the R environment by the same authors of the paper. We have used mclust for our work and did not face issues with convergence of the EM algorithm (on the contrary, we found excellent computation speed with fast convergence), and thus have not used any prior.
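The EM fitting described above can be sketched in miniature. The following is an illustrative, pure-Python EM fit of a two-component one-dimensional Gaussian mixture by plain maximum likelihood (no prior), standing in for what mclust does with far greater speed and generality; the simulated data, initialisation, and iteration count are assumptions of the sketch, not part of the paper.

```python
import math
import random

# Minimal EM for a two-component 1-D Gaussian mixture (illustrative sketch;
# the paper itself uses mclust in R). Plain ML, no prior, matching the text.
random.seed(1)
data = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(6.0, 1.0) for _ in range(200)])

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Initialise: equal weights, means at the data extremes, unit variances
w, mu, var = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]

for _ in range(50):                      # EM iterations
    # E-step: responsibility r[i][k] of component k for observation i
    r = []
    for x in data:
        p = [w[k] * norm_pdf(x, mu[k], var[k]) for k in range(2)]
        tot = sum(p)
        r.append([pk / tot for pk in p])
    # M-step: weighted updates of mixing proportions, means, variances
    for k in range(2):
        nk = sum(ri[k] for ri in r)
        w[k] = nk / len(data)
        mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
        var[k] = sum(ri[k] * (x - mu[k]) ** 2 for ri, x in zip(r, data)) / nk

print(sorted(round(m, 1) for m in mu))   # estimated component means
```

On this well-separated simulated sample the estimated means settle near the true values 0 and 6 within a few iterations, which mirrors the fast convergence we observed with mclust.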

                        3.2.2 Model selection under GMMBC setup
   As already discussed, the two problems in applied cluster analysis, namely the selection of a clustering method and of the number of clusters, can be postulated as one single problem of statistical model selection under the MBC setup (?). The approach taken to the problem is based on Bayesian model selection via the use of Bayes factors (?) and posterior model probabilities. The idea goes like this: let several models {ℳ_1, ℳ_2, . . . , ℳ_K} be considered, with prior probabilities of getting selected P(ℳ_k), k = 1, . . . , K, often taken to be equal. Now, by applying Bayes' theorem, the posterior probability of ℳ_k getting selected given the data D is:

P(ℳ_k | D) ∝ P(D | ℳ_k) P(ℳ_k)

where

P(D | ℳ_k) = ∫ P(D | θ_k, ℳ_k) P(θ_k | ℳ_k) dθ_k

where P(θ_k | ℳ_k) is the prior distribution of θ_k, the parameter vector for the model ℳ_k. P(D | ℳ_k) is known as the integrated likelihood of the model ℳ_k. This likelihood will help us in deciding the best model: we will choose the model which is most likely a posteriori. Assuming the priors P(ℳ_k) are equal, we select the model with the highest integrated likelihood, i.e., if we are comparing ℳ_1 and ℳ_2, then we calculate:

                                    
ℬ_12 = P(D | ℳ_1) / P(D | ℳ_2)

with the comparison favouring ℳ_1 if:

ℬ_12 > 1 ⇔ P(D | ℳ_1) > P(D | ℳ_2)
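The Bayes factor comparison can be made concrete with a toy example in which both integrated likelihoods are available in closed form. The coin-flip data and the two candidate models below are illustrative assumptions, not the paper's GMM setup: ℳ_1 places a uniform Beta(1, 1) prior on the success probability, while ℳ_2 fixes it at 0.5 and so has no parameter to integrate out.

```python
import math

# Toy Bayes factor (assumed example, not the paper's GMM setup).
# Data D: 6 successes, 2 failures in 8 coin flips.
#   M1: success probability p unknown, uniform Beta(1, 1) prior
#   M2: p fixed at 0.5, so no parameter to integrate over
s, f = 6, 2

# P(D | M1) = ∫_0^1 p^s (1 - p)^f dp = B(s + 1, f + 1) = s! f! / (s + f + 1)!
p_d_m1 = math.factorial(s) * math.factorial(f) / math.factorial(s + f + 1)

# P(D | M2) = 0.5^s * 0.5^f  (an ordinary likelihood, since p is fixed)
p_d_m2 = 0.5 ** (s + f)

bayes_factor = p_d_m1 / p_d_m2
print(p_d_m1, p_d_m2, bayes_factor)
```

Here P(D | ℳ_1) = 1/252 slightly exceeds P(D | ℳ_2) = 1/256, so ℬ_12 ≈ 1.016 > 1 and the comparison marginally favours ℳ_1; in the GMM setting the same rule compares mixture models differing in covariance structure or number of components.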