Page 231 - Contributed Paper Session (CPS) - Volume 4

CPS2203 Thierry D. et al.
                                                         T. Dumont and A. Karina Fermin














                  Figure 1: Boxplots (first plot) and empirical means (second plot) of the
                  distances $h^2$ for each sample size















                       Figure 2: Boxplots of the ratio of variables belonging to the right
                                          group for each sample size

            4.  Numerical results
                In Section 3 we presented our approach to finding the optimal variable
            clusters using an n-sample of a multivariate Bernoulli distribution. Our
            four-step estimation procedure consists of the following:
              1.  Build the set $\hat{\mathcal{M}}$ of partitions of interest using the
                  thresholding method described in 3.1,
              2.  For each considered partition $m \in \hat{\mathcal{M}}$, compute the maximum
                  log-likelihood $\ell_m$ of the associated model and the dimension $D_m$ of
                  its parameter space,
              3.  Use the slope heuristic method to approximate the optimal penalty
                  constant $\kappa$,
              4.  Select the model $\hat{m}$ minimizing, over $\hat{\mathcal{M}}$, the criterion
                  $-\ell_m + \kappa D_m / n$.
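            Steps 2 to 4 can be sketched in a few lines of Python. This is only an
            illustration under stated assumptions, not the authors' implementation. It
            assumes that the variable blocks of a partition are independent, that each
            block follows an unrestricted distribution on {0,1}^|block|, and that the
            criterion has the form -l_m + kappa * D_m / n. The slope is estimated by a
            least-squares fit over the most complex models and then doubled, which is one
            common variant of the slope heuristic; the source does not say which variant
            is used. The names block_loglik, slope_constant, select_model and the
            parameter frac are hypothetical.

```python
import math
from collections import Counter


def block_loglik(X, partition):
    """Return (maximum log-likelihood, dimension) for one partition.

    Assumption: blocks are mutually independent and each block has an
    unrestricted distribution on {0,1}^|block|, so the maximum likelihood
    estimate is the empirical frequency of each observed configuration.
    """
    n = len(X)
    loglik, dim = 0.0, 0
    for block in partition:
        counts = Counter(tuple(row[j] for j in block) for row in X)
        loglik += sum(c * math.log(c / n) for c in counts.values())
        dim += 2 ** len(block) - 1  # free parameters of this block
    return loglik, dim


def slope_constant(logliks, dims, n, frac=0.5):
    """Estimate a penalty constant as twice a fitted slope.

    Fits a least-squares line of l_m against D_m / n over the fraction
    `frac` of models with the largest dimension (at least two models),
    then returns twice its slope.
    """
    pairs = sorted(zip(dims, logliks))
    start = max(min(int(len(pairs) * (1 - frac)), len(pairs) - 2), 0)
    tail = pairs[start:]
    xs = [d / n for d, _ in tail]
    ys = [l for _, l in tail]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    den = sum((x - mx) ** 2 for x in xs)
    if den == 0:
        raise ValueError("need at least two distinct dimensions")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den
    return 2.0 * slope


def select_model(logliks, dims, kappa, n):
    """Return the index minimizing -l_m + kappa * D_m / n."""
    crit = [-l + kappa * d / n for l, d in zip(logliks, dims)]
    return min(range(len(crit)), key=crit.__getitem__)
```

            In use, block_loglik would be called once per candidate partition, the
            resulting log-likelihoods and dimensions passed to slope_constant, and the
            returned constant passed to select_model. The choice frac=0.5 is arbitrary
            and would need tuning on real data.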
                In Section 4.1 we apply our procedure to simulated data. This allows us to
            assess the performance of the procedure by comparing our estimator with
            the true model used to generate the sample. In Section 4.2 we illustrate the
            performance of the procedure on the MovieLens dataset. We will provide an



                                                               220 | I S I   W S C   2 0 1 9