Page 233 - Contributed Paper Session (CPS) - Volume 4

CPS2203 Thierry D. et al.
                In Figure 2 we show boxplots of the proportion of covariates whose
            estimated block coincides with one of the true blocks. This graphic allows us
            to assess the quality of the model selection side of the procedure.
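
            As an illustration of the quantity summarised in these boxplots, the proportion
            can be sketched as below. This is a minimal sketch under one reading of the
            criterion, namely that a covariate counts as recovered when the estimated block
            containing it is exactly equal to one of the true blocks. The function name
            `proportion_recovered` and the toy partitions are hypothetical and are not
            taken from the paper.

```python
def proportion_recovered(true_partition, estimated_partition):
    """Share of covariates whose estimated block equals some true block.

    Both arguments are lists of blocks; each block is a list of covariate
    indices. Assumption: "corresponds to" means exact set equality.
    """
    true_blocks = {frozenset(block) for block in true_partition}
    n_total = sum(len(block) for block in estimated_partition)
    # A whole estimated block is either recovered or not, so we add its size.
    n_recovered = sum(
        len(block)
        for block in estimated_partition
        if frozenset(block) in true_blocks
    )
    return n_recovered / n_total


# Toy example (hypothetical data): six covariates, three true blocks.
true_partition = [[0, 1, 2], [3, 4], [5]]
estimated_partition = [[0, 1, 2], [3], [4, 5]]
score = proportion_recovered(true_partition, estimated_partition)
print(score)
```

            In the toy example only the block {0, 1, 2} is recovered exactly, so three of
            the six covariates are counted. One such proportion per simulated data set
            would give the sample from which a boxplot is drawn.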

            4.2 MovieLens data set
                The MovieLens data set (Harper and Konstan, 2015) contains ratings from 137753
            users on 27278 movies (excluding movies with no rating values). Ratings are on a
            1-5 scale and each user has rated at most 367 movies. We restrict our study to
            the 1000 most frequently rated movies. For each movie and each user we study
            the variable equal to 1 if the user rated the movie and 0 otherwise. Our selection
            method applied to this data set selected a partition made of 322 groups of
            movies whose sizes vary from 2 to 16. Figure 3 represents the distribution of
            the number of variables per group in the partition. Figure 4 represents the
            movies that belong to the largest group. We notice that most of them are
            Action/Sci-Fi/Adventure movies released in the early 2000s. Similarly, most of the
            groups are made of movies with similar genres and release years. Other examples of
            groups forming the selected partition are provided in Figure 5.
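
            The construction of the binary variables described above can be sketched as
            follows. This is only an illustrative sketch, not the authors' preprocessing
            code: it assumes the ratings are available as (user, movie) pairs, and the
            function name `rated_indicator_matrix`, the tie-breaking rule and the toy
            records are hypothetical.

```python
from collections import Counter


def rated_indicator_matrix(ratings, n_movies):
    """Build the 0/1 'user rated movie' matrix for the most rated movies.

    ratings  : iterable of (user_id, movie_id) pairs, one per rating.
    n_movies : number of most frequently rated movies to keep.
    Returns (users, movies, X) where X[i][j] is 1 if users[i] rated movies[j].
    """
    ratings = list(ratings)
    counts = Counter(movie for _, movie in ratings)
    # Keep the most rated movies; ties are broken by movie id (an assumption).
    movies = sorted(counts, key=lambda m: (-counts[m], m))[:n_movies]
    users = sorted({user for user, _ in ratings})
    col = {movie: j for j, movie in enumerate(movies)}
    row = {user: i for i, user in enumerate(users)}
    X = [[0] * len(movies) for _ in users]
    for user, movie in ratings:
        if movie in col:  # ratings of discarded movies are ignored
            X[row[user]][col[movie]] = 1
    return users, movies, X


# Toy records (hypothetical): three users, three movies, keep the top two.
toy = [(1, 10), (1, 20), (2, 10), (3, 10), (3, 30), (2, 20)]
users, movies, X = rated_indicator_matrix(toy, n_movies=2)
for user, indicators in zip(users, X):
    print(user, indicators)
```

            Each column of the resulting matrix is one of the binary variables to which
            the selection method is applied; with the real data, `n_movies` would be 1000.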

            5.   Discussion and conclusion
                Figures 4 and 5 illustrate the quality of the variable clustering provided by the
            method. We also provide a consistent estimator of the target distribution.
            This estimator may be used to understand the joint behavior of the variables
            belonging to the same block. Conditioning this estimator can also allow prediction
            on new, partially observed data sets.























                                                               222 | I S I   W S C   2 0 1 9