Page 228 - Contributed Paper Session (CPS) - Volume 4

CPS2203 Thierry D. et al.
                  restrict the considered set of partitions. A similar approach has been
                  developed in the Gaussian setting in Devijver and Gallopin (2018).
                      Once the subset of considered partitions has been built, we select the best
                  partition using a classical model selection approach based on a penalized
                  likelihood criterion.
                      The estimator of $s^\star$ is then defined as the maximum likelihood estimator
                  associated with the chosen data-driven partition.
                      We implemented our method and applied it to real data. We used the
                  MovieLens dataset, made of ratings from 137000 users, and considered the 1000
                  most rated movies.
                      This paper is organized as follows. After introducing the notations used
                  throughout the paper, we present our three-step method: data-driven
                  pre-selection of the set of partitions of interest, partition selection using a
                  penalized likelihood approach, and calibration of the penalty using the classical
                  slope heuristic. The performance of the method is studied using synthetic data
                  in Section 4.1 and using the MovieLens dataset in Section 4.2.

                  2.  Covariates partitions
                  2.1  Basic notations
                      Let $p \in \mathbb{N}$ be the number of covariates. Consider the index set
                  $\{1, \dots, p\}$. In the following, we denote by $x^{(i)}$ the $i$-th component
                  of a vector $x$ and by $x^{(B)} = (x^{(i)};\ i \in B)$ the group of variables
                  from a cluster $B \subseteq \{1, \dots, p\}$.
                      Throughout the article, $m = \{B_1, B_2, \dots, B_K\}$ will denote a partition
                  of the covariates into $K$ disjoint clusters $B_1, B_2, \dots, B_K$ with
                  $\bigcup_{k=1}^{K} B_k = \{1, \dots, p\}$.
                      Denote by $p_k = |B_k|$ the number of variables in the cluster $B_k$.
                      Denote by $\mathcal{M}$ the set of all possible partitions of the $p$
                  variables. The set $\mathcal{M}$ is large: its size is the Bell number of $p$,
                  which grows faster than exponentially with $p$.
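To give a sense of scale for the number of partitions, the following minimal Python sketch computes Bell numbers with the Bell triangle recurrence. The function name `bell_numbers` is introduced here for illustration only and is not part of the method described in the paper.

```python
def bell_numbers(p_max):
    """Return [B_0, B_1, ..., B_{p_max}], the Bell numbers, via the Bell triangle."""
    bells = [1]  # B_0 = 1: the empty set has exactly one partition
    row = [1]    # first row of the Bell triangle
    for _ in range(p_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        bells.append(new_row[0])
        row = new_row
    return bells


bells = bell_numbers(20)
for p in (5, 10, 20):
    print(p, bells[p])
```

Already for 5 variables there are 52 partitions, and for 10 variables there are 115975, which is why an exhaustive search over all partitions quickly becomes infeasible and a restricted set of candidate partitions is needed.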

                  2.2 Model collection associated with a partition
                      Let $m \in \mathcal{M}$ be a covariates partition. We associate with $m$ a set
                  $S_m$ of probability densities with respect to the uniform measure on $\{0,1\}^p$
                  defined by
                      \[ S_m = \Big\{ s(x) = \prod_{k=1}^{K} s_k\big(x^{(B_k)}\big) \Big\} \]
                  where, for any $k \in \{1, \dots, K\}$, $s_k$ is a probability density on
                  $\{0,1\}^{p_k}$.
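As an illustration of a density that factorizes over the clusters of a partition, the sketch below fits each cluster by its empirical frequencies, which is the maximum likelihood estimate when each factor is left unrestricted, and evaluates the resulting log-likelihood. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, indices are 0-based, and the factors are handled as probability mass functions, which differ from densities with respect to the uniform measure only by a constant factor.

```python
from collections import Counter
from math import log


def fit_block_mle(data, partition):
    """Empirical pmf of each cluster (hypothetical helper, 0-based indices)."""
    n = len(data)
    tables = []
    for block in partition:
        # Count how often each 0/1 pattern occurs on the variables of this cluster.
        counts = Counter(tuple(x[i] for i in block) for x in data)
        tables.append({pattern: c / n for pattern, c in counts.items()})
    return tables


def log_likelihood(data, partition, tables):
    """Log-likelihood of the data under the product of the cluster pmfs."""
    total = 0.0
    for x in data:
        for block, table in zip(partition, tables):
            total += log(table[tuple(x[i] for i in block)])
    return total


def dimension(partition):
    """Free parameters: 2^{p_k} - 1 per cluster when each factor is unrestricted."""
    return sum(2 ** len(block) - 1 for block in partition)


# Toy data: variables 0 and 1 always agree, variable 2 is unrelated.
data = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0)]
m = [[0, 1], [2]]              # partition grouping variables 0 and 1
m_indep = [[0], [1], [2]]      # partition into singletons

ll_m = log_likelihood(data, m, fit_block_mle(data, m))
ll_indep = log_likelihood(data, m_indep, fit_block_mle(data, m_indep))
print(ll_m, dimension(m))
print(ll_indep, dimension(m_indep))
```

On this toy sample the partition that groups the two dependent variables reaches a higher log-likelihood at the cost of one extra free parameter, which is the kind of trade-off a penalized likelihood criterion is meant to arbitrate.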
                                                                                
                                                

                  3.  The Method
                      Suppose that we observe some data $X_1, X_2, \dots, X_n \in \{0,1\}^p$
                  considered as i.i.d. realizations of an unknown probability distribution
                  $s^\star$ on $\{0,1\}^p$. On the
                                                                     217 | I S I   W S C   2 0 1 9