Page 20 - Contributed Paper Session (CPS) - Volume 6

CPS1468 Takeshi Kurosawa et al.
One particularly appealing characteristic of $m_{pp}$ is that it can be derived in terms of Kullback-Leibler distances.
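Theorem 2.1. (Theorem 2.1, Eshima and Tabata, 2007) The measure of predictive power $m_{pp}$ can be expressed as

$$m_{pp}(Y, \boldsymbol{X}) = \int KL\{f(y), f(y|\boldsymbol{x})\}\, g(\boldsymbol{x})\, d\boldsymbol{x} = E_{\boldsymbol{X}}[KL\{f(y), f(y|\boldsymbol{X})\}],$$

where $f(y) = \int f(y|\boldsymbol{x})\, g(\boldsymbol{x})\, d\boldsymbol{x}$ is the marginal distribution of the response and

$$KL\{f(y), f(y|\boldsymbol{x})\} = \int f(y|\boldsymbol{x}) \log\left(\frac{f(y|\boldsymbol{x})}{f(y)}\right) dy + \int f(y) \log\left(\frac{f(y)}{f(y|\boldsymbol{x})}\right) dy, \qquad (1)$$

is referred to as Kullback-Leibler divergence in this article.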
Note that $KL\{f(y), f(y|\boldsymbol{x})\}$ is also often referred to as the symmetric Kullback-Leibler distance between $f(y)$ and $f(y|\boldsymbol{x})$, and differs from the more common (asymmetric) Kullback-Leibler distance, which comprises only the first term in (1).
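As a rough numerical illustration of (1) and of the expectation in Theorem 2.1, the following Python sketch evaluates the symmetric Kullback-Leibler divergence in a toy discrete setting, where sums replace the integrals. The distributions `g_x` and `f_y_given_x` and all function names are hypothetical choices made for this illustration and do not come from Eshima and Tabata (2007); all probabilities are assumed strictly positive.

```python
import math

def symmetric_kl(f_marg, f_cond):
    """Symmetric KL divergence as in (1), for discrete distributions on a common support."""
    # First term of (1): the usual asymmetric KL of f(y|x) relative to f(y).
    first = sum(c * math.log(c / m) for m, c in zip(f_marg, f_cond))
    # Second term of (1): the same with the roles of f(y) and f(y|x) swapped.
    second = sum(m * math.log(m / c) for m, c in zip(f_marg, f_cond))
    return first + second

def marginal(g_x, f_y_given_x):
    """Marginal f(y) = sum_x f(y|x) g(x)."""
    n_y = len(f_y_given_x[0])
    return [sum(g * row[j] for g, row in zip(g_x, f_y_given_x)) for j in range(n_y)]

def m_pp(g_x, f_y_given_x):
    """Expectation over X of the symmetric KL divergence between f(y) and f(y|x)."""
    f_marg = marginal(g_x, f_y_given_x)
    return sum(g * symmetric_kl(f_marg, row) for g, row in zip(g_x, f_y_given_x))

# Hypothetical toy example: X takes two values, Y takes three values.
g_x = [0.5, 0.5]
f_y_given_x = [[0.7, 0.2, 0.1],
               [0.1, 0.3, 0.6]]
value = m_pp(g_x, f_y_given_x)

# When Y does not depend on X, f(y|x) = f(y) and the measure is zero.
independent = m_pp(g_x, [[0.2, 0.3, 0.5], [0.2, 0.3, 0.5]])
```

In this sketch `value` is strictly positive because the two conditional distributions differ, while `independent` is zero, in line with the non-negativity of the measure.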
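The measure $m_{pp}$ is a relative measure in the sense that $m_{pp}(Y, \boldsymbol{X}) \geq 0$ (as proved by Eshima and Tabata, 2007), but it is not bounded above. On the other hand, by standardizing the formula in Definition (2.1) we can obtain the entropy correlation coefficient (ECC, Eshima and Tabata, 2007). Unlike $m_{pp}(Y, \boldsymbol{X})$, the ECC is an absolute measure in the sense that $0 \leq ECC \leq 1$. As an illustration of this, if we consider a linear model with $E(Y|\boldsymbol{X}) = \alpha + \boldsymbol{\beta}^\top \boldsymbol{X} = \theta$ and $a(\phi) = \sigma^2 = \mathrm{var}(Y|\boldsymbol{X})$, then it can be shown that (Eshima and Tabata, 2007, see Example 1)

$$m_{pp}(Y, \boldsymbol{X}) = \frac{\mathrm{var}(E[Y|\boldsymbol{X}])}{E[\mathrm{var}(Y|\boldsymbol{X})]} = \frac{R^2}{1 - R^2},$$

where $R$ is the multiple correlation coefficient. This ratio grows without bound as $R^2 \to 1$.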
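The linear-model identity can be checked numerically. The sketch below is an illustration under added assumptions that are not in the source: a single covariate, and a normally distributed $X$ so that both the marginal and the conditional distribution of $Y$ are normal and the symmetric Kullback-Leibler divergence has a closed form. The function names and parameter values are hypothetical.

```python
def mpp_linear(beta, var_x, sigma2):
    """var(E[Y|X]) / E[var(Y|X)] for Y = alpha + beta * X + error, scalar X."""
    return beta ** 2 * var_x / sigma2

def r_squared(beta, var_x, sigma2):
    """Squared (multiple) correlation: explained variance over total variance."""
    explained = beta ** 2 * var_x
    return explained / (explained + sigma2)

def expected_symmetric_kl_normal(beta, var_x, sigma2):
    """E_X of the symmetric KL between the marginal N(mu_y, v1) and the
    conditional N(alpha + beta * x, v2), ASSUMING X is normal (my assumption)."""
    s = beta ** 2 * var_x          # var(E[Y|X]), also E[(E[Y|X] - mu_y)^2]
    v1 = s + sigma2                # marginal variance of Y
    v2 = sigma2                    # conditional variance of Y given X
    # Symmetric KL between N(m1, v1) and N(m2, v2):
    #   0.5 * (v1/v2 + v2/v1) - 1 + 0.5 * (m1 - m2)^2 * (1/v1 + 1/v2);
    # taking the expectation over X replaces (m1 - m2)^2 by s.
    variance_part = 0.5 * (v1 / v2 + v2 / v1) - 1.0
    mean_part = 0.5 * s * (1.0 / v1 + 1.0 / v2)
    return variance_part + mean_part

# Illustrative values only: beta = 2, var(X) = 1, sigma^2 = 4.
beta, var_x, sigma2 = 2.0, 1.0, 4.0
mpp = mpp_linear(beta, var_x, sigma2)                      # 4 / 4 = 1
r2 = r_squared(beta, var_x, sigma2)                        # 4 / 8 = 0.5
ratio = r2 / (1.0 - r2)                                    # 0.5 / 0.5 = 1
kl_value = expected_symmetric_kl_normal(beta, var_x, sigma2)
```

With these values the variance ratio, $R^2/(1 - R^2)$ and the averaged symmetric divergence all equal 1, which agrees with Theorem 2.1 in this special case.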

3. Estimation of $m_{pp}$ in Poisson GLMs
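We now focus on the specific case of Poisson GLMs with log link function:

$$\log\{E(Y|\boldsymbol{X})\} = \alpha + \boldsymbol{\beta}^\top \boldsymbol{X}, \qquad Y|\boldsymbol{X} \sim Po\{\exp(\alpha + \boldsymbol{\beta}^\top \boldsymbol{X})\}, \qquad (2)$$

where $Po(\lambda)$ denotes the Poisson distribution with parameter $\lambda > 0$. Note that a consequence of using the log link is that $\theta = \alpha + \boldsymbol{\beta}^\top \boldsymbol{X}$.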
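To make the role of the log link concrete, the sketch below writes the Poisson probability mass function in exponential-family form, $\exp\{y\theta - \exp(\theta) - \log y!\}$, and compares it with the usual form with mean $\lambda = \exp(\theta)$ when $\theta = \alpha + \boldsymbol{\beta}^\top \boldsymbol{x}$. The coefficient and covariate values and the function names are arbitrary illustrative choices, not values from the paper.

```python
import math

def linear_predictor(alpha, beta, x):
    """theta = alpha + beta^T x for coefficient and covariate lists of equal length."""
    return alpha + sum(b * xj for b, xj in zip(beta, x))

def poisson_pmf(y, lam):
    """Usual Poisson pmf with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def poisson_pmf_canonical(y, theta):
    """Same pmf in exponential-family form with canonical parameter theta = log(lam)."""
    return math.exp(y * theta - math.exp(theta) - math.lgamma(y + 1))

# Hypothetical values for illustration only.
alpha, beta, x = 0.5, [0.3, -0.2], [1.0, 2.0]
theta = linear_predictor(alpha, beta, x)   # 0.5 + 0.3 - 0.4 = 0.4
lam = math.exp(theta)                      # conditional mean E(Y | x) under the log link

# The two forms of the pmf agree up to floating-point error.
max_gap = max(abs(poisson_pmf(y, lam) - poisson_pmf_canonical(y, theta))
              for y in range(20))
```

Because the log link is the canonical link for the Poisson family, the linear predictor is itself the canonical parameter, which is what the comparison above reflects.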
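We can simply consider an estimator of $m_{pp}$. For a dataset comprising $n$ observations $\{(y_i, \boldsymbol{x}_i);\ i = 1, \ldots, n\}$, let $\boldsymbol{y} = (y_1, \ldots, y_n)^\top$ denote the vector of responses and $\bar{y} = (1/n) \sum_{i=1}^{n} y_i$ denote the sample mean. Also, suppose a Poisson GLM as in (2) is fitted to the data, yielding maximum likelihood estimates $(\hat{\alpha}, \hat{\boldsymbol{\beta}})$. Then let $\hat{\theta}_i = \hat{\alpha} + \hat{\boldsymbol{\beta}}^\top \boldsymbol{x}_i$ denote the estimated canonical parameter, $\bar{\hat{\theta}} = (1/n) \sum_{i=1}^{n} \hat{\theta}_i$ its sample mean, and $\hat{\boldsymbol{\theta}} = (\hat{\theta}_1, \ldots, \hat{\theta}_n)^\top$. The unbiased covariance estimator is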
9 | ISI WSC 2019