Page 265 - Contributed Paper Session (CPS) - Volume 4

CPS2220 David Degras et al.
                Whereas the literature typically describes the EM algorithm for the
            general model (1), its extension to models (2) and (3) is not entirely
            straightforward, at least not when the maximum lag $p$ in these models is
            greater than 1. In this case, additional modeling assumptions must be made
            on the joint distribution of $x_{1:p}$ or, alternatively, of $x_{(2-p):1}$. Also,
            conditioning must be done carefully in the smoothing part of the E-step to
            avoid degeneracy and numerical inaccuracy. (For example, one should not
            naively condition $x_{(t-p+1):t}$ on $x_{(t-p):(t+1)}$, or vice versa.)
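To see why lags beyond 1 call for care, consider the standard device of stacking the lagged states into $X_t = x_{(t-p+1):t}$, so that the VAR($p$) state equation becomes first order, $X_{t+1} = F X_t + G v_{t+1}$, with a companion matrix $F$. This stacking is a textbook construction assumed here for illustration; it is not necessarily how the E-step is organized in our software package. Consecutive stacked states share $p-1$ blocks, so the conditional covariance $G Q G'$ of $X_{t+1}$ given $X_t$ has rank $r$ rather than $rp$, and any formula that inverts it breaks down. The helper names `companion` and `noise_loading`, the coefficient values, and $Q$ below are hypothetical.

```python
import numpy as np


def companion(A_list):
    """Stack VAR(p) coefficients A_1, ..., A_p (each r x r) into the rp x rp companion matrix F."""
    p = len(A_list)
    r = A_list[0].shape[0]
    F = np.zeros((r * p, r * p))
    F[:r, :] = np.hstack(A_list)                 # first block row: [A_1, ..., A_p]
    F[r:, : r * (p - 1)] = np.eye(r * (p - 1))   # remaining rows shift the lags down one block
    return F


def noise_loading(r, p):
    """G maps the r-dimensional innovation into the stacked state; only the first block is random."""
    G = np.zeros((r * p, r))
    G[:r, :] = np.eye(r)
    return G


r, p = 2, 3
A_list = [0.5 * np.eye(r), -0.2 * np.eye(r), 0.1 * np.eye(r)]  # illustrative values
Q = np.array([[1.0, 0.3], [0.3, 2.0]])                          # illustrative noise covariance
F, G = companion(A_list), noise_loading(r, p)

# Cov(X_{t+1} | X_t) = G Q G' has rank r, not r * p, so this conditional is degenerate for p > 1
cond_cov = G @ Q @ G.T
print(np.linalg.matrix_rank(cond_cov))  # 2 (out of r * p = 6)

# One transition: the lower r(p-1) entries of X_{t+1} are exact copies of entries of X_t
rng = np.random.default_rng(0)
X_t = rng.standard_normal(r * p)
X_next = F @ X_t + G @ rng.multivariate_normal(np.zeros(r), Q)
print(np.allclose(X_next[r:], X_t[: r * (p - 1)]))  # True
```

Any smoothing recursion that needs the inverse of such a rank-deficient covariance has to be restricted to the non-overlapping blocks, which is the kind of care the text refers to.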

            3.2 M-step
                The M-step consists of maximizing the $Q$-function (5) with respect to $\theta$.
            In the absence of constraints on $\theta$, this amounts to a simple least squares
            problem in linear regression, and the solution $\hat{\theta}$ can be found
            analytically. Even then, however, some regularization may be required to ensure
            that the transition matrices $A_{\ell j}$ define invertible, stationary processes.
            Typical parameter constraints are fixed-coefficient constraints (e.g. making the
            covariance matrices $Q_j$, $R$, and/or $\Sigma_j$ diagonal), equality constraints
            across regimes, and scaling constraints on the norms of $A_{\ell j}$ or $C$. Our
            software package rigorously enforces any number of these constraints via a
            projected gradient approach.
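In the unconstrained case the regression has a closed form. The sketch below is simplified: it plugs point estimates of the states and per-time regime weights $w_t$ into a weighted least squares fit of $[A_{1j}, \ldots, A_{pj}]$, whereas the actual M-step works with the smoothed moments produced by the E-step. It then shows one projected gradient variant under a hypothetical Frobenius-norm bound (one possible scaling constraint) and checks stationarity through the spectral radius of the companion matrix. The function names and simulated values are illustrative, not the API of our package.

```python
import numpy as np


def weighted_ls(X_lag, X_next, w):
    """Closed-form minimizer of sum_t w[t] * ||X_next[t] - A @ X_lag[t]||^2.

    X_lag[t] stacks (x_{t-1}, ..., x_{t-p}); w[t] stands in for a regime weight.
    """
    S_xy = (X_next * w[:, None]).T @ X_lag   # sum_t w_t x_t X_lag_t'      (r x rp)
    S_yy = (X_lag * w[:, None]).T @ X_lag    # sum_t w_t X_lag_t X_lag_t'  (rp x rp)
    A = np.linalg.solve(S_yy, S_xy.T).T      # A = S_xy S_yy^{-1}, using symmetry of S_yy
    return A, S_xy, S_yy


def projected_gradient(S_xy, S_yy, radius, n_iter=500):
    """Same objective under the hypothetical constraint ||A||_F <= radius."""
    step = 1.0 / np.linalg.eigvalsh(S_yy).max()  # 1 / Lipschitz constant of the gradient
    A = np.zeros_like(S_xy)
    for _ in range(n_iter):
        A = A - step * (A @ S_yy - S_xy)          # gradient step on the quadratic objective
        norm = np.linalg.norm(A)                  # Frobenius norm
        if norm > radius:
            A = A * (radius / norm)               # Euclidean projection onto the ball
    return A


def spectral_radius(A_stacked, r):
    """Max |eigenvalue| of the companion matrix of [A_1, ..., A_p]; < 1 means stable."""
    p = A_stacked.shape[1] // r
    F = np.zeros((r * p, r * p))
    F[:r, :] = A_stacked
    F[r:, : r * (p - 1)] = np.eye(r * (p - 1))
    return np.abs(np.linalg.eigvals(F)).max()


# Simulated VAR(2) with r = 2 (illustrative values only)
rng = np.random.default_rng(1)
r, p, T = 2, 2, 2000
A_true = np.hstack([0.5 * np.eye(r), -0.3 * np.eye(r)])
x = np.zeros((T, r))
for t in range(p, T):
    x[t] = A_true @ np.concatenate([x[t - 1], x[t - 2]]) + 0.1 * rng.standard_normal(r)

X_lag = np.hstack([x[p - 1 : T - 1], x[p - 2 : T - 2]])  # rows (x_{t-1}, x_{t-2})
X_next = x[p:]
w = np.ones(T - p)  # stand-in for the smoothed regime probabilities

A_hat, S_xy, S_yy = weighted_ls(X_lag, X_next, w)
A_proj = projected_gradient(S_xy, S_yy, radius=0.5)
print(np.linalg.norm(A_proj) <= 0.5 + 1e-9)  # True
print(spectral_radius(A_hat, r))             # stationarity check: should be below 1
```

Projection onto a Frobenius ball is just a rescaling. Other constraints would swap in their own projection (zeroing entries for fixed-coefficient constraints, averaging across regimes for equality constraints) while the gradient step stays the same.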

            3.3 Initialization
                Given that the EM algorithm is only guaranteed to converge to a stationary
            point of the likelihood function, choosing good starting points is essential to
            increase the chances that it converges to a global maximum. We propose two
            initialization methods for the switching dynamics model (2) and then adapt
            them to the switching observation model (3). We assume here that the
            higher-order model parameters $M$, $p$, and $r$ are fixed.

            Initialization for the switching dynamics model (method 1)
            1.  Perform the singular value decomposition (SVD) of the data: $Y = U D V'$,
                 where $Y = (y_1, \ldots, y_T)$ (the rows of $Y$ should be centered on zero),
                 $U$ is of dimension $N \times k$ with $k = \min(N, T)$ and $U'U = I_k$, $V$ is of
                 dimension $T \times k$ with $V'V = I_k$, and $D = \mathrm{diag}(d_1, \ldots, d_k)$
                 with $d_1 \ge \cdots \ge d_k \ge 0$. Let $U_r$ and $V_r$ be the submatrices
                 obtained by taking the first $r$ columns (i.e. singular vectors) of $U$ and $V$,
                 and let $D_r = \mathrm{diag}(d_1, \ldots, d_r)$. Initialize the estimated
                 observation matrix and estimated state vectors as $\hat{C} = U_r$ and
                 $\hat{X} = (\hat{x}_1, \ldots, \hat{x}_T) = D_r V_r'$.
            2.  Initialize $\hat{R}$ as the diagonal matrix containing the sample variances of
                 the (rows of the) residual matrix $Y - \hat{C}\hat{X}$.
            3.  Force the $\hat{\mu}_j$ and $\hat{\Sigma}_j$ to be equal across regimes
                 ($1 \le j \le M$) and set them to the sample mean and sample variance of
                 $\hat{x}_1, \ldots, \hat{x}_p$.
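Steps 1-3 translate into a few lines of linear algebra. A minimal sketch, assuming $Y$ is stored as an $N \times T$ NumPy array whose columns are $y_1, \ldots, y_T$. The name `svd_init` is hypothetical, and two details are our assumptions rather than the paper's: variances use the $n-1$ denominator, and the "sample variance" of $\hat{x}_1, \ldots, \hat{x}_p$ is taken to be their $r \times r$ sample covariance via `np.cov`, which needs $p \ge 2$.

```python
import numpy as np


def svd_init(Y, r, p, M):
    """Steps 1-3 of initialization method 1 (sketch). Y is N x T with columns y_1, ..., y_T."""
    Yc = Y - Y.mean(axis=1, keepdims=True)              # center each row of Y on zero
    U, d, Vt = np.linalg.svd(Yc, full_matrices=False)   # U: N x k, d: (k,), Vt = V': k x T
    C_hat = U[:, :r]                                     # step 1: C_hat = U_r
    X_hat = d[:r, None] * Vt[:r, :]                      # step 1: X_hat = D_r V_r' (r x T)

    resid = Yc - C_hat @ X_hat                           # step 2: residual matrix
    R_hat = np.diag(resid.var(axis=1, ddof=1))           # diagonal of row-wise sample variances

    x_init = X_hat[:, :p]                                # step 3: columns x_hat_1, ..., x_hat_p
    mu = x_init.mean(axis=1)
    Sigma = np.atleast_2d(np.cov(x_init))                # r x r sample covariance (needs p >= 2)
    mu_hat = [mu.copy() for _ in range(M)]               # identical across the M regimes
    Sigma_hat = [Sigma.copy() for _ in range(M)]
    return C_hat, X_hat, R_hat, mu_hat, Sigma_hat


# Illustrative data: a rank-2 signal plus small noise
rng = np.random.default_rng(2)
N, T, r, p, M = 5, 200, 2, 3, 2
Y = rng.standard_normal((N, r)) @ rng.standard_normal((r, T)) + 0.1 * rng.standard_normal((N, T))
C_hat, X_hat, R_hat, mu_hat, Sigma_hat = svd_init(Y, r, p, M)
print(C_hat.shape, X_hat.shape, R_hat.shape)  # (5, 2) (2, 200) (5, 5)
```

Because $\hat{C} = U_r$ has orthonormal columns, $\hat{X} = D_r V_r'$ equals $\hat{C}'Y$ for the centered data, so $\hat{C}\hat{X}$ is the best rank-$r$ approximation of $Y$ in the least squares sense.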
                                                                254 | ISI WSC 2019