Whereas the literature typically describes the EM algorithm for the general model (1), its extension to models (2) and (3) is not entirely straightforward, at least not when the maximum lag in these models is greater than 1. In this case, additional modeling assumptions must be made on the joint distribution of $x_{1:p}$ or, alternatively, of $x_{(2-p):1}$. Also, conditioning must be handled carefully in the smoothing part of the E-step to avoid issues of degeneracy and numerical inaccuracy. (For example, one should not naively condition $x_{(t-p+1):t}$ on $x_{(t-p):(t+1)}$ and vice versa.)
3.2 M-step
The M-step consists in maximizing the $Q$-function (5) with respect to $\theta$. In the absence of constraints on $\theta$, this amounts to a simple least squares problem in linear regression and the solution can be found analytically. But even then, some regularization may be required to ensure that the transition matrices $A_\ell$ define invertible, stationary processes. Typical parameter constraints are fixed-coefficient constraints (e.g. making covariance matrices diagonal), equality constraints across regimes, and scaling constraints on the norms of certain parameters. We rigorously enforce any number of these constraints in our software package via a projected gradient approach.
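The projected gradient scheme can be sketched in a few lines. The NumPy snippet below is only an illustration of the general idea, not the package's actual implementation: `grad_neg_q` stands for a user-supplied gradient of the negated $Q$-function, and the three projection helpers (diagonal covariance, equality across regimes, unit column norms) are hypothetical examples of constraint sets.

```python
import numpy as np

def project_diagonal(Q):
    """Project a covariance estimate onto the set of diagonal matrices."""
    return np.diag(np.diag(Q))

def project_equal_across_regimes(params):
    """Replace regime-specific copies of a parameter by their common average."""
    params = np.asarray(params)
    return np.broadcast_to(params.mean(axis=0), params.shape).copy()

def project_unit_norm_columns(C):
    """Rescale the columns of C to unit Euclidean norm (a scaling constraint)."""
    norms = np.linalg.norm(C, axis=0, keepdims=True)
    return C / np.maximum(norms, 1e-12)

def projected_gradient(theta, grad_neg_q, project, step=1e-3, n_iter=500):
    """Minimize the negated Q-function by gradient steps followed by projection."""
    for _ in range(n_iter):
        theta = project(theta - step * grad_neg_q(theta))
    return theta
```

Projecting after every gradient step keeps the iterate feasible throughout, which is what makes it possible to enforce an arbitrary combination of such constraints simultaneously.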
3.3 Initialization
Given that the EM algorithm is only guaranteed to converge to a stationary point of the likelihood function, choosing good starting points is essential to increase the chances that the EM converges to a global maximum. We propose two initialization methods for the switching dynamics model (2) and then adapt them to the switching observation model (3). We assume here that the higher-order model parameters (e.g. the number of regimes, the state dimension $r$, and the lag order $p$) are fixed.
Initialization for the switching dynamics model (method 1)
1. Perform the singular value decomposition (SVD) of the data: $Y = U D V'$, where $Y = (y_1, \ldots, y_T)$ (the rows of $Y$ should be centered on zero), $U$ is of dimension $N \times m$ with $m = \min(N, T)$ and $U'U = I_m$, $V$ is of dimension $T \times m$ with $V'V = I_m$, and $D = \mathrm{diag}(d_1, \ldots, d_m)$ with $d_1 \ge \ldots \ge d_m \ge 0$. Let $U_r$ and $V_r$ be the submatrices obtained by taking the first $r$ columns (i.e. singular vectors) of $U$ and $V$, and let $D_r = \mathrm{diag}(d_1, \ldots, d_r)$. Initialize the estimated observation matrix and estimated state vectors as $\hat{C} = U_r$ and $\hat{X} = (\hat{x}_1, \ldots, \hat{x}_T) = D_r V_r'$.
2. Initialize $\hat{R}$ as the diagonal matrix containing the sample variances of the (rows of the) residual matrix $Y - \hat{C}\hat{X}$.
3. Force the estimated initial state means $\hat{\mu}_j$ and covariances $\hat{\Sigma}_j$ to be equal across regimes ($1 \le j \le M$) and set them to the sample mean and sample variance of $\hat{x}_1, \ldots, \hat{x}_p$. (A code sketch of steps 1-3 is given after this list.)
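As a concrete illustration of steps 1-3, here is a minimal NumPy sketch. It assumes a data matrix `Y` of dimension $N \times T$ whose rows are the observed series; the function name, argument names, and the handling of the degenerate case $p = 1$ are choices made for the illustration, not part of the paper.

```python
import numpy as np

def init_switching_dynamics(Y, r, p, M):
    """Sketch of initialization method 1 for the switching dynamics model.

    Y : (N, T) data matrix, r : state dimension, p : lag order, M : number of regimes.
    """
    Yc = Y - Y.mean(axis=1, keepdims=True)           # center each row on zero
    # Step 1: thin SVD Y = U D V' and rank-r truncation.
    U, d, Vt = np.linalg.svd(Yc, full_matrices=False)
    C_hat = U[:, :r]                                 # initial observation matrix (N x r)
    X_hat = np.diag(d[:r]) @ Vt[:r, :]               # initial state vectors (r x T)
    # Step 2: diagonal R from the row variances of the residual matrix.
    resid = Yc - C_hat @ X_hat
    R_hat = np.diag(resid.var(axis=1))
    # Step 3: common initial state mean/covariance across the M regimes,
    # taken as the sample mean and covariance of x_hat_1, ..., x_hat_p.
    mu0 = X_hat[:, :p].mean(axis=1)
    Sigma0 = np.cov(X_hat[:, :p]) if p > 1 else np.eye(r)   # guard for p = 1
    mu_hat = np.tile(mu0, (M, 1))
    Sigma_hat = np.tile(Sigma0, (M, 1, 1))
    return C_hat, X_hat, R_hat, mu_hat, Sigma_hat
```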