HMM Introduction

A Markov chain is a stochastic process described by state transitions that has the "memoryless" property: the distribution of the state $s_t$ at the current time $t$ is determined only by the state $s_{t-1}$ at the previous time $t-1$, and is independent of the states before time $t-1$. Define the transition matrix of the Markov chain as $A$, with $$A_{ij}=p\left(s_{t}=j | s_{t-1}=i\right), \quad s_{t} | s_{t-1} \sim \operatorname{Discrete}\left(A_{s_{t-1},:}\right)$$ It is easy to see that each row of $A$ sums to 1. Given a starting state $s_1$ (which may itself be drawn from a distribution), a sequence $\left(s_{1}, s_{2}, \dots, s_{T}\right)$ can be generated by repeatedly sampling from these distributions.
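As a concrete illustration, here is a minimal NumPy sketch of this sampling procedure (the function and variable names are my own, and states are 0-indexed in code, unlike the 1-indexed notation above):

```python
import numpy as np

def sample_markov_chain(pi, A, T, rng=None):
    """Sample (s_1, ..., s_T) from a Markov chain with initial
    distribution pi and row-stochastic transition matrix A."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.empty(T, dtype=int)
    s[0] = rng.choice(len(pi), p=pi)                  # s_1 ~ Discrete(pi)
    for t in range(1, T):
        s[t] = rng.choice(A.shape[1], p=A[s[t - 1]])  # s_t ~ Discrete(A[s_{t-1}, :])
    return s
```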

Model definition

A Hidden Markov Model (HMM) assumes that the observations are generated by a hidden Markov state sequence, defined as follows (a generative sampling sketch appears after the list):

  • Without loss of generality, assume there are $S$ discrete states in total, i.e. $s \in\{1,2, \dots, S\}$
  • Without loss of generality, assume there are $X$ possible observation values in total, i.e. $x \in\{1,2, \dots, X\}$
  • Define the initial state distribution as $\vec{\pi}=(\pi_1, \pi_2, \dots, \pi_S)$, i.e. $s_{1} \sim \operatorname{Discrete}(\vec{\pi})$
  • Define the state transition matrix as $\mathcal{A}$, an $S \times S$ matrix, i.e. $s_{t} |\left\{s_{t-1}=k^{\prime}\right\} \sim \operatorname{Discrete}\left(\mathcal{A}_{k^{\prime},:}\right)$
  • Define the observation (emission) probability matrix as $\mathcal{B}$, an $S \times X$ matrix, i.e. $x_{t} |\left\{s_{t}=k^{\prime}\right\} \sim \operatorname{Discrete}\left(\mathcal{B}_{k^{\prime},:}\right)$
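Under these definitions, sampling from an HMM just layers an emission step on top of the Markov-chain sampler above (a sketch reusing `sample_markov_chain`; names are my own):

```python
def sample_hmm(pi, A, B, T, rng=None):
    """Sample hidden states and observations from an HMM with initial
    distribution pi, transition matrix A (S x S), and emission matrix
    B (S x X); states and observations are 0-indexed."""
    rng = np.random.default_rng() if rng is None else rng
    s = sample_markov_chain(pi, A, T, rng)  # hidden states (never observed)
    x = np.array([rng.choice(B.shape[1], p=B[k]) for k in s])  # x_t ~ Discrete(B[s_t, :])
    return s, x
```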

Working with an HMM can be summarized as the following two problems:

  1. Training: given an observation sequence $\vec{o}=(x_1, x_2, \dots, x_T)$, compute the maximum-likelihood estimates of the parameters $\vec{\pi}, \mathcal{A}, \mathcal{B}$, i.e. $$\vec{\pi}_{\mathrm{ML}}, \mathcal{A}_{\mathrm{ML}}, \mathcal{B}_{\mathrm{ML}}=\arg \max _{\vec{\pi}, \mathcal{A}, \mathcal{B}} p\left(\vec{o} | \vec{\pi}, \mathcal{A}, \mathcal{B}\right)$$
  2. Prediction: given an observation sequence $\vec{o}$ and parameters $\vec{\pi}, \mathcal{A}, \mathcal{B}$, estimate the hidden state sequence $\vec{s}=(s_{1}, \ldots, s_{T})$ most likely to have generated $\vec{o}$, i.e. $$s_{1}, \ldots, s_{T}=\arg \max _{\vec{s}} p\left(\vec{s} | \vec{o}, \vec{\pi}, \mathcal{A}, \mathcal{B}\right)$$

Parameter Estimation

To solve problem 1, first consider how to evaluate $p\left(\vec{o} | \vec{\pi}, \mathcal{A}, \mathcal{B}\right)$. By the law of total probability, $$\begin{aligned} p(\vec{o} | \vec{\pi}, \mathcal{A}, \mathcal{B}) &=\sum_{s_{1}=1}^{S} \cdots \sum_{s_{T}=1}^{S} p\left(\vec{o}, s_{1}, \ldots, s_{T} | \vec{\pi}, \mathcal{A}, \mathcal{B}\right) \\ &=\sum_{s_{1}=1}^{S} \cdots \sum_{s_{T}=1}^{S} \pi_{s_1} \mathcal{B}_{s_1, x_1} \prod_{t=2}^{T} \mathcal{A}_{s_{t-1}, s_t} \mathcal{B}_{s_t, x_t} \end{aligned}$$ Computing this directly from the formula above costs $\mathcal{O}(TS^T)$, which is hopelessly inefficient; for illustration only, here is a brute-force sketch (names are my own):
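```python
from itertools import product

def likelihood_naive(x, pi, A, B):
    """Evaluate p(o | pi, A, B) by summing over all S^T hidden state
    sequences -- O(T * S^T), feasible only for tiny T and S."""
    T, S = len(x), len(pi)
    total = 0.0
    for s in product(range(S), repeat=T):
        p = pi[s[0]] * B[s[0], x[0]]
        for t in range(1, T):
            p *= A[s[t - 1], s[t]] * B[s[t], x[t]]
        total += p
    return total
```

Using dynamic programming instead reduces the complexity to $\mathcal{O}(TS^2)$; this can be done in two ways: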

1. Forward Algorithm

  • Define $\alpha_{t}(j)=p\left(x_{1}, x_{2}, \ldots, x_{t}, s_{t}=j | \vec{\pi}, \mathcal{A}, \mathcal{B}\right), \text{ } t \in\{1,2, \cdots, T\}, \text{ } j \in\{1,2, \cdots, S\}$; the algorithm can be written as follows (a vectorized sketch appears after these steps):
    • Initialization: $\alpha_{1}(j)=\pi_j\mathcal{B}_{j,x_1}; \quad 1 \leq j \leq S$
    • Recursion: $\alpha_{t}(j)=\sum_{i=1}^{S} \alpha_{t-1}(i) \mathcal{A}_{i j} \mathcal{B}_{j,x_t} ; \quad 1 \leq j \leq S, 1<t \leq T$
    • Termination: $p\left(\vec{o} | \vec{\pi},\mathcal{A},\mathcal{B}\right)=\sum_{i=1}^{S} \alpha_{T}(i)$
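A minimal NumPy sketch of the forward pass (0-indexed, so row `alpha[t]` holds $\alpha_{t+1}$; names are my own):

```python
def forward(x, pi, A, B):
    """Forward algorithm: alpha[t, j] = p(x_1, ..., x_{t+1}, s_{t+1} = j).
    Returns the full (T, S) table; the likelihood is alpha[-1].sum()."""
    T = len(x)
    alpha = np.zeros((T, len(pi)))
    alpha[0] = pi * B[:, x[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]  # recursion
    return alpha
```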

2. Backward Algorithm

  • Define $\beta_{t}(i)=p\left(x_{t+1}, x_{t+2}, \ldots, x_{T} | s_{t}=i, \vec{\pi}, \mathcal{A}, \mathcal{B}\right), \text{ } t \in\{1,2, \cdots, T\}, \text{ } i \in\{1,2, \cdots, S\}$; the algorithm can be written as follows (a sketch appears after these steps):
    • Initialization: $\beta_{T}(i)=1; \quad 1 \leq i \leq S$
    • Recursion: $\beta_{t}(i)=\sum_{j=1}^{S} \mathcal{A}_{i j} \mathcal{B}_{j,x_{t+1}} \beta_{t+1}(j) ; \quad 1 \leq i \leq S, 1 \leq t < T$
    • Termination: $p\left(\vec{o} | \vec{\pi},\mathcal{A},\mathcal{B}\right)=\sum_{j=1}^{S} \pi_{j} \mathcal{B}_{j,x_1} \beta_{1}(j)$
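The mirrored backward pass, with the same conventions as the forward sketch (`beta[t]` holds $\beta_{t+1}$):

```python
def backward(x, pi, A, B):
    """Backward algorithm: beta[t, i] = p(x_{t+2}, ..., x_T | s_{t+1} = i).
    The likelihood is (pi * B[:, x[0]] * beta[0]).sum()."""
    T = len(x)
    beta = np.ones((T, len(pi)))                      # initialization: beta_T = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])  # recursion
    return beta
```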

To solve for $\vec{\pi}_{\mathrm{ML}}, \mathcal{A}_{\mathrm{ML}}, \mathcal{B}_{\mathrm{ML}}$, use the EM algorithm; for background on EM, see the earlier article on the EM algorithm and the Gaussian mixture model (GMM). Specialized to this problem, it is known as the Forward-Backward Algorithm (i.e., the Baum-Welch Algorithm):

  • Suppose the current parameter estimate is $\mathcal{\Lambda}^*=[\vec{\pi}^*, \mathcal{A}^*, \mathcal{B}^*]$; then
    • E-step: define $q(\vec{s})=p(\vec{s} | \vec{o}, \mathcal{\Lambda}^*)$, and let $\mathcal{L}(\mathcal{\Lambda})=\mathbb{E}_{q}[\ln p(\vec{o}, \vec{s} | \mathcal{\Lambda})]$. It is easy to see that $$\ln p(\vec{o}, \vec{s} | \mathcal{\Lambda})=\underbrace{\ln \pi_{s_1}}_{\text{initial state}}+\sum_{t=2}^{T} \underbrace{\ln \mathcal{A}_{s_{t-1}, s_t}}_{\text{Markov chain}}+\sum_{t=1}^{T} \underbrace{\ln \mathcal{B}_{s_t, x_t}}_{\text{observations}}$$ Therefore $\mathcal{L}$ can be written in the form $$\mathcal{L}(\mathcal{\Lambda})=\sum_{k=1}^{S} \gamma_{1}(k) \ln \pi_{k}+\sum_{t=2}^{T} \sum_{j=1}^{S} \sum_{k=1}^{S} \xi_{t}(j, k) \ln \mathcal{A}_{j, k}+\sum_{t=1}^{T} \sum_{k=1}^{S} \gamma_{t}(k) \ln \mathcal{B}_{k, x_{t}}$$ where $\gamma_t(k)=p(s_t=k | \vec{o}, \mathcal{\Lambda}^*), \text{ } \xi_t(j, k)=p(s_{t-1}=j, s_t=k | \vec{o}, \mathcal{\Lambda}^*)$; both posteriors are obtained from the forward and backward quantities computed under $\mathcal{\Lambda}^*$ (hence the algorithm's name): $\gamma_{t}(k) \propto \alpha_{t}(k) \beta_{t}(k)$ and $\xi_{t}(j, k) \propto \alpha_{t-1}(j) \mathcal{A}_{j, k} \mathcal{B}_{k, x_{t}} \beta_{t}(k)$, each normalized over its state indices;
    • M-step: update the parameter estimate $\mathcal{\Lambda}^*=\arg \max _{\mathcal{\Lambda}} \mathcal{L}(\mathcal{\Lambda})$, giving $$\pi_{k}^*=\frac{\gamma_{1}(k)}{\sum_{j=1}^{S} \gamma_{1}(j)}, \quad \mathcal{A}_{jk}^*=\frac{\sum_{t=2}^{T} \xi_{t}(j, k)}{\sum_{t=2}^{T} \sum_{l=1}^{S} \xi_{t}(j, l)}, \quad \mathcal{B}_{kv}^*=\frac{\sum_{t=1}^{T} \gamma_{t}(k) I\left(x_{t}=v\right)}{\sum_{t=1}^{T} \gamma_{t}(k)}$$ In practice there is usually more than one sequence. Suppose there are $N$ sequences, the $n$-th of length $T_n$, $n \in\{1,2, \cdots, N\}$; each sequence yields its own $\gamma_t, \xi_t$, denoted $\gamma_t^{(n)}, \xi_t^{(n)}$, and the update equations become $$\pi_{k}^*=\frac{\sum_{n=1}^{N} \gamma_{1}^{(n)}(k)}{\sum_{n=1}^{N} \sum_{j=1}^{S} \gamma_{1}^{(n)}(j)}, \quad \mathcal{A}_{jk}^*=\frac{\sum_{n=1}^{N} \sum_{t=2}^{T_{n}} \xi_{t}^{(n)}(j, k)}{\sum_{n=1}^{N} \sum_{t=2}^{T_{n}} \sum_{l=1}^{S} \xi_{t}^{(n)}(j, l)}, \quad \mathcal{B}_{kv}^*=\frac{\sum_{n=1}^{N} \sum_{t=1}^{T_{n}} \gamma_{t}^{(n)}(k) I\left(x_{t}^{(n)}=v\right)}{\sum_{n=1}^{N} \sum_{t=1}^{T_{n}} \gamma_{t}^{(n)}(k)}$$

  • Iterate the two steps above until convergence; a single-sequence sketch of one iteration follows.
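One Baum-Welch iteration for a single sequence, reusing `forward` and `backward` from above (a sketch; for multiple sequences, accumulate the numerators and denominators across sequences as in the update equations, and for long sequences scale $\alpha, \beta$ or work in log space to avoid underflow):

```python
def baum_welch_step(x, pi, A, B):
    """One EM (Baum-Welch) update from a single observation sequence."""
    T = len(x)
    alpha, beta = forward(x, pi, A, B), backward(x, pi, A, B)
    # E-step: gamma[t, k] = p(s_{t+1} = k | o);
    #         xi[t, j, k] = p(s_{t+1} = j, s_{t+2} = k | o)  (0-indexed rows)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = alpha[:-1, :, None] * A * (B[:, x[1:]].T * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    # M-step: normalized expected counts
    pi_new = gamma[0]
    A_new = xi.sum(axis=0)
    A_new /= A_new.sum(axis=1, keepdims=True)
    B_new = np.zeros_like(B)
    for t in range(T):
        B_new[:, x[t]] += gamma[t]  # expected emission counts, I(x_t = v)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new
```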

Hidden state sequence estimation

To solve problem 2, again use dynamic programming (the Viterbi Algorithm). Define $v_{t}(j)=\max _{s_{1}, \ldots, s_{t-1}} p\left(s_{1}, \ldots, s_{t-1}, x_{1}, x_{2}, \ldots, x_{t}, s_{t}=j | \mathcal{\Lambda}\right)$, and also define $b_t(j)$ to store the value of $s_{t-1}$ that attains $v_{t}(j)$. The algorithm can be written as follows (a sketch appears after the list):

  • Initialization: $$v_1(j)=\pi_j\mathcal{B}_{j,x_1},\text{ }b_1(j)=0; \quad 1 \leq j \leq S$$
  • Recursion: $$\begin{aligned} v_{t}(j) &=\max _{i\in\{1,2,\cdots,S\}} v_{t-1}(i) \mathcal{A}_{i j} \mathcal{B}_{j,x_t} ; \quad 1 \leq j \leq S, 1<t \leq T \\ b_{t}(j) &= \underset{i\in\{1,2,\cdots,S\}}{\arg \max } v_{t-1}(i)  \mathcal{A}_{i j} \mathcal{B}_{j,x_t} ; \quad 1 \leq j \leq S, 1<t \leq T \end{aligned}$$
  • Termination: $$\begin{aligned} & \text { The best score: }  P^*=\max _{\vec{s}} p\left(\vec{s},\vec{o} | \mathcal{\Lambda} \right)=\max _{i\in\{1,2,\cdots,S\}} v_{T}(i) \\ & \text { The start of backtrace: } s_{T}^*=\underset{i\in\{1,2,\cdots,S\}}{\operatorname{argmax}} v_{T}(i) \end{aligned}$$ 
  • Backtrace: $$s_{t}^*=b_{t+1}(s_{t+1}^*); \quad 1 \leq t<T$$ The resulting $s_1^*, s_2^*, \ldots, s_T^*$ is $\arg \max _{\vec{s}} p\left(\vec{s}, \vec{o} | \mathcal{\Lambda}\right)$, and it is easy to see that $\arg \max _{\vec{s}} p\left(\vec{s}, \vec{o} | \mathcal{\Lambda}\right)=\arg \max _{\vec{s}} p\left(\vec{s} | \vec{o}, \mathcal{\Lambda}\right)$
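A NumPy sketch of Viterbi decoding, with the same 0-indexed conventions as before (for long sequences, prefer summing logarithms to multiplying raw probabilities):

```python
def viterbi(x, pi, A, B):
    """Viterbi algorithm: return the most probable hidden state sequence."""
    T, S = len(x), len(pi)
    v = np.zeros((T, S))
    b = np.zeros((T, S), dtype=int)
    v[0] = pi * B[:, x[0]]                      # initialization
    for t in range(1, T):
        scores = v[t - 1][:, None] * A          # scores[i, j] = v_{t-1}(i) * A_{ij}
        b[t] = scores.argmax(axis=0)            # backpointers
        v[t] = scores.max(axis=0) * B[:, x[t]]  # recursion
    s = np.empty(T, dtype=int)
    s[-1] = v[-1].argmax()                      # start of backtrace
    for t in range(T - 2, -1, -1):
        s[t] = b[t + 1, s[t + 1]]               # backtrace
    return s
```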

Finally, a simple example of using an HMM: suppose there are two dice, one fair (denoted 0) and one loaded (denoted 1). Before each throw, either the same die is kept or it is swapped for the other one, and the observation sequence is the sequence of face values from the repeated throws. Given the true parameters (shown in a figure in the original post), the goal is to estimate them from the observation sequence and, further, to infer which die was used for each throw.

The final results (shown in a figure in the original post) compare two decodings of the same data: the right panel uses the Viterbi algorithm to obtain the most likely hidden state sequence (i.e., which die was used for each throw), while the left panel simply rounds the posterior marginals $\gamma_t$ state by state. The two are not equal, because rounding the marginals ignores the dependence between successive states.
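Tying the pieces together, an end-to-end sketch of this experiment; the "true" parameters below are my own illustrative choices, since the figure with the actual values is not reproduced here:

```python
rng = np.random.default_rng(0)

# Hypothetical parameters (illustrative, not the original post's values):
# state 0 = fair die, state 1 = die loaded toward the last face.
pi_true = np.array([0.5, 0.5])
A_true = np.array([[0.95, 0.05],
                   [0.10, 0.90]])
B_true = np.array([[1 / 6] * 6,
                   [0.1] * 5 + [0.5]])

s_true, x = sample_hmm(pi_true, A_true, B_true, T=200, rng=rng)

# Fit from a random initialization, then decode with Viterbi.
pi_est = np.array([0.5, 0.5])
A_est = rng.dirichlet(np.ones(2), size=2)
B_est = rng.dirichlet(np.ones(6), size=2)
for _ in range(50):
    pi_est, A_est, B_est = baum_welch_step(x, pi_est, A_est, B_est)
s_hat = viterbi(x, pi_est, A_est, B_est)
```

Note that EM recovers the states only up to permutation, so the estimated labels may be swapped relative to the true fair/loaded assignment.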
