A Brief Note on the Hidden Markov Model (HMM)

Thanks to B. H. Juang at the Georgia Institute of Technology.
Thanks to Wayne Ward at Carnegie Mellon University.

Introduction

Problem Formulation

Let us now talk about the Hidden Markov Model. Well, what is an HMM used for? Consider the following problem:

Given an unknown observation $O$, recognize it as one of $N$ classes with the minimum probability of error.

So how do we define the error and the error probability?
Conditional Error: Given $O$, the risk associated with deciding that it is a class $i$ event is

$$R(S_i \mid O) = \sum_{j=1}^{N} e_{ij}\, P(S_j \mid O)$$

where $P(S_j \mid O)$ is the probability that $O$ is a class $S_j$ event and $e_{ij}$ is the cost of classifying a class $j$ event as a class $i$ event, with $e_{ij} > 0$ and $e_{ii} = 0$.
Expected Error:

$$E = \int R(S(O) \mid O)\, p(O)\, dO$$

where $S(O)$ is the decision made on $O$ based on a policy. The question can then be stated as:

How should $S(O)$ be chosen to achieve the minimum error probability, i.e. so that $P(S(O) \mid O)$ is maximized?

Bayes Decision Theory

If we institute the policy $S(O) = S_i = \arg\max_{S_j} P(S_j \mid O)$, then $R(S(O) \mid O) = \min_{S_j} R(S_j \mid O)$. This is the so-called Maximum A Posteriori (MAP) decision. But how do we know $P(S_j \mid O)$, $j = 1, 2, \ldots, N$, for any $O$?
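As a toy illustration of the MAP rule (the posterior values below are invented, since we have not yet said where $P(S_j \mid O)$ comes from), the decision is just an argmax over the posteriors:

```python
import numpy as np

# Hypothetical posteriors P(S_j | O) for N = 4 classes; the numbers are made up.
posteriors = np.array([0.1, 0.5, 0.3, 0.1])

# MAP decision: choose the class with the largest posterior probability.
decision = np.argmax(posteriors)
print(f"Decide class S_{decision} (posterior {posteriors[decision]:.2f})")
```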

Markov Model

States: $S = \{S_0, S_1, S_2, \ldots, S_N\}$
Transition probabilities: $P(q_t = S_i \mid q_{t-1} = S_j)$
Markov Assumption:

$$P(q_t = S_i \mid q_{t-1} = S_j, q_{t-2} = S_k, \ldots) = P(q_t = S_i \mid q_{t-1} = S_j) = a_{ji}, \qquad a_{ji} \ge 0, \quad \sum_{i=1}^{N} a_{ji} = 1 \ \ \forall j$$

(Figure: Markov model state diagram)
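A row-stochastic transition matrix is all that is needed to simulate such a chain. Here is a minimal sketch with an invented 3-state matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state chain: A[j, i] = a_{ji} = P(q_t = S_i | q_{t-1} = S_j); each row sums to 1.
A = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])

def sample_chain(A, start_state=0, length=10):
    """Sample a state sequence q_0, q_1, ... of the given length."""
    states = [start_state]
    for _ in range(length - 1):
        # Markov assumption: the next state depends only on the current state.
        states.append(rng.choice(len(A), p=A[states[-1]]))
    return states

print(sample_chain(A))
```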

Hidden Markov Model

States: $S = \{S_0, S_1, S_2, \ldots, S_N\}$
Transition probabilities: $P(q_t = S_i \mid q_{t-1} = S_j) = a_{ji}$
Output probability distributions (at state $j$ for symbol $k$): $P(y_t = O_k \mid q_t = S_j) = b_j(k, \lambda_j)$, parameterized by $\lambda_j$.

(Figure: Hidden Markov model)
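Concretely, these parameters can be held in two arrays. The toy model below (all numbers invented for illustration) is reused in the sketches that follow:

```python
import numpy as np

# A toy discrete-output HMM lambda = (A, B).
# States: S_0 (the start state), S_1, S_2 (taken as the end state); output symbols: O_0, O_1.
A = np.array([[0.0, 0.7, 0.3],   # A[j, i] = a_{ji} = P(q_t = S_i | q_{t-1} = S_j)
              [0.0, 0.6, 0.4],
              [0.0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],        # B[j, k] = b_j(k) = P(y_t = O_k | q_t = S_j)
              [0.9, 0.1],
              [0.2, 0.8]])
obs = [0, 1, 1]                  # an observation sequence o_1 o_2 o_3, as symbol indices
```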

HMM Problems and Solutions

  1. Evaluation: Given a model, compute the probability of an observation sequence.
  2. Decoding: Find the state sequence that maximizes the probability of the observation sequence.
  3. Training: Adjust the model parameters to maximize the probability of observed sequences.

Evaluation

Compute the probability of an observation sequence $O = o_1 o_2 \cdots o_T$, given the HMM parameters $\lambda$:

$$P(O \mid \lambda) = \sum_{Q} P(O, Q \mid \lambda) = \sum_{Q} a_{q_0 q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T), \qquad Q = q_0 q_1 q_2 \cdots q_T$$

This is not practical, since the number of paths is $O(N^T)$. How do we deal with it?

Forward Algorithm

$$\alpha_t(j) = P(o_1 o_2 \cdots o_t,\, q_t = S_j \mid \lambda)$$

Compute $\alpha$ recursively:

$$\alpha_0(j) = \begin{cases} 1, & \text{if } S_j \text{ is the start state} \\ 0, & \text{otherwise} \end{cases}$$
$$\alpha_t(j) = \left[ \sum_{i=0}^{N} \alpha_{t-1}(i)\, a_{ij} \right] b_j(o_t), \qquad t > 0$$

Then

$$P(O \mid \lambda) = \alpha_T(N)$$

The computation is $O(N^2 T)$.
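A minimal sketch of the forward pass on the toy parameters defined in the Hidden Markov Model section (assuming $S_0$ is the start state and the last state is the end state):

```python
import numpy as np

# Toy parameters from the earlier sketch (invented for illustration).
A = np.array([[0.0, 0.7, 0.3],
              [0.0, 0.6, 0.4],
              [0.0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],
              [0.9, 0.1],
              [0.2, 0.8]])
obs = [0, 1, 1]

def forward(A, B, obs, start=0, end=None):
    """alpha[t, j] = P(o_1 ... o_t, q_t = S_j | lambda); returns P(O | lambda)."""
    N, T = A.shape[0], len(obs)
    end = N - 1 if end is None else end
    alpha = np.zeros((T + 1, N))
    alpha[0, start] = 1.0                        # alpha_0: all probability mass on the start state
    for t, o in enumerate(obs, start=1):
        # alpha_t(j) = [sum_i alpha_{t-1}(i) a_{ij}] b_j(o_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, o]
    return alpha[T, end]                         # P(O | lambda) = alpha_T(N)

print(forward(A, B, obs))
```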

Backward Algorithm

$$\beta_t(i) = P(o_{t+1} o_{t+2} \cdots o_T \mid q_t = S_i, \lambda)$$

Compute $\beta$ recursively:

$$\beta_T(i) = \begin{cases} 1, & \text{if } S_i \text{ is the end state} \\ 0, & \text{otherwise} \end{cases}$$
$$\beta_t(i) = \sum_{j=0}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad t < T$$

Then

$$P(O \mid \lambda) = \beta_0(0)$$

The computation is $O(N^2 T)$.
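The backward pass is the mirror image; on the same toy parameters it returns the same $P(O \mid \lambda)$ as the forward sketch:

```python
import numpy as np

# Toy parameters from the earlier sketch (invented for illustration).
A = np.array([[0.0, 0.7, 0.3],
              [0.0, 0.6, 0.4],
              [0.0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],
              [0.9, 0.1],
              [0.2, 0.8]])
obs = [0, 1, 1]

def backward(A, B, obs, start=0, end=None):
    """beta[t, i] = P(o_{t+1} ... o_T | q_t = S_i, lambda); returns P(O | lambda)."""
    N, T = A.shape[0], len(obs)
    end = N - 1 if end is None else end
    beta = np.zeros((T + 1, N))
    beta[T, end] = 1.0                                # beta_T: 1 only at the end state
    for t in range(T - 1, -1, -1):
        # beta_t(i) = sum_j a_{ij} b_j(o_{t+1}) beta_{t+1}(j); obs[t] is o_{t+1} (obs is 0-indexed)
        beta[t] = A @ (B[:, obs[t]] * beta[t + 1])
    return beta[0, start]                             # P(O | lambda) = beta_0(0)

print(backward(A, B, obs))                            # matches the forward result
```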

Decoding

Find the state sequence $Q$ that maximizes $P(O, Q \mid \lambda)$.

Viterbi Algorithm

$$VP_t(i) = \max_{q_0 q_1 \cdots q_{t-1}} P(o_1 o_2 \cdots o_t,\, q_t = S_i \mid \lambda)$$

Compute $VP$ recursively:

$$VP_t(j) = \max_{i=0,1,\ldots,N} VP_{t-1}(i)\, a_{ij}\, b_j(o_t), \qquad t > 0$$

Then the probability of the best path is

$$\max_{Q} P(O, Q \mid \lambda) = VP_T(N)$$

Save each maximizing $i$ for the backtrace at the end.
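A sketch of the Viterbi recursion on the same toy parameters; the backpointer table stores each maximizing $i$ so that the best state sequence can be read off by a backtrace:

```python
import numpy as np

# Toy parameters from the earlier sketch (invented for illustration).
A = np.array([[0.0, 0.7, 0.3],
              [0.0, 0.6, 0.4],
              [0.0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],
              [0.9, 0.1],
              [0.2, 0.8]])
obs = [0, 1, 1]

def viterbi(A, B, obs, start=0, end=None):
    """Return the best path q_0 ... q_T and its probability max_Q P(O, Q | lambda)."""
    N, T = A.shape[0], len(obs)
    end = N - 1 if end is None else end
    vp = np.zeros((T + 1, N))
    back = np.zeros((T + 1, N), dtype=int)        # back[t, j] = maximizing i in the recursion
    vp[0, start] = 1.0
    for t, o in enumerate(obs, start=1):
        scores = vp[t - 1][:, None] * A            # scores[i, j] = VP_{t-1}(i) * a_{ij}
        back[t] = scores.argmax(axis=0)
        vp[t] = scores.max(axis=0) * B[:, o]       # VP_t(j) = max_i VP_{t-1}(i) a_{ij} b_j(o_t)
    path = [end]                                   # backtrace from the end state
    for t in range(T, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1], vp[T, end]

print(viterbi(A, B, obs))
```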

Training

To tune $\lambda$ so as to maximize $P(O \mid \lambda)$, there is no efficient algorithm that finds the global optimum; nonetheless, an efficient iterative algorithm that finds a local optimum exists.

Baum-Welch Reestimation

Define the probability of transiting from $S_i$ to $S_j$ at time $t$, given $O$, as[^1]

$$\xi_t(i,j) = P(q_t = S_i,\, q_{t+1} = S_j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$$

Let

$$\bar{a}_{ij} = \frac{\text{Expected number of transitions from } S_i \text{ to } S_j}{\text{Expected number of transitions from } S_i} = \frac{\sum_{t=0}^{T-1} \xi_t(i,j)}{\sum_{t=0}^{T-1} \sum_{j=0}^{N} \xi_t(i,j)}$$
$$\bar{b}_j(k) = \frac{\text{Expected number of times in } S_j \text{ with symbol } k}{\text{Expected number of times in } S_j} = \frac{\sum_{t:\, o_{t+1} = k} \sum_{i=0}^{N} \xi_t(i,j)}{\sum_{t=0}^{T-1} \sum_{i=0}^{N} \xi_t(i,j)}$$

Training Algorithm:

  1. Initialize $\lambda = (A, B)$.
  2. Compute $\alpha$, $\beta$, and $\xi$.
  3. Estimate $\bar{\lambda} = (\bar{A}, \bar{B})$ from $\xi$.
  4. Replace $\lambda$ with $\bar{\lambda}$.
  5. If not converged, go to step 2.
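Putting steps 2-4 together, a single reestimation sweep over one observation sequence might look like the following sketch (same invented toy parameters; the start state $S_0$ is never re-entered here, so its emission row is simply left at zero):

```python
import numpy as np

# Toy parameters from the earlier sketches (invented for illustration).
A = np.array([[0.0, 0.7, 0.3],
              [0.0, 0.6, 0.4],
              [0.0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],
              [0.9, 0.1],
              [0.2, 0.8]])
obs = np.array([0, 1, 1])
N, K, T = A.shape[0], B.shape[1], len(obs)

# Step 2: full forward/backward passes (same recursions as above), keeping every time step.
alpha = np.zeros((T + 1, N)); alpha[0, 0] = 1.0
for t, o in enumerate(obs, start=1):
    alpha[t] = (alpha[t - 1] @ A) * B[:, o]
beta = np.zeros((T + 1, N)); beta[T, N - 1] = 1.0
for t in range(T - 1, -1, -1):
    beta[t] = A @ (B[:, obs[t]] * beta[t + 1])
prob = alpha[T, N - 1]                               # P(O | lambda)

# xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda) for t = 0 .. T-1 (obs[t] is o_{t+1}).
xi = np.zeros((T, N, N))
for t in range(T):
    xi[t] = alpha[t][:, None] * A * B[:, obs[t]] * beta[t + 1] / prob

# Step 3: reestimate from expected counts, following the formulas above.
A_new = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
occ = xi.sum(axis=1)                                 # occ[t, j] = sum_i xi_t(i, j)
denom = occ.sum(axis=0)
B_new = np.zeros_like(B)
for k in range(K):
    num = occ[obs == k].sum(axis=0)
    B_new[:, k] = np.divide(num, denom, out=np.zeros(N), where=denom > 0)

# Step 4: replace (A, B) with (A_new, B_new) and repeat until convergence.
print(A_new); print(B_new)
```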

Reference

Need more details? Refer to:

  1. “An Introduction to Hidden Markov Models”, by Rabiner and Juang.
  2. “Hidden Markov Models: Continuous Speech Recognition”, by Kai-Fu Lee.

[^1]: Forward-Backward Algorithm. (Figure: the forward-backward algorithm)


Reprinted from blog.csdn.net/philthinker/article/details/78630999