Parameter Learning for HMMs

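There are two kinds of parameter learning problems for an HMM:

  1. Supervised learning: given an observation sequence $O = (o_1, \dots, o_T)$ and the corresponding state sequence $I = (i_1, \dots, i_T)$, estimate the parameters $\lambda = (A, B, \pi)$.

  2. Unsupervised learning: given only the observation sequence $O = (o_1, \dots, o_T)$, estimate the parameters $\lambda = (A, B, \pi)$.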


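Supervised learning (direct maximum likelihood estimation)

In the supervised setting, the training data provide both the observation sequences and the corresponding hidden states. Each parameter is then estimated by a relative frequency: count how often each initial state, each state transition, and each state-observation pair occurs, and normalize the counts. These relative frequencies are the maximum likelihood estimates.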
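The counting procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `estimate_supervised`, the integer encoding of states and observations, and the uniform fallback for rows without counts are all assumptions made for the example.

```python
def estimate_supervised(observations, states, n_states, n_symbols):
    """Relative-frequency estimates of (A, B, pi) from labelled sequences.

    observations, states: lists of equal-length integer sequences, with
    states in range(n_states) and observations in range(n_symbols).
    """
    init = [0] * n_states
    trans = [[0] * n_states for _ in range(n_states)]
    emit = [[0] * n_symbols for _ in range(n_states)]
    for obs_seq, state_seq in zip(observations, states):
        init[state_seq[0]] += 1                      # initial-state counts
        for s, o in zip(state_seq, obs_seq):
            emit[s][o] += 1                          # state-observation counts
        for s, s_next in zip(state_seq, state_seq[1:]):
            trans[s][s_next] += 1                    # transition counts

    def normalize(row):
        total = sum(row)
        # A row with no counts falls back to uniform (an arbitrary choice here).
        return [c / total for c in row] if total else [1 / len(row)] * len(row)

    pi = normalize(init)
    A = [normalize(row) for row in trans]
    B = [normalize(row) for row in emit]
    return A, B, pi


# Two short labelled sequences (made-up data for illustration).
obs = [[0, 1, 1], [1, 0, 1]]
sts = [[0, 0, 1], [1, 0, 1]]
A, B, pi = estimate_supervised(obs, sts, n_states=2, n_symbols=2)
```

In practice one would usually add smoothing (for example, add-one counts) so that unseen transitions or emissions do not receive probability zero; that is left out here to keep the sketch close to the plain frequency estimate.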


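Unsupervised learning (iterative estimation with the Baum-Welch algorithm)

The Baum-Welch algorithm is essentially the EM algorithm: an iterative procedure for learning the parameters of a model that contains latent variables. Recall that the core of EM is the update that relates $\Theta^{(g+1)}$ to $\Theta^{(g)}$:

$$\Theta^{(g+1)} = \arg\max_{\Theta} Q(\Theta, \Theta^{(g)}) = \arg\max_{\Theta} \int_{z} P(Z \mid X, \Theta^{(g)}) \log P(X, Z \mid \Theta) \, dz$$

The parameters are updated repeatedly, and each update is guaranteed not to decrease the log-likelihood.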

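In the unsupervised case we only have the observation sequence $O = (o_1, \dots, o_T)$, and the state sequence $I$ is treated as an unobservable latent variable. The HMM is then a probabilistic model with latent variables:

$$P(O \mid \lambda) = \sum_{I} P(O \mid I, \lambda) \, P(I \mid \lambda)$$

Its parameters can therefore be estimated with the EM algorithm. The update rule for $\lambda$ is:

$$\lambda^{(g+1)} = \arg\max_{\lambda} Q(\lambda, \lambda^{(g)}) = \arg\max_{\lambda} \int_{I} P(I \mid O, \lambda^{(g)}) \log P(O, I \mid \lambda) \, dI$$

where $\lambda^{(g)}$ is the parameter obtained in the previous iteration and $\lambda^{(g+1)}$ is the updated parameter.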
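To make the marginal likelihood $P(O \mid \lambda)$ concrete, the sketch below evaluates the sum over all $N^T$ state sequences literally. The function name `likelihood_by_enumeration` and the parameter values are assumptions chosen for illustration; states and observation symbols are encoded as integers starting at 0.

```python
from itertools import product


def likelihood_by_enumeration(obs, A, B, pi):
    """P(O | lambda) as the sum over all N**T state sequences I."""
    n_states, T = len(pi), len(obs)
    total = 0.0
    for path in product(range(n_states), repeat=T):
        # P(I | lambda): initial probability times the transition probabilities.
        p_path = pi[path[0]]
        for t in range(T - 1):
            p_path *= A[path[t]][path[t + 1]]
        # P(O | I, lambda): product of the emission probabilities.
        p_obs = 1.0
        for t in range(T):
            p_obs *= B[path[t]][obs[t]]
        total += p_obs * p_path
    return total


# Illustrative parameters with 3 states and 2 symbols (assumed values).
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0]
p = likelihood_by_enumeration(obs, A, B, pi)
print(round(p, 6))  # 0.130218
```

The cost grows as $N^T$, so this is only usable for very short sequences. It is meant to show what the sum over $I$ contains, not how it would be computed in practice.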

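E-step

As above, the expectation for an HMM is given below. Because the state sequence $I$ is discrete, the integral becomes a sum:

$$Q(\lambda, \lambda^{(g)}) = \int_{I} P(I \mid O, \lambda^{(g)}) \log P(O, I \mid \lambda) \, dI = \sum_{I} P(I \mid O, \lambda^{(g)}) \log P(O, I \mid \lambda)$$

We have $P(I \mid O, \lambda^{(g)}) = \dfrac{P(O, I \mid \lambda^{(g)})}{P(O \mid \lambda^{(g)})}$. Since $\lambda^{(g)}$ is fixed, $\dfrac{1}{P(O \mid \lambda^{(g)})}$ is a constant factor with respect to $\lambda$ and has no effect on the $\arg\max$. Dropping this factor, the $Q$ function can be rewritten as:

$$Q(\lambda, \lambda^{(g)}) = \sum_{I} P(O, I \mid \lambda^{(g)}) \log P(O, I \mid \lambda)$$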

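The direct-computation section of the HMM probability computation problem already gave:

$$P(O, I \mid \lambda) = \pi_{i_1} \prod_{t=1}^{T} b_{i_t}(o_t) \prod_{t=1}^{T-1} a_{i_t i_{t+1}}$$

Substituting this into the $Q$ function and expanding gives Equation 1:

$$
\begin{aligned}
Q(\lambda, \lambda^{(g)}) &= \sum_{I} P(O, I \mid \lambda^{(g)}) \log \Big[ \pi_{i_1} \prod_{t=1}^{T} b_{i_t}(o_t) \prod_{t=1}^{T-1} a_{i_t i_{t+1}} \Big] \\
&= \sum_{I} P(O, I \mid \lambda^{(g)}) \log \pi_{i_1} + \sum_{I} P(O, I \mid \lambda^{(g)}) \sum_{t=1}^{T} \log b_{i_t}(o_t) + \sum_{I} P(O, I \mid \lambda^{(g)}) \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}}
\end{aligned}
$$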
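The three-term expansion of $Q$ can be checked numerically on a tiny model by enumerating every state sequence. The sketch below is only an illustration under stated assumptions: the helper names `joint_prob` and `q_terms` and both parameter sets are made up for the example, and all probabilities are strictly positive so that every logarithm is defined.

```python
from itertools import product
from math import log


def joint_prob(obs, path, A, B, pi):
    """P(O, I | lambda) for one state sequence I = path."""
    p = pi[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p


def q_terms(obs, old, new):
    """The pi, B and A terms of Q(new, old), computed by brute-force enumeration."""
    A_new, B_new, pi_new = new
    T, N = len(obs), len(pi_new)
    q_pi = q_b = q_a = 0.0
    for path in product(range(N), repeat=T):
        w = joint_prob(obs, path, *old)  # weight P(O, I | lambda^(g))
        q_pi += w * log(pi_new[path[0]])
        q_b += w * sum(log(B_new[path[t]][obs[t]]) for t in range(T))
        q_a += w * sum(log(A_new[path[t]][path[t + 1]]) for t in range(T - 1))
    return q_pi, q_b, q_a


# Two arbitrary 2-state, 2-symbol parameter sets (A, B, pi), assumed for illustration.
old = ([[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.1, 0.9]], [0.6, 0.4])
new = ([[0.6, 0.4], [0.2, 0.8]], [[0.3, 0.7], [0.8, 0.2]], [0.5, 0.5])
obs = [0, 1, 1, 0]
terms = q_terms(obs, old, new)
```

Each term depends on only one of $\pi$, $B$, and $A$, which is why the terms can be maximized separately.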

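M-step

Equation 1 expands into three terms. They contain, respectively, the initial state probabilities $\pi_{i_1}$, the entries $b_{i_t}(o_t)$ of the observation probability matrix, and the entries $a_{i_t i_{t+1}}$ of the state transition matrix, so they can be used to estimate $\pi$, $B_{N \times M}$, and $A_{N \times N}$ separately. Because the terms share no parameters, each can be maximized on its own to obtain the parameters for the next iteration.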

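  • $\pi_{i_1}$

    $$
    \begin{aligned}
    \sum_{I} P(O, I \mid \lambda^{(g)}) \log \pi_{i_1}
    &= \sum_{i_1} \cdots \sum_{i_T} \big[ P(O, I \mid \lambda^{(g)}) \log \pi_{i_1} \big] \\
    &= \sum_{i_1} \log \pi_{i_1} \Big[ \sum_{i_2} \cdots \sum_{i_T} P(O, i_1, i_2, \dots, i_T \mid \lambda^{(g)}) \Big] \\
    &= \sum_{i_1} \log \pi_{i_1} \, P(O, i_1 \mid \lambda^{(g)}) \\
    &= \sum_{i=1}^{N} \log \pi_i \, P(O, i_1 = q_i \mid \lambda^{(g)})
    \end{aligned}
    $$

    The initial state probabilities must satisfy $\sum_{i=1}^{N} \pi_i = 1$, so we construct the Lagrangian:

    $$L(\pi, \gamma) = \sum_{i=1}^{N} \log \pi_i \, P(O, i_1 = q_i \mid \lambda^{(g)}) - \gamma \Big( \sum_{i=1}^{N} \pi_i - 1 \Big)$$

    Take the partial derivatives with respect to $\pi_i$ and $\gamma$ and set them to 0:

    $$\frac{\partial L}{\partial \pi_i} = \frac{P(O, i_1 = q_i \mid \lambda^{(g)})}{\pi_i} - \gamma = 0$$

    $$\frac{\partial L}{\partial \gamma} = -\Big( \sum_{i=1}^{N} \pi_i - 1 \Big) = 0$$

    The first equation gives $\pi_i = P(O, i_1 = q_i \mid \lambda^{(g)}) / \gamma$. Summing over $i$ and applying the constraint gives $\gamma = \sum_{j=1}^{N} P(O, i_1 = q_j \mid \lambda^{(g)}) = P(O \mid \lambda^{(g)})$. Solving the two jointly:

$$\pi_i^{(g+1)} = \frac{P(O, i_1 = q_i \mid \lambda^{(g)})}{\sum_{j=1}^{N} P(O, i_1 = q_j \mid \lambda^{(g)})} = \frac{P(O, i_1 = q_i \mid \lambda^{(g)})}{P(O \mid \lambda^{(g)})}$$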

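  • $b_{i_t}(o_t)$

    $$
    \begin{aligned}
    \sum_{I} P(O, I \mid \lambda^{(g)}) \sum_{t=1}^{T} \log b_{i_t}(o_t)
    &= \sum_{I} \big[ P(O, I \mid \lambda^{(g)}) \log b_{i_1}(o_1) + \dots + P(O, I \mid \lambda^{(g)}) \log b_{i_T}(o_T) \big] \\
    &= \sum_{I} P(O, I \mid \lambda^{(g)}) \log b_{i_1}(o_1) + \dots + \sum_{I} P(O, I \mid \lambda^{(g)}) \log b_{i_T}(o_T) \\
    &= \sum_{i=1}^{N} P(O, i_1 = q_i \mid \lambda^{(g)}) \log b_i(o_1) + \dots + \sum_{i=1}^{N} P(O, i_T = q_i \mid \lambda^{(g)}) \log b_i(o_T) \\
    &= \sum_{i=1}^{N} \sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)}) \log b_i(o_t)
    \end{aligned}
    $$

    Every row of the observation probability matrix sums to 1, which gives $N$ constraints: $\sum_{k=1}^{M} b_i(v_k) = 1$ for $i \in \{1, 2, \dots, N\}$. We therefore construct the Lagrangian:

    $$L(B, \gamma) = \sum_{i=1}^{N} \sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)}) \log b_i(o_t) - \sum_{i=1}^{N} \gamma_i \Big( \sum_{k=1}^{M} b_i(v_k) - 1 \Big)$$

    Take the partial derivatives with respect to $b_i(v_k)$ and $\gamma_i$ and set them to 0.

    Note: the derivative of $\log b_i(o_t)$ with respect to $b_i(v_k)$ is nonzero only when $o_t = v_k$. This is expressed with the indicator $\mathbb{I}(o_t = v_k)$, written $\mathbb{I}$ to keep it distinct from the state sequence $I$. For the same reason only the multiplier $\gamma_i$ of row $i$ survives in the derivative.

    $$\frac{\partial L}{\partial b_i(v_k)} = \frac{\sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)}) \, \mathbb{I}(o_t = v_k)}{b_i(v_k)} - \gamma_i = 0$$

    $$\frac{\partial L}{\partial \gamma_i} = -\Big( \sum_{k=1}^{M} b_i(v_k) - 1 \Big) = 0$$

    Solving the two jointly, and using $\sum_{k=1}^{M} \mathbb{I}(o_t = v_k) = 1$ to simplify the denominator:

$$b_i(v_k)^{(g+1)} = \frac{\sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)}) \, \mathbb{I}(o_t = v_k)}{\sum_{k'=1}^{M} \sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)}) \, \mathbb{I}(o_t = v_{k'})}$$

$$= \frac{\sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)}) \, \mathbb{I}(o_t = v_k)}{\sum_{t=1}^{T} P(O, i_t = q_i \mid \lambda^{(g)})}$$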

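  • $a_{i_t i_{t+1}}$

    $$
    \begin{aligned}
    \sum_{I} P(O, I \mid \lambda^{(g)}) \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}}
    &= \sum_{I} \big[ P(O, I \mid \lambda^{(g)}) \log a_{i_1 i_2} + \dots + P(O, I \mid \lambda^{(g)}) \log a_{i_{T-1} i_T} \big] \\
    &= \sum_{I} P(O, I \mid \lambda^{(g)}) \log a_{i_1 i_2} + \dots + \sum_{I} P(O, I \mid \lambda^{(g)}) \log a_{i_{T-1} i_T} \\
    &= \sum_{i=1}^{N} \sum_{j=1}^{N} P(O, i_1 = q_i, i_2 = q_j \mid \lambda^{(g)}) \log a_{ij} + \dots + \sum_{i=1}^{N} \sum_{j=1}^{N} P(O, i_{T-1} = q_i, i_T = q_j \mid \lambda^{(g)}) \log a_{ij} \\
    &= \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)}) \log a_{ij}
    \end{aligned}
    $$

    Every row of the state transition matrix sums to 1, which gives $N$ constraints: $\sum_{j=1}^{N} a_{ij} = 1$ for $i \in \{1, 2, \dots, N\}$. We therefore construct the Lagrangian:

    $$L(A, \gamma) = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)}) \log a_{ij} - \sum_{i=1}^{N} \gamma_i \Big( \sum_{j=1}^{N} a_{ij} - 1 \Big)$$

    Take the partial derivatives with respect to $a_{ij}$ and $\gamma_i$ and set them to 0. Only the multiplier $\gamma_i$ of row $i$ survives in the derivative with respect to $a_{ij}$:

    $$\frac{\partial L}{\partial a_{ij}} = \frac{\sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)})}{a_{ij}} - \gamma_i = 0$$

    $$\frac{\partial L}{\partial \gamma_i} = -\Big( \sum_{j=1}^{N} a_{ij} - 1 \Big) = 0$$

    Solving the two jointly, and marginalizing over $i_{t+1}$ to simplify the denominator:

$$a_{ij}^{(g+1)} = \frac{\sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)})}{\sum_{j'=1}^{N} \sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_{j'} \mid \lambda^{(g)})}$$

$$= \frac{\sum_{t=1}^{T-1} P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)})}{\sum_{t=1}^{T-1} P(O, i_t = q_i \mid \lambda^{(g)})}$$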
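The three update formulas can be put together into one Baum-Welch iteration. The sketch below is a minimal illustration rather than a definitive implementation. It assumes the joint probabilities $P(O, i_t = q_i \mid \lambda^{(g)})$ and $P(O, i_t = q_i, i_{t+1} = q_j \mid \lambda^{(g)})$ are obtained from the standard forward and backward recursions, which are not derived in this section. The function names, the integer encoding, and the starting parameters are assumptions, and no scaling is applied, so it is only suitable for short sequences.

```python
def forward(obs, A, B, pi):
    """alpha[t][i] = P(o_1..o_t, i_t = q_i | lambda)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha


def backward(obs, A, B):
    """beta[t][i] = P(o_{t+1}..o_T | i_t = q_i, lambda)."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def baum_welch_step(obs, A, B, pi):
    """One EM update; returns (A_new, B_new, pi_new, P(O | old parameters))."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, A, B, pi), backward(obs, A, B)
    likelihood = sum(alpha[T - 1])
    # gamma[t][i] = P(O, i_t = q_i | lambda^(g)), left unnormalized
    gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
    # xi[t][i][j] = P(O, i_t = q_i, i_{t+1} = q_j | lambda^(g))
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    pi_new = [gamma[0][i] / likelihood for i in range(N)]
    A_new = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    B_new = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return A_new, B_new, pi_new, likelihood


# Illustrative starting point (assumed values) and a short observation sequence.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
pi = [0.2, 0.4, 0.4]
obs = [0, 1, 0, 0, 1, 1, 0, 1]
likelihoods = []
for _ in range(5):
    A, B, pi, likelihood = baum_welch_step(obs, A, B, pi)
    likelihoods.append(likelihood)
```

Each entry of `likelihoods` is $P(O \mid \lambda^{(g)})$ before the corresponding update, so the list should be non-decreasing, matching the EM guarantee stated earlier. For long sequences one would normally rescale `alpha` and `beta` at each step or work in log space to avoid underflow.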

Reposted from blog.csdn.net/Joyliness/article/details/79603950