HMM Derivation

The hidden Markov model makes two basic assumptions:

(1) Homogeneous Markov assumption: the hidden state of the Markov chain at any time depends only on the hidden state at the previous time.

(2) Observation independence assumption: the observation at any time depends only on the hidden state at that time.

The hidden Markov model has three basic problems:

(1) Probability computation: given the model parameters and an observation sequence, compute the probability that the observation sequence occurs.

(2) Learning: given an observation sequence, learn the model parameters.

(3) Prediction: given the model parameters and an observation sequence, find the state sequence with the highest probability.

For the probability computation problem there are two algorithms: the forward algorithm and the backward algorithm.

1. Probability Computation

1.1 Forward Algorithm

Define the forward probability

$$\alpha_t(i)=P(o_1,\cdots,o_t,i_t=q_i)$$

with the initial value \(\alpha_1(i)=\pi_i b_i(o_1)\). For the recursion, note that

$$\begin{align} \alpha_t(j)a_{ji}=& P(o_1,\cdots,o_t,i_t=q_j)P(i_{t+1}=q_i|i_t=q_j)\\ =& P(o_1,\cdots,o_t,i_t=q_j)P(i_{t+1}=q_i|o_1,\cdots,o_t,i_t=q_j)\\ =& P(o_1,\cdots,o_t,i_t=q_j,i_{t+1}=q_i) \end{align}$$

$$\begin{align} \alpha_t(j)a_{ji}b_i(o_{t+1})=& P(o_1,\cdots,o_{t+1},i_t=q_j,i_{t+1}=q_i)\end{align}$$

Therefore

$$\begin{align} \alpha_{t+1}(i) =& \sum_j P(o_1,\cdots,o_{t+1},i_t=q_j,i_{t+1}=q_i)\\=& \sum_j \alpha_t(j)a_{ji}b_i(o_{t+1})\end{align}$$

$$P(O)=P(o_1,\cdots,o_T)=\sum_i \alpha_T(i)$$
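As a concrete illustration of the recursion above, here is a minimal NumPy sketch of the forward algorithm. The names `pi`, `A`, `B`, `obs` and the array layout are my own assumptions, not from the original post: `A[i, j]` stands for \(a_{ij}\), `B[i, k]` for \(b_i(v_k)\), and `obs` is a sequence of observation-symbol indices.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward pass: alpha[t, i] = P(o_1, ..., o_t, i_t = q_i), with 0-based t in code."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                    # alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):
        # alpha_{t+1}(i) = (sum_j alpha_t(j) * a_{ji}) * b_i(o_{t+1})
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

# P(O) = sum_i alpha_T(i), e.g.:
# prob_O = forward(pi, A, B, obs)[-1].sum()
```

For long sequences the raw forward values underflow quickly; practical implementations rescale each row of `alpha` (or work in log space), which the sketch omits for brevity.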

1.2 Backward Algorithm

Define the backward probability

$$\beta_t(i)=P(o_{t+1},\cdots,o_T|i_t=q_i)$$

$$\beta_T(i)=1$$

$$\begin{align}a_{ij}b_j(o_{t})\beta_t(j)=& P(i_t=q_j|i_{t-1}=q_i)P(o_t|i_t=q_j)P(o_{t+1},\cdots,o_T|i_t=q_j)\\ =& P(i_t=q_j|i_{t-1}=q_i)P(o_t,\cdots,o_T|i_t=q_j)\end{align}$$

Therefore

$$\begin{align} \beta_{t-1}(i)=& \sum_j P(i_t=q_j|i_{t-1}=q_i)P(o_t,\cdots,o_T|i_t=q_j)\\ =& \sum_j a_{ij}b_j(o_t)\beta_t(j)\end{align}$$

$$\begin{align} P(O)=& \sum_i \pi_i b_i(o_1)\beta_1(i)\end{align}$$
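Under the same assumed conventions as the forward sketch above, the backward recursion could be written as:

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward pass: beta[t, i] = P(o_{t+1}, ..., o_T | i_t = q_i), with 0-based t in code."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                  # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_{ij} * b_j(o_{t+1}) * beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# P(O) = sum_i pi_i * b_i(o_1) * beta_1(i), e.g.:
# prob_O = (pi * B[:, obs[0]] * backward(pi, A, B, obs)[0]).sum()
```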

2. Computing Some Probabilities and Expectations

Given the observation \(O\) and the model parameters \(\lambda\), the probability of being in state \(i\) at time \(t\):

$$\begin{align} \gamma_t(i)=& P(i_t=q_i|O)\\ =& \frac{P(i_t=q_i,O)}{P(O)}\\ =& \frac{\alpha_t(i)\beta_t(i)}{\sum_j \alpha_t(j)\beta_t(j)} \end{align}$$

Given the observation \(O\) and the model parameters \(\lambda\), the probability of being in state \(i\) at time \(t\) and in state \(j\) at time \(t+1\):

$$\xi_t(i,j)=\frac{\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)}{\sum_i \sum_j \alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)}$$
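Both posteriors follow directly from the `forward`/`backward` outputs sketched earlier; a minimal vectorized version, under the same assumed conventions, could look like:

```python
import numpy as np

def gamma_xi(alpha, beta, A, B, obs):
    """gamma[t, i] = P(i_t = q_i | O); xi[t, i, j] = P(i_t = q_i, i_{t+1} = q_j | O)."""
    obs = np.asarray(obs)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)        # normalize over states
    # xi_t(i, j) is proportional to alpha_t(i) * a_{ij} * b_j(o_{t+1}) * beta_{t+1}(j)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * B[:, obs[1:]].T[:, None, :] * beta[1:, None, :])
    xi /= xi.sum(axis=(1, 2), keepdims=True)         # normalize over state pairs
    return gamma, xi
```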

3. Learning Algorithms

3.1 Supervised Learning

Given \(S\) observation sequences with their corresponding state sequences, the model parameters can be estimated directly from the corresponding frequencies. Concretely, let \(A_{ij}\) be the number of transitions from state \(i\) to state \(j\), \(B_{jk}\) the number of times symbol \(v_k\) is observed in state \(j\), and \(n_i\) the number of sequences whose initial state is \(i\); then

$$a_{ij} = \frac{A_{ij}}{\sum_j A_{ij}}$$

$$b_j(k)=\frac{B_{jk}}{\sum_k B_{jk}}$$

$$\pi_i = \frac{n_i}{|S|}$$
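A small sketch of these counting estimates, assuming the training set is a list of (state sequence, observation sequence) pairs of integer indices; the function and variable names are hypothetical, not from the original post:

```python
import numpy as np

def supervised_estimate(pairs, N, M):
    """Frequency estimates of (pi, A, B) from labeled (state sequence, observation sequence) pairs."""
    A_cnt = np.zeros((N, N))    # A_cnt[i, j]: number of transitions from state i to state j
    B_cnt = np.zeros((N, M))    # B_cnt[j, k]: number of times symbol v_k is observed in state j
    pi_cnt = np.zeros(N)        # pi_cnt[i]: number of sequences whose initial state is i
    for states, obs in pairs:
        pi_cnt[states[0]] += 1
        for s, s_next in zip(states[:-1], states[1:]):
            A_cnt[s, s_next] += 1
        for s, o in zip(states, obs):
            B_cnt[s, o] += 1
    A = A_cnt / A_cnt.sum(axis=1, keepdims=True)     # a_ij = A_ij / sum_j A_ij
    B = B_cnt / B_cnt.sum(axis=1, keepdims=True)     # b_j(k) = B_jk / sum_k B_jk
    pi = pi_cnt / len(pairs)                         # pi_i = n_i / |S|
    return pi, A, B
```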

3.2 Unsupervised Learning (Baum-Welch Algorithm)

This applies the EM algorithm to estimate the model parameters \(\lambda\), with the state sequence \(I\) as the latent variable; we solve for the \(\lambda\) that maximizes the likelihood

$$P(O|\lambda)$$

The Q function is

$$\begin{align}Q(\lambda, \lambda^{(i)})=& \sum_I logP(O, I|\lambda)P(O,I|\lambda^{(i)})\\=&  \sum_I log\pi_{i_1}P(O,I|\lambda^{(i)}) +\sum_I(\sum_{t=1}^{T-1}log a_{i_ti_{t+1}})P(O,I|\lambda^{(i)}) + \sum_I(\sum_{t=1}^Tlog b_{i_t}(o_t))P(O,I|\lambda^{(i)}) \end{align}$$

To maximize the Q function it suffices to maximize each of the three terms above separately.

The first term is maximized under the constraint \(\sum_i \pi_i=1\):

$$\sum_I log\pi_{i_1}P(O,I|\lambda^{(i)})=\sum_i log\pi_i P(O, i_1=i|\lambda^{(i)})$$

$$\begin{align}L(\pi,\gamma)=& \sum_i log\pi_i P(O, i_1=i|\lambda^{(i)})+\gamma (\sum_i \pi_i-1)\end{align}$$

$$\frac{\partial L}{\partial \pi_i}= \frac{P(O,i_1=i|\lambda^{(i)})}{\pi_i}+\gamma=0$$

$$\frac{\partial L}{\partial \gamma}=\sum_i \pi_i-1=0$$

Multiplying the first equation by \(\pi_i\), summing over \(i\), and using \(\sum_i \pi_i=1\) gives \(\gamma=-P(O|\lambda^{(i)})\), hence

$$\begin{align}\pi_i=& \frac{P(O,i_1=i|\lambda^{(i)})}{P(O|\lambda^{(i)})}\\=& \gamma_1(i) \end{align}$$

The second term is maximized under the constraint \(\sum_j a_{ij}=1\):

$$\sum_I (\sum_{t=1}^{T-1}log a_{i_ti_{t+1}})P(O,I|\lambda^{(i)})=\sum_i\sum_j \sum_{t=1}^{T-1}log a_{ij}P(O,i_t=i,i_{t+1}=j|\lambda^{(i)})$$

By the same argument,

$$\begin{align} a_{ij}=&\frac{\sum_{t=1}^{T-1}P(O,i_t=i, i_{t+1}=j|\lambda^{(i)})}{\sum_{t=1}^{T-1}P(O,i_t=i|\lambda^{(i)})}\\=& \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)} \end{align}$$

The third term is maximized under the constraint \(\sum_k b_j(k)=1\):

$$\begin{align} L(b, \gamma)=& \sum_I (\sum_{t=1}^T log b_{i_t}(o_t))P(O,I|\lambda^{(i)})+\gamma(\sum_k b_j(k)-1)\\ =& \sum_j \sum_{t=1}^T log b_j(o_t)P(O,i_t=j|\lambda^{(i)})+\gamma(\sum_k b_j(k)-1)\end{align}$$

$$\frac{\partial L}{\partial b_j(k)}=\sum_{t=1}^T\frac{P(O,i_t=j|\lambda^{(i)})}{b_j(k)}I(o_t=v_k)+\gamma=0$$

$$\frac{\partial L}{\partial \gamma}=\sum_k b_j(k)-1=0$$

We obtain

$$\begin{align}b_j(k)=& \frac{\sum_{t=1}^TP(O,i_t=j|\lambda^{(i)})I(o_t=v_k)}{\sum_{t=1}^TP(O,i_t=j|\lambda^{(i)})}\\ =& \frac{\sum_{t,o_t=v_k}\gamma_t(j)}{\sum_t \gamma_t(j)}\end{align}$$
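Putting the three closed-form updates together with the `forward`, `backward`, and `gamma_xi` sketches from earlier, one EM iteration for a single observation sequence could look like this; it is a sketch under the same assumed conventions, not the original post's code:

```python
import numpy as np

def baum_welch(obs, N, M, n_iter=20, seed=0):
    """EM (Baum-Welch) estimate of (pi, A, B) from one unlabeled observation sequence."""
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    pi = np.full(N, 1.0 / N)
    A = rng.random((N, N))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((N, M))
    B /= B.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posteriors under the current parameters lambda^{(i)}
        alpha = forward(pi, A, B, obs)
        beta = backward(pi, A, B, obs)
        gamma, xi = gamma_xi(alpha, beta, A, B, obs)
        # M-step: the three closed-form updates derived above
        pi = gamma[0]                                           # pi_i = gamma_1(i)
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]    # sum_t xi_t(i,j) / sum_t gamma_t(i)
        B = np.stack([gamma[obs == k].sum(axis=0) for k in range(M)], axis=1)
        B /= gamma.sum(axis=0)[:, None]                         # sum_{t: o_t=v_k} gamma_t(j) / sum_t gamma_t(j)
    return pi, A, B
```

In practice the forward/backward values are rescaled at each step (or computed in log space) to avoid underflow, and the E-step statistics are accumulated over multiple training sequences; the sketch above omits both for brevity.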


Reposted from blog.csdn.net/Xafter0/article/details/81145246