ML Fundamentals (6): The EM Algorithm

  • Jensen's inequality

    For a convex function $f$ (i.e., $f''(x) \geq 0$) and a random variable $X$, we have

    $E[f(X)] \geq f(E[X])$

    If $f$ is strictly convex and $X$ is a constant, i.e. $E[X] = X$, then

    $E[f(X)] = f(E[X])$

    Conversely, for a concave function $f$ (i.e., $f''(x) \leq 0$) and a random variable $X$, we have $E[f(X)] \leq f(E[X])$.
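
As a quick numerical sanity check (not from the original post; the exponential distribution and the variable names below are just illustrative), both directions of the inequality can be verified from samples:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=100_000)   # a non-degenerate positive random variable

# Convex  f(x) = x^2 :  E[f(X)] >= f(E[X])
print(np.mean(X ** 2), np.mean(X) ** 2)        # roughly 8.0 >= 4.0

# Concave f(x) = log(x):  E[f(X)] <= f(E[X])
print(np.mean(np.log(X)), np.log(np.mean(X)))  # roughly 0.12 <= 0.69
```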

  • The EM algorithm

    Suppose we have a training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$ and a latent variable $z$, and we want to fit a model $p(x, z)$. The log-likelihood can then be written as

    $l(\theta) = \sum_{i=1}^{m} \log p(x^{(i)};\theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)};\theta)$

    • E-step

      For each $x^{(i)}$, the latent variable $z^{(i)}$ makes the log-likelihood hard to optimize directly, so we introduce an assumed distribution $Q_i(z)$ over $z^{(i)}$ to help; clearly $\sum_z Q_i(z) = 1$. We can then write

      $\sum_i \log p(x^{(i)};\theta) = \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)};\theta) = \sum_i \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})} \geq \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})}$

      The inequality holds by Jensen's inequality, since $\log(x)$ is concave.

      What remains is to choose $Q_i$; we pick the $Q_i$ that makes the inequality hold with equality. By the equality condition of Jensen's inequality, this requires

      $\frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})} = c \quad \text{(a constant)}$
      so $Q_i(z^{(i)})$ is proportional to $p(x^{(i)}, z^{(i)};\theta)$. Combined with $\sum_z Q_i(z) = 1$, we can take
      $Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)};\theta)}{\sum_z p(x^{(i)}, z;\theta)} = \frac{p(x^{(i)}, z^{(i)};\theta)}{p(x^{(i)};\theta)} = p(z^{(i)} \mid x^{(i)};\theta)$
      This is the E-step of the EM algorithm: with the current (initially chosen) $\theta$, compute $Q_i(z^{(i)})$ as the posterior distribution $p(z^{(i)} \mid x^{(i)};\theta)$.

    • M-step

      Once $Q_i(z^{(i)})$ has been obtained, we get a new $\theta$ by maximizing the lower bound, e.g. by setting its derivative with respect to $\theta$ to zero:
      $\theta := \arg\max_{\theta} \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})}$
      With the updated $\theta$ we can recompute the posterior, and the two steps are iterated until convergence; a code sketch follows below.
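
The original post gives no code, so here is a minimal illustrative sketch of EM for a two-component one-dimensional Gaussian mixture: $z^{(i)}$ is the component label, the E-step posterior $Q_i(z^{(i)}) = p(z^{(i)} \mid x^{(i)};\theta)$ becomes the "responsibilities", and the M-step maximizer has a closed form. The function name `em_gmm_1d` and all implementation details are hypothetical.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50, seed=0):
    """Minimal EM sketch for a two-component 1-D Gaussian mixture."""
    rng = np.random.default_rng(seed)
    # theta = (mixing weights pi, means mu, variances var); crude initialization.
    pi = np.array([0.5, 0.5])
    mu = rng.choice(x, size=2, replace=False).astype(float)
    var = np.array([x.var(), x.var()])
    log_liks = []

    for _ in range(n_iter):
        # E-step: Q_i(z) = p(z | x^{(i)}; theta), the posterior responsibilities.
        log_joint = (np.log(pi)
                     - 0.5 * (np.log(2 * np.pi * var) + (x[:, None] - mu) ** 2 / var))
        log_px = np.logaddexp.reduce(log_joint, axis=1)   # log p(x^{(i)}; theta)
        resp = np.exp(log_joint - log_px[:, None])        # shape (m, 2), rows sum to 1
        log_liks.append(log_px.sum())                     # l(theta) at the current theta

        # M-step: maximize the lower bound over theta (closed form for a GMM).
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = resp.T @ x / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

    return pi, mu, var, log_liks
```

Working in log space with `np.logaddexp.reduce` avoids underflow when computing $p(x^{(i)};\theta)$; the per-iteration log-likelihoods are recorded so that convergence can be inspected, as done in the snippet at the end of this post.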

  • Proof that EM converges

    The goal in proving the correctness of the EM algorithm is to show $l(\theta^{(t)}) \leq l(\theta^{(t+1)})$.

    By the E-step choice of $Q_i^{(t)}$, which makes the Jensen bound tight at $\theta^{(t)}$, we have
    $l(\theta^{(t)}) = \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta^{(t)})}{Q_i^{(t)}(z^{(i)})}$
    For $l(\theta^{(t+1)})$, we have

    $l(\theta^{(t+1)}) \geq \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta^{(t+1)})}{Q_i^{(t)}(z^{(i)})}$
    This follows from the general lower bound on $l(\theta)$ derived above, which holds for any $Q_i$ and any $\theta$:

    $l(\theta) \geq \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})}$
    here instantiated with $Q_i = Q_i^{(t)}$ and $\theta = \theta^{(t+1)}$. Next,

    $\sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta^{(t+1)})}{Q_i^{(t)}(z^{(i)})} \geq \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta^{(t)})}{Q_i^{(t)}(z^{(i)})}$
    because $\theta^{(t+1)}$ was chosen in the M-step as

    $\theta^{(t+1)} = \arg\max_{\theta} \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i^{(t)}(z^{(i)})}$

    Putting the two inequalities together,

    $l(\theta^{(t+1)}) \geq \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta^{(t+1)})}{Q_i^{(t)}(z^{(i)})} \geq \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta^{(t)})}{Q_i^{(t)}(z^{(i)})} = l(\theta^{(t)})$

    This establishes convergence: the EM updates form a monotonically non-decreasing sequence of log-likelihood values $l(\theta)$.

    It is worth noting that EM behaves like coordinate ascent on the lower bound: the E-step is a coordinate-ascent update in $Q_i(z)$, and the M-step is a coordinate-ascent update in $\theta$; the snippet below checks the resulting monotone increase numerically.
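
To see this monotonicity empirically, one can track the log-likelihood recorded at each E-step of the hypothetical `em_gmm_1d` sketch above; it should never decrease (up to floating-point noise). The data below are synthetic and just for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data drawn from two Gaussians with true means -2 and 3.
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

pi, mu, var, log_liks = em_gmm_1d(x, n_iter=30)
print(mu)  # should end up near [-2, 3] (possibly with the labels swapped)

# l(theta^{(t)}) is monotonically non-decreasing across iterations, as proved above.
assert all(b >= a - 1e-6 for a, b in zip(log_liks, log_liks[1:]))
```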

Reposted from blog.csdn.net/weixin_37688445/article/details/79265882