The EM Algorithm and Its Applications: GMM and pLSA


EM (Expectation Maximization) is an iterative algorithm: a maximum-likelihood method for estimating the parameters of probabilistic models that contain latent variables. The first step, the expectation (E) step, uses the current parameters to compute the expectation of the log-likelihood; the second step, the maximization (M) step, finds the parameter values that maximize the expected log-likelihood produced by the E step. The two steps are iterated until convergence.

Key ideas: latent variables, maximum likelihood estimation.

Suppose the training set \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\} contains m independent samples with no labels, and we want to fit the parameters \theta of a latent-variable model p(x,z;\theta) to the data. We proceed as follows:

1. The likelihood function

For each sample i, let Q_i denote a distribution over the latent values z_j, written Q_i(z_j).

L(\theta)=\prod_{i=1}^{m} p(x_i;\theta)\\ l(\theta)=\sum_{i=1}^{m} \log p(x_i;\theta)\\ =\sum_{i=1}^{m}\log \sum_{j=1}^{k}p(x_i,z_j;\theta)\\ =\sum_{i=1}^{m}\log \sum_{j=1}^{k} Q_i(z_j;\theta) \frac{p(x_i,z_j;\theta)}{Q_i(z_j;\theta)}\\ \geq \sum_{i=1}^{m} \sum_{j=1}^{k}Q_i(z_j;\theta) \log \frac{p(x_i,z_j;\theta)}{Q_i(z_j;\theta)} \ \ \ \ \ \ (1)

The last step uses Jensen's inequality: log is a concave function, so f(E[X]) \geq E[f(X)], and equality holds when the argument is a constant, i.e.

\frac{p(x_i,z_j;\theta)}{Q_i(z_j;\theta)}=C\\ s.t.\ \sum_{j=1}^{k} Q_i(z_j;\theta)=1\\ \Rightarrow Q_i(z_j;\theta) \propto p(x_i,z_j;\theta)

Normalizing, we obtain:

Q_i(z_j;\theta)=\frac{p(x_i,z_j;\theta)}{\sum_{j=1}^{k} p(x_i,z_j;\theta)} =\frac{p(x_i,z_j;\theta)}{p(x_i;\theta)} =p(z_j|x_i;\theta)

So Q_i is simply a posterior probability, obtained from x_i and the current parameters; this is the E-step. In the M-step, maximize the lower bound (1) to obtain a new estimate of the parameters:

\theta:=\arg\max_{\theta}\ \sum_{i=1}^{m} \sum_{j=1}^{k}Q_i(z_j;\theta) \log \frac{p(x_i,z_j;\theta)}{Q_i(z_j;\theta)}

The above yields the EM algorithm:

repeat until convergence {

E-step: for each i, set

             Q_i(z_j):=p(z_j|x_i;\theta)

M-step: set

\theta:=\arg\max_{\theta}\ \sum_{i=1}^{m} \sum_{j=1}^{k}Q_i(z_j;\theta) \log \frac{p(x_i,z_j;\theta)}{Q_i(z_j;\theta)}

}
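To make the loop concrete, here is a minimal Python sketch of EM for a mixture of two biased coins (a hypothetical example, not from the original derivation): the E-step computes the posterior Q_i(z) of which coin produced each session, and the M-step re-estimates each coin's bias by weighted maximum likelihood.

```python
import numpy as np

# Minimal EM sketch: mixture of two biased coins (hypothetical data).
# Each session is 10 tosses of ONE coin; which coin (the latent z) is unknown.
flips = np.array([5, 9, 8, 4, 7])    # heads observed per 10-toss session
n = 10                                # tosses per session
theta = np.array([0.6, 0.5])          # initial guesses of the two head probs

for _ in range(100):
    # E-step: Q_i(z) = posterior that session i came from each coin,
    # with a uniform prior; binomial coefficients cancel in normalization.
    lik = theta ** flips[:, None] * (1 - theta) ** (n - flips[:, None])
    Q = lik / lik.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood re-estimate of each coin's bias.
    theta = (Q * flips[:, None]).sum(axis=0) / (Q.sum(axis=0) * n)

print(theta)  # estimated head probabilities of the two coins
```

The same E/M structure carries over directly to the Gaussian mixture below.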

When deriving the Gaussian mixture clustering algorithm, I was stuck for a long time on the vector derivatives, so here is the detailed derivation.

For a random vector x in an n-dimensional sample space X, if x follows a Gaussian distribution, its probability density function is:

p(x)=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-u)^{T}\Sigma^{-1}(x-u)}

where u is the n-dimensional mean vector and \Sigma is the n \times n covariance matrix. To make the dependence on these parameters explicit, we write the density as p(x|u,\Sigma).
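As a quick sanity check of the density formula (the values of x, u, \Sigma below are made up), a direct implementation can be compared against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Direct implementation of the density above, checked against scipy.
x = np.array([1.0, 2.0])
u = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

n = len(x)
d = x - u
manual = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / (
    (2 * np.pi) ** (n / 2) * np.linalg.det(Sigma) ** 0.5)

print(manual)
print(multivariate_normal(mean=u, cov=Sigma).pdf(x))  # should agree
```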

The Gaussian mixture distribution is

p_{m} (x)= \sum_{i=1}^{k}a_{i}\cdot p(x|u_{i},\Sigma_{i})

This distribution is composed of k mixture components, each corresponding to one Gaussian distribution, where u_{i} and \Sigma_{i} are the parameters of the i-th Gaussian component and a_{i}>0 is the corresponding mixing coefficient, with \sum_{i=1}^{k}a_{i}=1.
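Evaluating p_m(x) is then just a weighted sum of component densities; a minimal sketch in which all parameter values (a, means, covs) are assumed for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# p_m(x) = sum_i a_i * p(x | u_i, Sigma_i), with assumed parameters.
a = np.array([0.3, 0.7])                         # mixing coefficients, sum to 1
means = [np.zeros(2), np.array([3.0, 3.0])]      # u_1, u_2
covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 2.0]])]  # Sigma_1, Sigma_2

def mixture_pdf(x):
    return sum(a[i] * multivariate_normal(means[i], covs[i]).pdf(x)
               for i in range(len(a)))

print(mixture_pdf(np.array([1.0, 1.0])))
```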

Let the random variable z_{j}\in\left \{1,2,\ldots,k \right \} denote the Gaussian component that generated sample x_j; its value is unknown. Clearly the prior probability of z_j is p(z_{j}=i)=a_{i}. By Bayes' theorem, its posterior is:

p_{m}(z_{j}=i|x_{j})=\frac{p_{m}(z_{j}=i,x_{j})}{p_{m}(x_{j})} =\frac{p(z_{j}=i)\cdot p_{m}(x_{j}|z_{j}=i)}{p_{m}(x_{j})} =\frac{a_{i}\cdot p(x_{j}|u_{i},\Sigma_{i})}{\sum_{l=1}^{k}a_{l}\cdot p(x_{j}|u_{l},\Sigma_{l})}

In other words, p_{m}(z_{j}=i|x_{j}) is the posterior probability that sample x_j was generated by the i-th Gaussian mixture component; we denote it r_{ji}\ (i=1,2,\ldots,k).

Once the Gaussian mixture distribution is known, Gaussian mixture clustering partitions the samples into k clusters C=\left \{ C_1, C_2, \ldots, C_k \right \}, and each sample x_j is assigned the cluster label

\lambda _{j}=\underset{i\in\left \{ 1,2...k \right \}}{argmax} r_{ji}
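A short sketch of computing the responsibilities r_{ji} and the hard labels \lambda_j (the parameters are assumed known here; the function name and toy data are hypothetical):

```python
import numpy as np
from scipy.stats import multivariate_normal

# r_ji = a_i p(x_j|u_i,Sigma_i) / sum_l a_l p(x_j|u_l,Sigma_l), then
# lambda_j = argmax_i r_ji.
def responsibilities(X, a, means, covs):
    dens = np.column_stack([multivariate_normal(means[i], covs[i]).pdf(X)
                            for i in range(len(a))])   # dens[j, i]
    num = a * dens                  # a_i * p(x_j | u_i, Sigma_i)
    return num / num.sum(axis=1, keepdims=True)

X = np.random.default_rng(0).normal(size=(5, 2))        # toy samples
r = responsibilities(X, np.array([0.5, 0.5]),
                     [np.zeros(2), np.ones(2)], [np.eye(2), np.eye(2)])
labels = r.argmax(axis=1)                               # hard cluster labels
print(labels)
```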

For a given sample set D, apply maximum likelihood to the log-likelihood:

LL(D)=\ln\prod_{j=1}^{m}p_{m}(x_{j}) =\sum_{j=1}^{m}\ln\left(\sum_{i=1}^{k}a_{i}\cdot p(x_{j}|u_{i},\Sigma_{i})\right)

If the parameter u_{i} maximizes the log-likelihood, the partial derivative with respect to it must vanish, so we differentiate:

\frac{\partial LL(D)}{\partial u_{i}}=\sum_{j=1}^{m}\frac{a_{i}\cdot\frac{\partial p(x_{j}|u_{i},\Sigma_{i})}{\partial u_{i}}}{\sum_{l=1}^{k}a_{l}\cdot p(x_{j}|u_{l},\Sigma_{l})}

\frac{\partial p(x_{j}|u_{i},\Sigma_{i})}{\partial u_{i}}=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma_{i}|^{\frac{1}{2}}}e^{-\frac{1}{2}(x_{j}-u_{i})^{T}\Sigma_{i}^{-1}(x_{j}-u_{i})}\cdot\frac{\partial\left(-\frac{1}{2}(x_{j}-u_{i})^{T}\Sigma_{i}^{-1}(x_{j}-u_{i})\right)}{\partial u_{i}} =p(x_{j}|u_{i},\Sigma_{i})\cdot\frac{\partial\left(-\frac{1}{2}(x_{j}-u_{i})^{T}\Sigma_{i}^{-1}(x_{j}-u_{i})\right)}{\partial u_{i}}

Here the following vector-derivative identities are needed (note that \Sigma^{-1} is symmetric):

\frac{\partial\, u_{i}^T\Sigma^{-1}x}{\partial u_{i}}=\Sigma^{-1}x

\frac{\partial\, x^T\Sigma^{-1}u_{i}}{\partial u_{i}}=(x^T\Sigma^{-1})^{T}=\Sigma^{-1}x

\frac{\partial\, u_{i}^T\Sigma^{-1}u_{i}}{\partial u_{i}}=\Sigma^{-1}u_{i}+(u_{i}^T\Sigma^{-1})^{T}=2\Sigma^{-1}u_{i}

\frac{\partial\left(-\frac{1}{2}(x_{j}-u_{i})^{T}\Sigma_{i}^{-1}(x_{j}-u_{i})\right)}{\partial u_{i}}= -\frac{1}{2}\frac{\partial\left(x_{j}^T\Sigma_{i}^{-1}x_{j}-u_{i}^T\Sigma_{i}^{-1}x_{j} - x_{j}^T\Sigma_{i}^{-1}u_{i}+u_{i}^T\Sigma_{i}^{-1}u_{i}\right)}{\partial u_{i}}=\Sigma_{i}^{-1}(x_{j}-u_{i})
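This final identity can be sanity-checked numerically with finite differences before plugging it in (the values of x, u, and \Sigma below are arbitrary):

```python
import numpy as np

# Finite-difference check of d/du [-(1/2)(x-u)^T S^{-1} (x-u)] = S^{-1}(x-u),
# with arbitrary x, u and a random positive-definite Sigma.
rng = np.random.default_rng(1)
x, u = rng.normal(size=3), rng.normal(size=3)
A = rng.normal(size=(3, 3))
Sinv = np.linalg.inv(A @ A.T + 3 * np.eye(3))

f = lambda v: -0.5 * (x - v) @ Sinv @ (x - v)

eps = 1e-6
numeric = np.array([(f(u + eps * e) - f(u - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(numeric, Sinv @ (x - u), atol=1e-5))  # expect True
```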

\frac{\partial LL(D)}{\partial u_{i}}=\sum_{j=1}^{m}\frac{a_{i}\cdot p(x_{j}|u_{i},\Sigma_{i})\,\Sigma_{i}^{-1}(x_{j}-u_{i})}{\sum_{l=1}^{k}a_{l}\cdot p(x_{j}|u_{l},\Sigma_{l})} =\sum_{j=1}^{m}r_{ji}\,\Sigma_{i}^{-1}(x_{j}-u_{i})=0

Left-multiplying by \Sigma_{i} gives \sum_{j=1}^{m}r_{ji}(x_{j}-u_{i})=0, i.e.

u_{i}=\frac{\sum_{j=1}^{m}r_{ji}x_{j}}{\sum_{j=1}^{m}r_{ji}}
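Putting everything together, here is a compact EM sketch for a GMM. Only the u_i update was derived above; the \Sigma_i and a_i updates shown here follow from analogous stationarity conditions and are included for completeness (the function name and initialization choices are my own):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, k, iters=100, seed=0):
    """EM for a Gaussian mixture; X has shape (m, n)."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    a = np.full(k, 1.0 / k)                       # mixing coefficients a_i
    means = X[rng.choice(m, k, replace=False)]    # initialize u_i from data
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(n)] * k)
    for _ in range(iters):
        # E-step: responsibilities r_ji
        dens = np.column_stack([multivariate_normal(means[i], covs[i]).pdf(X)
                                for i in range(k)])
        r = a * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: u_i = sum_j r_ji x_j / sum_j r_ji (derived above);
        # Sigma_i and a_i updates follow from analogous conditions.
        Nk = r.sum(axis=0)
        means = (r.T @ X) / Nk[:, None]
        for i in range(k):
            d = X - means[i]
            covs[i] = (r[:, i, None] * d).T @ d / Nk[i] + 1e-6 * np.eye(n)
        a = Nk / m
    return a, means, covs

# Hypothetical usage: two well-separated 2-D clusters.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
a, means, covs = gmm_em(X, k=2)
print(a, means, sep="\n")   # weights near 0.5, means near (0,0) and (5,5)
```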
