Preface
The major algorithms covered here have long been summarized very thoroughly by predecessors such as JerryLead, abcjennifer, and pluskid. In opening this topic I am boldly gilding the lily on their foundations: sharing a bit of the research and experience from my own work, so that we can discuss and improve together. Your support is appreciated.
EM Part
1. The History of EM
(1) EM in Plain Words
(2) The Core of EM
2. The Derivation of EM
(1) Deriving EM from a Lower Bound
(2) Proof of EM's Iterative Convergence
(3) Generalizing the Distribution in the Q Function
3. Extensions of EM
(1) GEM
(2) ECM
4. Some References on EM
GMM Part
1. One-Dimensional GMM
(1) Theoretical Derivation
(2) Implementation
a. Data Simulation (R)
b. GMM Implementation (Scala)
2. Multidimensional GMM
(1) Theoretical Derivation
(2) Implementation
a. Data Simulation (R)
b. GMMs Implementation (Scala)
i. Initial Value Simulation
ii. Determinant Calculation
iii. Matrix Inverse Calculation
iv. GMMs Assembly
1. The History of EM
The history of EM is itself a saga of blood and tears; no wonder Murphy, the author of MLAPP, writes:
“This is an unfortunate term, since there are many ways to generalize EM....”
(1) EM in Plain Words
Figure 1: Dempster, Laird, Rubin (key figures behind the EM algorithm)
The development of the EM (Expectation Maximization) algorithm has a watershed year: 1977. That year, Dempster, Laird, and Rubin (Figure 1 above) jointly published the paper Maximum Likelihood from Incomplete Data via the EM Algorithm. From then on, EM left its scattered, piecemeal state behind and entered public view with a rigorous mathematical formulation and proof.
In fact, even before DLR (Dempster, Laird, Rubin), EM-style ideas had already been applied to certain MLE (maximum likelihood estimation) problems, but only in fairly specialized domains; it was only after DLR's unifying overview that EM truly shone. So why call the history of EM a saga of blood and tears? The story continues after DLR. Once DLR had given EM its formal standing, the Q function and the M step became favorite targets of criticism, with two issues drawing the most attention: the complexity of the Q function and the efficiency of the M-step optimization. A battle then broke out around them.

Some felt EM could be explained from another angle, which produced the GEM framework (discussed later): instead of fully maximizing in every M step, it suffices to find parameters whose likelihood is higher than before, faintly reminiscent of gradient-descent iteration. Others found the GEM idea sound but the M-step iteration still slow in high dimensions, which led to ECM (also discussed later), an extension of EM. Before long, someone unsatisfied with ECM's improvement modified the optimization in its CM steps and proposed the ECME algorithm; some time after ECME appeared, it was in turn folded into a newer framework, the AECM algorithm, and so on. Many similar stories followed: a viewpoint put forward today would be improved upon tomorrow, an algorithmic contest where blood and tears coexist. It is precisely this persistence that keeps the wheel of history rolling forward.
To be organized…
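While the derivation section is still to be organized, the textbook EM updates for a one-dimensional GMM with K components and N observations are worth having at hand as a reference for the implementations below (a sketch of the standard formulas, not the derivation itself):

```latex
% E step: responsibility of component k for observation x_j
\gamma_{jk} = \frac{\alpha_k \,\mathcal{N}(x_j \mid \mu_k, \sigma_k^2)}
                   {\sum_{l=1}^{K} \alpha_l \,\mathcal{N}(x_j \mid \mu_l, \sigma_l^2)}

% M step: re-estimate the parameters from the responsibilities
\mu_k = \frac{\sum_{j=1}^{N} \gamma_{jk}\, x_j}{\sum_{j=1}^{N} \gamma_{jk}}, \qquad
\sigma_k^2 = \frac{\sum_{j=1}^{N} \gamma_{jk}\,(x_j - \mu_k)^2}{\sum_{j=1}^{N} \gamma_{jk}}, \qquad
\alpha_k = \frac{1}{N}\sum_{j=1}^{N} \gamma_{jk}
```

These four quantities correspond directly to `gamma`, `"mean"`, `"var"`, and `alpha` in the Scala code that follows.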
/**
 * Created by yangcheng on 2016/9/10.
 * This code implements a one-dimensional Gaussian mixture model (GMM)
 * fitted with the EM algorithm. Hopefully it serves us well in practice.
 */
object GMM {
  def main(args: Array[String]): Unit = {
    import scala.io.Source._
    import scala.collection.mutable
    import scala.util.Random
    val data = fromFile("C:/Users/yangcheng/Desktop/rnormData.txt").getLines().toArray
    //data.foreach(println)
    // k is the number of Gaussian components
    val k: Int = 4
    // initialize the parameters: equal mixing weights, random means and variances
    val alpha: Array[Double] = Array.fill(k)(1.0 / k)
    val theta = new mutable.HashMap[Int, Map[String, Double]]()
    //Random.setSeed(1000)
    for (i <- 0 until k) {
      // "+ 1" keeps the initial variance strictly positive
      theta.put(i, Map("mean" -> Random.nextInt(100), "var" -> (Random.nextInt(100) + 1)))
    }
    //theta.foreach(println)
    // training
    var len: Double = 1.0
    var j: Int = 0
    while (len > 1e-1) {
      j += 1
      val oldTheta = theta.clone()
      // E step: compute the responsibilities of the latent components
      val gamma = data.map(a => {
        val rs = new Array[Double](k)
        for (i <- 0 until k) {
          rs(i) = alpha(i) * (1 / math.sqrt(2 * math.Pi * theta(i)("var")) *
            math.exp(-math.pow(a.toDouble - theta(i)("mean"), 2) / 2 / theta(i)("var")))
        }
        rs.map(_ / rs.sum)
      })
      //gamma.foreach(a => println(a.mkString(",")))
      // M step: maximize the Q function
      for (i <- 0 until k) {
        val gsum = gamma.map(_(i)).sum
        // update the mean of the Gaussian first
        val meanjk = gamma.map(_(i)).zip(data).map(a => a._1 * a._2.toDouble).sum
        theta(i) += ("mean" -> meanjk / gsum)
        // then update the variance, using the freshly updated mean
        val sdjk = gamma.map(_(i)).zip(data.map(a => math.pow(a.toDouble - theta(i)("mean"), 2))).map(a => a._1 * a._2).sum
        theta(i) += ("var" -> sdjk / gsum)
        // update the mixing weight
        alpha(i) = gsum / data.length
      }
      // check whether the parameter update is small enough to stop
      len = 0.0
      for (i <- 0 until k) {
        len += (theta(i)("mean") - oldTheta(i)("mean")).abs
        len += (theta(i)("var") - oldTheta(i)("var")).abs
      }
      //oldTheta.foreach(a => println("oldTheta = " + a))
    }
    // print the result
    for (i <- 0 until k) println("alpha" + i + " = " + alpha(i))
    theta.foreach(a => println("theta = " + a))
    println("j = " + j)
  }
}
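The outline mentions data simulation in R, but that script is not included here. As a stand-in, below is a minimal Scala sketch that generates a comparable one-dimensional mixture sample; the component weights, means, standard deviations, and the output file name are my own assumptions, chosen only to match the one-value-per-line format that `GMM.main` reads.

```scala
import java.io.PrintWriter
import scala.util.Random

object SimulateData {
  // draw n points from a two-component mixture: w * N(mu1, sd1^2) + (1 - w) * N(mu2, sd2^2)
  def simulate(n: Int, w: Double, mu1: Double, sd1: Double, mu2: Double, sd2: Double): Array[Double] =
    Array.fill(n) {
      if (Random.nextDouble() < w) mu1 + sd1 * Random.nextGaussian()
      else mu2 + sd2 * Random.nextGaussian()
    }

  def main(args: Array[String]): Unit = {
    Random.setSeed(1000)
    val sample = simulate(1000, 0.4, 10.0, 2.0, 50.0, 5.0)
    // write one value per line, the format GMM.main expects
    val out = new PrintWriter("rnormData.txt")
    sample.foreach(out.println)
    out.close()
  }
}
```

With well-separated means like these, the EM fit above should recover the two components; with overlapping components, results depend heavily on the random initialization.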
/**
 * Created by yangcheng on 2016/9/12.
 * This code implements a multidimensional Gaussian mixture model (GMMs),
 * which is more general for real-world data.
 * Hopefully it serves us well in practice.
 */
object GMMs {
  def main(args: Array[String]): Unit = {
    import scala.io.Source._
    import scala.Array._
    import scala.collection.mutable
    import scala.util.Random
    import scala.annotation.tailrec
    // load the data
    val mvdata = fromFile("C:/Users/yangcheng/Desktop/mvrnormData.txt").getLines().toArray
    //mvdata.map(_.split(" ")).foreach(a => println(a.mkString("[", ", ", "]")))
    // initialize the parameters
    // k is the number of Gaussian components
    // dim is the dimension of each Gaussian
    val dim: Int = 2
    val k: Int = 2
    val alpha: Array[Double] = Array.fill(k)(1.0 / k)
    val mvMu = mutable.HashMap[Int, Array[Double]]()
    val mvSigma = mutable.HashMap[Int, Array[Array[Double]]]()
    val Imatrix = mutable.HashMap[Int, Array[Array[Double]]]()
    // generate random mean vectors and covariance matrices; the large diagonal
    // entries keep the matrices diagonally dominant, hence positive definite
    for (i <- 0 until k) {
      // a mean vector
      mvMu.put(i, Array.fill(dim)(Random.nextInt(100).toDouble))
      // a covariance matrix and an identity matrix
      mvSigma.put(i, ofDim[Double](dim, dim))
      Imatrix.put(i, ofDim[Double](dim, dim))
      for (j <- 0 until dim) {
        mvSigma(i)(j)(j) = (Random.nextInt(10) + 1) * 10
        Imatrix(i)(j)(j) = 1
        for (s <- j + 1 until dim) {
          mvSigma(i)(j)(s) = Random.nextGaussian()
          mvSigma(i)(s)(j) = mvSigma(i)(j)(s)
          Imatrix(i)(j)(s) = 0
          Imatrix(i)(s)(j) = Imatrix(i)(j)(s)
        }
      }
    }
    //mvdata.map(_.split(" ")).map(a => a.map(_.toDouble)).map(b => b.zip(mvMu(0)).map(c => c._1 - c._2)).foreach(a => println(a.mkString("\t")))
    //alpha.foreach(println)
    mvMu.foreach(a => println("mvMu = " + a))
    mvMu(0).foreach(a => println("mvMu0 = " + a))
    //mvSigma.foreach(a => println("mvSigma = " + a))
    //mvSigma.foreach(a => println(a._2.mkString("\t")))
    //mvSigma(0).foreach(a => println(a.mkString("\t\t")))
    //Imatrix.foreach(a => println(a._2.mkString("\t")))
    //Imatrix(0).foreach(a => println(a.mkString("\t\t")))
    // an (unused) helper to adjust the augmented matrix so that no row or column is all zeros
    //def adjust(M: Array[Array[Double]]): Array[Array[Double]] = {
    //  val allRowSum = M.reduceLeft(_.zip(_).map(a => 2 * a._1 + 2 * a._2))
    //  M.map(a => a.zip(allRowSum).map(b => b._1 + b._2))
    //}
    //adjust(mvSigma(0)).foreach(a => println("adjust mvSigma = " + a.mkString("\t")))
    // compute the inverse of a matrix by elementary row transformations
    // on the augmented matrix [Sigma | E]
    def calculateInverse(Sigma: Array[Array[Double]], E: Array[Array[Double]]): Array[Array[Double]] = {
      val l = Sigma.length
      // combine Sigma with E to form the augmented matrix
      val ExtensionMatrix = ofDim[Double](l, 2 * l)
      for (i <- 0 until l) {
        ExtensionMatrix(i) = Sigma(i).++(E(i))
      }
      //ExtensionMatrix.foreach(a => println("ExtensionMatrix = " + a.mkString("\t")))
      // forward elimination with normalized pivots, then back substitution
      @tailrec
      def elementaryTransform(eM: Array[Array[Double]], n: Int): Array[Array[Double]] = {
        if (n >= l) {
          // back substitution: clear the entries above each pivot
          for (j <- (1 until l).reverse) {
            for (k <- 0 until j) {
              val factor = eM(k)(j)
              eM(k) = eM(k).zip(eM(j)).map(a => a._1 - factor * a._2)
            }
          }
          eM
        } else {
          // normalize the pivot row, then eliminate the column below the pivot;
          // subtracting factor * pivot row avoids dividing by a possibly-zero entry
          eM(n) = eM(n).map(_ / eM(n)(n))
          for (i <- n + 1 until l) {
            val factor = eM(i)(n)
            eM(i) = eM(i).zip(eM(n)).map(a => a._1 - factor * a._2)
          }
          elementaryTransform(eM, n + 1)
        }
      }
      // take the right half of the reduced augmented matrix: that is the inverse
      elementaryTransform(ExtensionMatrix, 0).map(a => {
        val gg = new Array[Double](l)
        for (i <- 0 until l) {
          gg(i) = a(i + l)
        }
        gg
      })
    }
    // print the output
    calculateInverse(Sigma = mvSigma(0), E = Imatrix(0)).foreach(a => println("Inverse of Sigma = " + a.mkString("\t")))
    // compute the determinant of a matrix by reducing M to upper triangular form
    // (this overwrites the rows of M, so pass a copy)
    @tailrec
    def calculateDeteminant(M: Array[Array[Double]], n: Int): Double = {
      if (n >= M.length - 1) {
        // the determinant of a triangular matrix is the product of its diagonal
        var Deteminant: Double = 1.0
        for (i <- M.indices) {
          Deteminant = Deteminant * M(i)(i)
        }
        Deteminant
      } else {
        for (i <- n + 1 until M.length) {
          val factor = M(i)(n) / M(n)(n)
          M(i) = M(i).zip(M(n)).map(a => a._1 - factor * a._2)
        }
        calculateDeteminant(M, n + 1)
      }
    }
    //println(calculateDeteminant(mvSigma(0), 0))
    // multiply a row vector by a matrix (vector-matrix product)
    def matrixMultiplicationLeft(left: Array[Double], middle: Array[Array[Double]]): Array[Double] = {
      val col = middle(0).length
      val leftResult = new Array[Double](col)
      for (j <- 0 until col) {
        leftResult(j) = left.zip(middle.map(_(j))).map(a => a._1 * a._2).sum
      }
      leftResult
    }
    //matrixMultiplicationLeft(alpha, mvSigma(0)).foreach(println)
    // training
    var len: Double = 1.0
    var j: Int = 0
    while (len > 1e-1) {
      j += 1
      val oldmvMu = mvMu.clone()
      val oldmvSigma = mvSigma.clone()
      // E step: compute the responsibilities of the latent components
      val gamma = mvdata.map(a => {
        val rs = new Array[Double](k)
        for (i <- 0 until k) {
          val dx = a.split(" ").zip(mvMu(i)).map(b => b._1.toDouble - b._2)
          // Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu)
          val temp = matrixMultiplicationLeft(dx, calculateInverse(mvSigma(i), Imatrix(i))).zip(dx).map(p => p._1 * p._2).sum
          // clone before the determinant call, which overwrites rows of its argument
          val ISigma = mvSigma(i).clone()
          // note dim / 2.0: integer division would be wrong for odd dim
          rs(i) = (1.0 / math.pow(2 * math.Pi, dim / 2.0)) * (1.0 / math.sqrt(calculateDeteminant(ISigma, 0).abs)) * math.exp(-0.5 * temp) * alpha(i)
        }
        rs.map(_ / rs.sum)
      })
      //gamma.foreach(a => println(a.mkString(",")))
      //val dx = mvdata(0).split(" ").zip(mvMu(0)).map(b => b._1.toDouble - b._2)
      //println("dx = " + dx.mkString("\t"))
      //val temp00 = calculateInverse(Sigma = mvSigma(0), E = Imatrix(0))
      //val temp11 = matrixMultiplicationLeft(dx, calculateInverse(mvSigma(0), Imatrix(0))).zip(dx).map(a => a._1 * a._2).sum
      //println("temp0 = " + temp00(0).mkString("\t"))
      // M step: maximize the Q function
      for (i <- 0 until k) {
        val gsum = gamma.map(_(i)).sum
        // update the mean vector of the multidimensional Gaussian first
        val iMu = new Array[Double](dim)
        for (j <- 0 until dim) {
          iMu(j) = gamma.map(_(i)).zip(mvdata.map(_.split(" ")).map(_(j))).map(a => a._1 * a._2.toDouble).sum / gsum
        }
        mvMu.put(i, iMu)
        // then update the covariance matrix, using the freshly updated mean
        val iSigma = ofDim[Double](dim, dim)
        val dx2 = mvdata.map(a => {
          a.split(" ").zip(mvMu(i)).map(b => b._1.toDouble - b._2)
        })
        for (n <- 0 until dim) {
          for (m <- 0 until dim) {
            iSigma(n)(m) = gamma.map(_(i)).zip(dx2.map(_(n)).zip(dx2.map(_(m))).map(a => a._1 * a._2)).map(b => b._1 * b._2).sum / gsum
          }
        }
        mvSigma.put(i, iSigma)
        // update the mixing weight
        alpha(i) = gsum / mvdata.length
      }
      // check whether the parameter update is small enough to stop
      len = 0.0
      for (i <- 0 until k) {
        len += mvMu(i).zip(oldmvMu(i)).map(a => (a._1 - a._2).abs).sum
        for (j <- 0 until dim) {
          len += mvSigma(i)(j).zip(oldmvSigma(i)(j)).map(a => (a._1 - a._2).abs).sum
        }
      }
    }
    // print the result
    alpha.foreach(println)
    mvMu.foreach(a => println("mvMu = " + a._2.mkString("\t")))
    mvSigma.foreach(a => println("mvSigma = " + a._2.mkString("\t")))
    mvSigma(0).foreach(a => println("mvSigma0 = " + a.mkString("\t")))
    mvSigma(1).foreach(a => println("mvSigma1 = " + a.mkString("\t")))
    println(j)
  }
}
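As a sanity check for the matrix routines above: for a 2×2 matrix the inverse has a closed form, so the output of the elementary-transformation routine can be compared against it. The sketch below uses that closed form (not the routine from `GMMs`, whose helpers are nested inside `main`) and verifies that A·A⁻¹ recovers the identity; the matrix values and object name are arbitrary choices for illustration.

```scala
object InverseCheck {
  // closed-form inverse of a 2x2 matrix [[a, b], [c, d]]
  def inverse2x2(m: Array[Array[Double]]): Array[Array[Double]] = {
    val det = m(0)(0) * m(1)(1) - m(0)(1) * m(1)(0)
    require(det != 0.0, "matrix is singular")
    Array(
      Array( m(1)(1) / det, -m(0)(1) / det),
      Array(-m(1)(0) / det,  m(0)(0) / det)
    )
  }

  // multiply two square matrices, to confirm A * A^-1 = I
  def multiply(a: Array[Array[Double]], b: Array[Array[Double]]): Array[Array[Double]] =
    a.map(row => b.head.indices.toArray.map(j => row.zip(b.map(_(j))).map(p => p._1 * p._2).sum))

  def main(args: Array[String]): Unit = {
    // a matrix shaped like the simulated covariances: heavy diagonal, small off-diagonal
    val sigma = Array(Array(40.0, 0.5), Array(0.5, 20.0))
    val prod = multiply(sigma, inverse2x2(sigma))
    prod.foreach(r => println(r.map(x => f"$x%.6f").mkString("\t")))
  }
}
```

Comparing `calculateInverse(sigma, I)` against `inverse2x2(sigma)` on a few random matrices is a cheap way to catch elimination bugs before running the full EM loop.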