The EM Algorithm and Its Extensions

Copyright notice: this article is original; if you repost it, please leave a comment to let me know. https://blog.csdn.net/yc_1993/article/details/52590822

Preface
The major algorithms in the title have long been summarized very well by predecessors such as JerryLead, abcjennifer, and pluskid. The purpose of opening this topic is to venture, on the basis of their work, to add a few strokes of my own: to share a bit of research and experience from my day-to-day work, and to discuss and improve together with everyone. Your support is much appreciated.

EM Part
I. History of EM
  (i) The story of EM
  (ii) The core of EM
II. Derivation of EM
  (i) Deriving EM from a lower bound
  (ii) Proof that the EM iterations converge
  (iii) Generalizing the Q function over distributions
III. Extensions of EM
  (i) GEM
  (ii) ECM
IV. Some references on EM

GMM Part
I. One-dimensional GMM
  (i) Theoretical derivation
  (ii) Implementation
     1. Data simulation (R)
     2. GMM implementation (Scala)
II. Multidimensional GMM
  (i) Theoretical derivation
  (ii) Implementation
     1. Data simulation (R)
     2. GMMs implementation (Scala)
       (1) Simulating initial values
       (2) Determinant computation
       (3) Matrix inverse computation
       (4) Putting GMMs together

I. History of EM

The history of EM is a saga of blood and tears; no wonder Murphy, the author of MLAPP, writes:
“This is an unfortunate term, since there are many ways to generalize EM....”

(i) The story of EM

Figure 1: Dempster, Laird, and Rubin (key figures behind the EM algorithm)
The development of the EM (Expectation Maximization) algorithm has a watershed: the year 1977. That year Dempster, Laird, and Rubin (Figure 1 above) jointly published the paper Maximum Likelihood from Incomplete Data via the EM Algorithm. From then on, EM left its scattered, ad hoc state behind and entered people's view with a rigorous mathematical formulation and proof.
In fact, even before DLR (Dempster, Laird, Rubin), some maximum likelihood estimation (MLE) problems had already been attacked with EM-style ideas, but only within fairly specific domains; it was DLR's unifying treatment that let EM truly shine. So why call the history of EM a saga of blood and tears? The story continues after DLR. Once DLR had given EM its name, the Q function and the M step became the focus of much criticism, usually around two issues: the complexity of the Q function, and the efficiency of the optimization in the M step. A battle then unfolded. Some felt EM could be viewed from another angle, which led to the GEM framework (discussed later): the M step need not be fully maximized on every iteration; it is enough to find a parameter value whose likelihood is larger than the previous one, somewhat in the spirit of gradient-style iteration. Others thought GEM's idea was sound but the M-step iterations were still slow in high dimensions, which led to ECM (also discussed later) as an extension of EM. Soon afterwards, someone found ECM's improvement insufficient, modified the optimization inside its CM steps, and proposed the ECME algorithm; a while later ECME was in turn absorbed into a newer framework, the AECM algorithm, and so on. Many similar stories followed: a view proposed today would be improved upon tomorrow. It was an algorithmic contest, blood and tears together, and it is precisely this persistence that kept pushing the wheel of history forward. A minimal toy sketch contrasting EM's exact M step with GEM's "just improve" M step is given below.
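To make the GEM idea above concrete, here is a minimal toy sketch (not part of the original post's code; every name, number, and step size in it is made up for illustration): a two-component one-dimensional mixture with known, equal weights and unit variances, where the mean of component 1 is updated by a single gradient-ascent step on the Q function (a GEM-style partial M step that merely increases Q), while the mean of component 2 is set to the exact maximizer (the ordinary EM M step).

object GemSketch {
    def main(args: Array[String]): Unit = {
        // toy data: two clusters around -2 and +2
        val data = Array(-2.1, -1.9, -2.3, 1.8, 2.2, 2.0)
        val (a1, a2, s2) = (0.5, 0.5, 1.0)   // known mixing weights and common variance
        var mu1 = -1.0                       // initial mean of component 1 (GEM update)
        var mu2 = 1.0                        // initial mean of component 2 (exact EM update)
        def phi(x: Double, mu: Double): Double = math.exp(-0.5 * (x - mu) * (x - mu) / s2)

        for (_ <- 0 until 50) {
            // E step: responsibility of component 1 for each point
            val g1 = data.map(x => a1 * phi(x, mu1) / (a1 * phi(x, mu1) + a2 * phi(x, mu2)))
            val g2 = g1.map(1.0 - _)
            // GEM-style partial M step: one gradient-ascent step on Q with respect to mu1
            val grad1 = data.zip(g1).map { case (x, g) => g * (x - mu1) / s2 }.sum
            mu1 += 0.1 * grad1
            // ordinary EM M step: the responsibility-weighted mean exactly maximizes Q over mu2
            mu2 = data.zip(g2).map { case (x, g) => g * x }.sum / g2.sum
        }
        println(s"mu1 = $mu1, mu2 = $mu2")   // both end up near the cluster means (about -2.1 and 2.0)
    }
}

Both update rules increase the likelihood at every iteration, which is all the GEM framework requires; the exact M step simply gets there in fewer iterations.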


To be organized…
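Until then, below is the Scala implementation for the one-dimensional GMM (item 2 of the GMM Part, I.(ii) in the outline; the R data-simulation step is not shown here). It assumes rnormData.txt holds one numeric sample per line. The E step computes, for every sample x, the responsibilities gamma_i(x) = alpha_i * N(x | mu_i, var_i) / sum_j alpha_j * N(x | mu_j, var_j); the M step then re-estimates each component's mean, variance, and mixing weight alpha_i as responsibility-weighted averages, and the loop stops once the total absolute change of the means and variances drops below 0.1.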

/**
  * Created by yangcheng on 2016/9/10.
  * This code implements a one-dimensional Gaussian mixture model (GMM) fitted with EM.
  * Hopefully it works well in practice.
  */
object GMM {
    def main(args: Array[String]): Unit = {
        import scala.io.Source._
        import scala.Array._
        import scala.collection.mutable
        import scala.util.Random

        val data = fromFile("C:/Users/yangcheng/Desktop/rnormData.txt").getLines().toArray
        //data.foreach(println)

        // k is the number of Gaussian components in the mixture
        val k: Int = 4
        // initialize the parameters: equal mixing weights, random means and variances
        val alpha: Array[Double] = new Array[Double](k).map(_ + (1.0 / k))
        val theta = new mutable.HashMap[Int, Map[String, Double]]()
        //Random.setSeed(1000)
        for (i <- 0 until k) {
            // "+ 1" keeps the initial variance strictly positive
            theta.put(i, Map("mean" -> Random.nextInt(100).toDouble, "var" -> (Random.nextInt(100) + 1).toDouble))
        }
        //theta.foreach(println)

        // training
        var len: Double = 1.0
        var j: Int = 0
        while (len > math.pow(10, -1)) {
            j += 1
            val oldTheta = theta.clone()
            // E step: compute the responsibility of each component for each sample
            val gamma = data.map(a => {
                val rs = new Array[Double](k)
                for (i <- 0 until k) {
                    rs(i) = alpha(i) * (1 / math.sqrt(2 * math.Pi * theta(i)("var")) *
                      math.exp(-math.pow(a.toDouble - theta(i)("mean"), 2) / 2 / theta(i)("var")))
                }
                rs.map(_ / rs.sum)
            })
            //gamma.foreach(a => println(a.mkString(",")))

            // M step: maximize the Q function
            for (i <- 0 until k) {
                val gi = gamma.map(_ (i))
                val giSum = gi.sum
                // update the mean of component i
                val meanjk = gi.zip(data).map(a => a._1 * a._2.toDouble).sum
                theta(i) += ("mean" -> meanjk / giSum)
                // update the variance of component i (using the freshly updated mean)
                val sdjk = gi.zip(data.map(a => math.pow(a.toDouble - theta(i)("mean"), 2))).map(a => a._1 * a._2).sum
                theta(i) += ("var" -> sdjk / giSum)
                // update the mixing weight
                alpha(i) = giSum / data.length
            }

            // check whether the total parameter change is small enough to stop
            len = 0.0
            for (i <- 0 until k) {
                len += (theta(i)("mean") - oldTheta(i)("mean")).abs
                len += (theta(i)("var") - oldTheta(i)("var")).abs
            }
            //oldTheta.foreach(a => println("oldTheta = " + a))
        }

        // print the result
        for(i <- 0 until k) println("alpha" + i + " = " + alpha(i))
        theta.foreach(a => println("theta = " + a))
        println("j = " + j)
    }
}
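The multidimensional version below follows the same E step / M step structure, but uses the multivariate normal density N(x | mu, Sigma) = (2*pi)^(-d/2) * |Sigma|^(-1/2) * exp(-(x - mu)^T * Sigma^(-1) * (x - mu) / 2). Since no linear-algebra library is used, it also carries three hand-rolled helpers: a matrix inverse computed by Gauss-Jordan elimination on the extended matrix [Sigma | I], a determinant computed by reduction to an upper-triangular matrix, and a row-vector-times-matrix product. The input file mvrnormData.txt is assumed to hold one space-separated d-dimensional sample per line.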

/**
  * Created by yangcheng on 2016/9/12.
  * This code implements a multidimensional Gaussian mixture model (GMMs) fitted with EM.
  * It is more general and closer to real-world use.
  * Hopefully it works well in practice.
  */
object GMMs {
    def main(args: Array[String]): Unit = {
        import scala.io.Source._
        import scala.Array._
        import scala.collection.mutable
        import scala.util.Random
        import scala.annotation.tailrec

        // load the data
        val mvdata = fromFile("C:/Users/yangcheng/Desktop/mvrnormData.txt").getLines().toArray
        //mvdata.map(_.split(" ")).foreach(a => println(a.mkString("[", ", ", "]")))

        // initialize the parameters
        // k is the number of Gaussian components
        // dim is the dimension of each Gaussian
        val dim: Int = 2
        val k: Int = 2
        val alpha: Array[Double] = new Array[Double](k).map(_ + 1.0 / k)
        val mvMu = mutable.HashMap[Int, Array[Double]]()
        val mvSigma = mutable.HashMap[Int, Array[Array[Double]]]()
        val Imatrix = mutable.HashMap[Int, Array[Array[Double]]]()
        // generate random mean vectors and covariance matrices
        // (symmetric with a dominant diagonal, hence positive definite)
        for (i <- 0 until k) {
            // for a mean vector
            mvMu.put(i, new Array[Double](dim).map(_ + Random.nextInt(100)))
            // for a cov matrix and identity matrix
            mvSigma.put(i, ofDim[Double](dim, dim))
            Imatrix.put(i, ofDim[Double](dim, dim))
            for (j <- 0 until dim) {
                mvSigma(i)(j)(j) = (Random.nextInt(10) + 1) * 10
                Imatrix(i)(j)(j) = 1
                for (s <- j + 1 until dim) {
                    mvSigma(i)(j)(s) = Random.nextGaussian()
                    mvSigma(i)(s)(j) = mvSigma(i)(j)(s)
                    Imatrix(i)(j)(s) = 0
                    Imatrix(i)(s)(j) = Imatrix(i)(j)(s)
                }
            }
        }
        //mvdata.map(_.split(" ")).map(a => a.map(_.toDouble)).map(b => b.zip(mvMu(0)).map(c => c._1 - c._2)).foreach(a => println(a.mkString("\t")))
        //alpha.foreach(println)
        mvMu.foreach(a => println("mvMu = " + a))
        mvMu(0).foreach(a => println("mvMu0 = " + a))
        //mvSigma.foreach(a => println("mvSigma = " + a))
        //mvSigma.foreach(a => println(a._2.mkString("\t")))
        //mvSigma(0).foreach(a => println(a.mkString("\t\t")))
        //Imatrix.foreach(a => println(a._2.mkString("\t")))
        //Imatrix(0).foreach(a => println(a.mkString("\t\t")))

        // define a function to adjust the extended matrix, making sure it has no zero rows or columns (left commented out)
        //def adjust(M: Array[Array[Double]]): Array[Array[Double]] = {
        //    val allRowSum = M.reduceLeft(_.zip(_).map(a => 2 * a._1 + 2 * a._2))
        //    M.map(a => a.zip(allRowSum).map(b => b._1 + b._2))
        //}
        //adjust(mvSigma(0)).foreach(a => println("adjust mvSigma = " + a.mkString("\t")))

        // Compute the inverse of a matrix by Gauss-Jordan elimination on the extended matrix [Sigma | E]
        // Note: no pivoting is done, so the leading (pivot) elements are assumed to be nonzero
        def calculateInverse(Sigma: Array[Array[Double]], E: Array[Array[Double]]): Array[Array[Double]] = {
            val l = Sigma.length
            // combine Sigma with E, form the Extension Matrix
            val ExtensionMatrix = ofDim[Double](l, 2 * l)
            for (i <- 0 until l) {
                ExtensionMatrix(i) = Sigma(i).++(E(i))
            }
            //ExtensionMatrix.foreach(a => println("ExtensionMatrix = " + a.mkString("\t")))
            // elementary transformation
            @tailrec
            def elementaryTransform(eM: Array[Array[Double]], n: Int): Array[Array[Double]] = {
                if (n >= l) {
                    for (j <- (1 until l).reverse) {
                        for (k <- 0 until j) {
                            eM(k) = eM(k).zip(eM(j)).map(a => a._1 - eM(k)(j) * a._2)
                        }
                    }
                    eM
                } else {
                    // normalize the pivot row, then eliminate column n from the rows below it
                    eM(n) = eM(n).map(_ / eM(n)(n))
                    for (i <- n until l - 1) {
                        val factor = eM(i + 1)(n)
                        eM(i + 1) = eM(i + 1).zip(eM(n)).map(a => a._1 - factor * a._2)
                    }
                    elementaryTransform(eM, n + 1)
                }
            }
            // the left half is now the identity and the right half holds the inverse; extract the right half
            elementaryTransform(ExtensionMatrix, 0).map(a => {
                val gg = new Array[Double](l)
                for (i <- 0 until l) {
                    gg(i) = a(i + l)
                }
                gg
            })
        }
        // print the output
        calculateInverse(Sigma = mvSigma(0), E = Imatrix(0)).foreach(a => println("Inverse of Sigma = " + a.mkString("\t")))

        // Compute the determinant of a matrix by reducing M to an upper-triangular form
        // (no pivoting; the leading elements are assumed to be nonzero)
        @tailrec
        def calculateDeteminant(M: Array[Array[Double]], n: Int): Double = {
            if (n >= M.length - 1) {
                var Deteminant: Double = 1.0
                for (i <- M.indices) {
                    Deteminant = Deteminant * M(i)(i)
                }
                Deteminant
            } else {
                for (i <- n until M.length - 1) {
                    M(i + 1) = M(i + 1).zip(M(n)).map(a => a._1 - a._2 * M(i + 1)(n) / M(n)(n))
                }
                calculateDeteminant(M, n + 1)
            }
        }
        //println(calculateDeteminant(mvSigma(0),0))


        // row vector times matrix: returns the 1 x m product of left (1 x n) and middle (n x m)
        def matrixMultiplicationLeft(left: Array[Double], middle: Array[Array[Double]]): Array[Double] = {
            val col = middle(0).length
            val leftResult = new Array[Double](col)
            for (j <- 0 until col) {
                leftResult(j) = left.zip(middle.map(_ (j))).map(a => a._1 * a._2).sum
            }
            leftResult
        }
        //matrixMultiplicationLeft(alpha,mvSigma(0)).foreach(println)

        // training
        var len: Double = 1.0
        var j: Int = 0
        while (len > math.pow(10, -1)) {
            j += 1
            val oldmvMu = mvMu.clone()
            val oldmvSigma = mvSigma.clone()
            // E step: compute the responsibility of each component for each sample
            val gamma = mvdata.map(a => {
                val rs = new Array[Double](k)
                for (i <- 0 until k) {
                    // dx = x - mu_i
                    val dx = a.split(" ").zip(mvMu(i)).map(b => b._1.toDouble - b._2)
                    // temp = (x - mu_i)^T * Sigma_i^{-1} * (x - mu_i)
                    val temp = matrixMultiplicationLeft(dx, calculateInverse(mvSigma(i), Imatrix(i))).zip(dx).map(a => a._1 * a._2).sum
                    val ISigma = mvSigma(i).clone()
                    // multivariate normal density of x under component i, times the mixing weight
                    // (dim / 2.0 avoids integer division in the normalizing constant)
                    rs(i) = (1.0 / math.pow(2 * math.Pi, dim / 2.0)) * (1.0 / math.pow(calculateDeteminant(ISigma, 0).abs, 0.5)) * math.exp(-0.5 * temp) * alpha(i)
                }
                rs.map(_ / rs.sum)
            })
            //gamma.foreach(a => println(a.mkString(",")))
            //val dx = mvdata(0).split(" ").zip(mvMu(0)).map(b => b._1.toDouble - b._2)
            //println("dx = " + dx.mkString("\t"))
            //val temp00 = calculateInverse(Sigma = mvSigma(0), E = Imatrix(0))
            //val temp11 = matrixMultiplicationLeft(dx, calculateInverse(mvSigma(0), Imatrix(0))).zip(dx).map(a => a._1 * a._2).sum
            //println("temp0 = " + temp00(0).mkString("\t"))

            // M step: maximize the Q function
            for (i <- 0 until k) {
                val gi = gamma.map(_ (i))
                val giSum = gi.sum
                // update the mean vector of component i
                val iMu = new Array[Double](dim)
                for (j <- 0 until dim) {
                    iMu(j) = gi.zip(mvdata.map(_.split(" ")).map(_ (j))).map(a => a._1 * a._2.toDouble).sum / giSum
                }
                mvMu.put(i, iMu)
                // update the covariance matrix of component i (using the freshly updated mean)
                val iSigma = ofDim[Double](dim, dim)
                val dx2 = mvdata.map(a => {
                    a.split(" ").zip(mvMu(i)).map(b => b._1.toDouble - b._2)
                })
                for (n <- 0 until dim) {
                    for (m <- 0 until dim) {
                        iSigma(n)(m) = gi.zip(dx2.map(_ (n)).zip(dx2.map(_ (m))).map(a => a._1 * a._2)).map(b => b._1 * b._2).sum / giSum
                    }
                }
                mvSigma.put(i, iSigma)
                // update the mixing weight
                alpha(i) = giSum / mvdata.length
            }

            // check whether the total parameter change is small enough to stop
            len = 0.0
            for (i <- 0 until k) {
                len += mvMu(i).zip(oldmvMu(i)).map(a => (a._1 - a._2).abs).sum
                for (j <- 0 until dim) {
                    len += mvSigma(i)(j).zip(oldmvSigma(i)(j)).map(a => (a._1 - a._2).abs).sum
                }
            }
        }
        // print the result
        alpha.foreach(println)
        mvMu.foreach(a => println("mvMu" + a._1 + " = " + a._2.mkString("\t")))
        mvSigma(0).foreach(a => println("mvSigma0 = " + a.mkString("\t")))
        mvSigma(1).foreach(a => println("mvSigma1 = " + a.mkString("\t")))
        println("j = " + j)

    }
}
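Both implementations stop when the total absolute change of the parameters drops below 0.1. Since EM guarantees a non-decreasing likelihood (see the convergence item in the outline above), monitoring the log-likelihood itself is an equally valid, and common, alternative stopping criterion.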
