"Statistical Learning Methods" (《统计学习方法》): Maximum Likelihood Estimation of the Naive Bayes Parameters

Step1. Likelihood function

In the naive Bayes model, the parameters that must be determined from the training set are $\theta_k=P(y=c_k)$ and $\mu_{jlk}=P(x^{(j)}=a_{jl}|y=c_k)$.

Likelihood function:

$$
\begin{aligned}
L(\theta,\mu)&=\prod\limits_{i=1}^{N}P(x_i,y_i)\\
&=\prod\limits_{i=1}^{N}P(y_i)P(x_i|y_i)\quad\text{(multiplication rule)}\\
&=\prod\limits_{i=1}^{N}\Big(P(y_i)\prod\limits_{j=1}^{n}P(x^{(j)}_i|y_i)\Big)\quad\text{(conditional independence assumption)}\\
&=\prod\limits_{i=1}^{N}\prod\limits_{k=1}^{K}\Big(P(y=c_k)\prod\limits_{j=1}^{n}P(x^{(j)}_i|y=c_k)\Big)^{I(y_i=c_k)}\\
&=\prod\limits_{i=1}^{N}\prod\limits_{k=1}^{K}\Big(\theta_k\prod\limits_{j=1}^{n}\prod\limits_{l=1}^{L_j}P(x^{(j)}=a_{jl}|y=c_k)^{I(x^{(j)}_i=a_{jl})}\Big)^{I(y_i=c_k)}\\
&=\prod\limits_{i=1}^{N}\prod\limits_{k=1}^{K}\Big(\theta_k\prod\limits_{j=1}^{n}\prod\limits_{l=1}^{L_j}\mu_{jlk}^{I(x^{(j)}_i=a_{jl})}\Big)^{I(y_i=c_k)}
\end{aligned}
$$

Here $N$ is the number of samples, $n$ is the dimension of $X$, $L_j$ is the number of possible values of $X^{(j)}$, and $K$ is the number of possible values of $Y$. The indicator exponents select, for each sample, the single class $c_k$ and the single value $a_{jl}$ that actually occurred; every other factor is raised to the power $0$ and equals $1$.

Taking the logarithm:

$$
\begin{aligned}
l(\theta,\mu)&=\sum\limits_{i=1}^{N}\sum\limits_{k=1}^{K}I(y_i=c_k)\Big(\log\theta_k+\sum\limits_{j=1}^{n}\sum\limits_{l=1}^{L_j}I(x^{(j)}_i=a_{jl})\log\mu_{jlk}\Big)
\end{aligned}
$$
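The log-likelihood can be evaluated directly once the indicators are read as "pick the term for the class and feature values that occurred". The following is a minimal sketch under assumptions that are not in the original post: the function name `log_likelihood`, the dictionary layout (`theta[c_k]` and `mu[(j, a_jl, c_k)]`), and the toy data are all hypothetical and chosen only for illustration.

```python
import math

def log_likelihood(X, y, theta, mu):
    """Evaluate l(theta, mu) for categorical naive Bayes.

    theta[c_k] plays the role of theta_k, and mu[(j, a_jl, c_k)] plays the
    role of mu_jlk (hypothetical layout). The indicators I(y_i = c_k) and
    I(x_i^(j) = a_jl) reduce to dictionary lookups, because exactly one
    class and one value per feature match each sample.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        total += math.log(theta[y_i])
        for j, value in enumerate(x_i):
            total += math.log(mu[(j, value, y_i)])
    return total

# Hypothetical toy data: N = 4 samples, n = 1 feature, K = 2 classes.
X = [("S",), ("M",), ("S",), ("M",)]
y = [1, 1, -1, 1]

# Arbitrary (not estimated) parameters, used only to exercise the formula.
theta = {1: 0.5, -1: 0.5}
mu = {(0, "S", 1): 0.5, (0, "M", 1): 0.5, (0, "S", -1): 0.5, (0, "M", -1): 0.5}

ll = log_likelihood(X, y, theta, mu)
print(ll)
```

With every parameter set to 0.5, each sample contributes $\log 0.5+\log 0.5$, so the total is $8\log 0.5$, which matches the double sum above term by term.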

Step2. Solving for $\theta_k$

Using the method of Lagrange multipliers to introduce the constraint $\sum\limits_{k=1}^{K}\theta_k=1$, we obtain:

$$
\begin{aligned}
F(\theta,\mu,\lambda)=\sum\limits_{i=1}^{N}\sum\limits_{k=1}^{K}I(y_i=c_k)\Big(\log\theta_k+\sum\limits_{j=1}^{n}\sum\limits_{l=1}^{L_j}I(x^{(j)}_i=a_{jl})\log\mu_{jlk}\Big)+\lambda\Big(\sum\limits_{k=1}^{K}\theta_k-1\Big)
\end{aligned}
$$

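Taking the partial derivative of $F$ with respect to $\theta_k$ gives $\frac{1}{\theta_k}\sum\limits_{i=1}^{N}I(y_i=c_k)+\lambda$. Setting it to $0$, and then summing the result over $k$, gives:

$$
\begin{aligned}
\theta_k&=-\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{\lambda}\\
\sum\limits_{k=1}^{K}\theta_k&=-\frac{N}{\lambda}=1
\end{aligned}
$$

The second equation uses $\sum\limits_{k=1}^{K}\sum\limits_{i=1}^{N}I(y_i=c_k)=N$, since every sample belongs to exactly one class. Let $N_k=\sum\limits_{i=1}^{N}I(y_i=c_k)$ denote the number of samples with $Y=c_k$. Solving the two equations together gives $\lambda=-N$ and:

$$
\begin{aligned}
\theta_k=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{N}=\frac{N_k}{N}
\end{aligned}
$$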
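In other words, the maximum likelihood estimate of each class prior is the relative frequency of that class in the training set. A minimal sketch of that counting step, where the function name `estimate_theta` and the toy labels are hypothetical and used only for illustration:

```python
from collections import Counter

def estimate_theta(y):
    """MLE of the class priors: theta_k = N_k / N."""
    n_samples = len(y)         # N
    class_counts = Counter(y)  # N_k for every class c_k that appears in y
    return {c_k: n_k / n_samples for c_k, n_k in class_counts.items()}

# Hypothetical labels: three samples of class 1, one of class -1.
y = [1, 1, -1, 1]
theta = estimate_theta(y)
print(theta)  # {1: 0.75, -1: 0.25}
```

The estimates sum to 1 by construction, which is exactly the constraint that the Lagrange multiplier enforces. A class that never appears in `y` gets no entry in this sketch, which corresponds to an estimate of 0.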

Step3. Solving for $\mu_{jlk}$

For a fixed feature $j$ and class $k$, using the method of Lagrange multipliers to introduce the constraint $\sum\limits_{l=1}^{L_j}\mu_{jlk}=1$, we obtain:

$$
\begin{aligned}
F(\theta,\mu,\lambda)=\sum\limits_{i=1}^{N}\sum\limits_{k=1}^{K}I(y_i=c_k)\Big(\log\theta_k+\sum\limits_{j=1}^{n}\sum\limits_{l=1}^{L_j}I(x^{(j)}_i=a_{jl})\log\mu_{jlk}\Big)+\lambda\Big(\sum\limits_{l=1}^{L_j}\mu_{jlk}-1\Big)
\end{aligned}
$$

Taking the partial derivative of $F$ with respect to $\mu_{jlk}$ gives $\frac{1}{\mu_{jlk}}\sum\limits_{i=1}^{N}I(y_i=c_k,x^{(j)}_i=a_{jl})+\lambda$. Setting it to $0$, and then summing the result over $l$, gives:

$$
\begin{aligned}
\mu_{jlk}&=-\frac{\sum\limits_{i=1}^{N}I(y_i=c_k,x^{(j)}_i=a_{jl})}{\lambda}\\
\sum\limits_{l=1}^{L_j}\mu_{jlk}&=-\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{\lambda}=1
\end{aligned}
$$

The second equation uses $\sum\limits_{l=1}^{L_j}I(y_i=c_k,x^{(j)}_i=a_{jl})=I(y_i=c_k)$, since $x^{(j)}_i$ takes exactly one of the values $a_{j1},\dots,a_{jL_j}$. Solving the two equations together gives:

$$
\begin{aligned}
\mu_{jlk}=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k,x^{(j)}_i=a_{jl})}{\sum\limits_{i=1}^{N}I(y_i=c_k)}
\end{aligned}
$$
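The estimate is therefore a ratio of two counts: the number of samples in class $c_k$ whose $j$-th feature equals $a_{jl}$, divided by the number of samples in class $c_k$. A minimal sketch of that computation, where the function name `estimate_mu`, the key layout `(j, a_jl, c_k)`, and the toy data are hypothetical and used only for illustration:

```python
from collections import Counter

def estimate_mu(X, y):
    """MLE of the conditionals:
    mu[(j, a_jl, c_k)] = count(x^(j) = a_jl, y = c_k) / count(y = c_k).
    """
    class_counts = Counter(y)  # denominator: samples per class
    joint_counts = Counter()   # numerator: (feature index, value, class) co-occurrences
    for x_i, y_i in zip(X, y):
        for j, value in enumerate(x_i):
            joint_counts[(j, value, y_i)] += 1
    return {
        key: count / class_counts[key[2]]
        for key, count in joint_counts.items()
    }

# Hypothetical toy data: N = 4 samples, n = 1 feature, K = 2 classes.
X = [("S",), ("M",), ("S",), ("M",)]
y = [1, 1, -1, 1]
mu = estimate_mu(X, y)
print(mu)
```

Combinations that never occur in the data, such as feature value "M" with class -1 here, get no entry in this sketch; their maximum likelihood estimate is 0. For each fixed feature and class, the stored estimates sum to 1, matching the constraint used in the derivation.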

Reposted from blog.csdn.net/MaTF_/article/details/131458222