Statistical Machine Learning: Bayes' Theorem

The two basic rules of probability theory:
Sum rule: $P(X)=\sum_Y P(X,Y)$
Product rule: $P(X,Y)=P(Y)P(X|Y)=P(X)P(Y|X)$
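As a quick sanity check, the two rules can be verified numerically on a small discrete example. The joint table below is a made-up illustration (its values are not from the original post), and the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X, Y) with X, Y in {0, 1} (illustrative values only).
joint = {
    (0, 0): Fraction(1, 8),
    (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
}

def marginal_x(x):
    """Sum rule: P(X=x) = sum over y of P(X=x, Y=y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """Sum rule applied to the other variable: P(Y=y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def cond_x_given_y(x, y):
    """Conditional probability P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y)."""
    return joint[(x, y)] / marginal_y(y)

def cond_y_given_x(y, x):
    """Conditional probability P(Y=y | X=x) = P(X=x, Y=y) / P(X=x)."""
    return joint[(x, y)] / marginal_x(x)

# Product rule: both factorizations reproduce the joint probability.
for (x, y), p in joint.items():
    assert p == marginal_y(y) * cond_x_given_y(x, y)
    assert p == marginal_x(x) * cond_y_given_x(y, x)

print(marginal_x(0), marginal_x(1))  # 1/2 1/2
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.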

From these two rules we obtain
Bayes' theorem:
$P(Y|X)=\frac{P(X|Y)P(Y)}{P(X)}$
Posterior distribution: $P(Y|X)$
Likelihood: $P(X|Y)$
Prior distribution: $P(Y)$
Normalizing constant (the evidence, sometimes called the partition function), obtained by expanding $P(X)$ over all values of $Y$: $P(X)=\sum_{Y} P(X|Y)P(Y)$
So Bayes' theorem can also be written as:
$P(Y|X)=\frac{P(X|Y)P(Y)}{\sum_{Y} P(X|Y)P(Y)}$
or, for a discrete set of hypotheses $Y_i$:
$P(Y_i|X)=\frac{P(X|Y_i)P(Y_i)}{\sum_{j} P(X|Y_j)P(Y_j)}$
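The discrete form of the formula translates almost directly into code. The following is a minimal sketch, not a library API: the function name `posterior` and the two-hypothesis example values are assumptions made for illustration.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Return P(Y_i | X) for every hypothesis Y_i.

    prior:      dict mapping hypothesis -> P(Y_i)
    likelihood: dict mapping hypothesis -> P(X | Y_i) for the observed X
    """
    # Numerators of Bayes' theorem: P(X | Y_i) P(Y_i)
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    # Normalizing constant: P(X) = sum_j P(X | Y_j) P(Y_j)
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("the observation has zero probability under every hypothesis")
    return {h: u / evidence for h, u in unnormalized.items()}

# Hypothetical two-hypothesis example (values chosen only for illustration).
prior = {"A": Fraction(1, 3), "B": Fraction(2, 3)}
likelihood = {"A": Fraction(3, 4), "B": Fraction(1, 4)}
result = posterior(prior, likelihood)
print(result["A"], result["B"])  # 3/5 2/5
```

The posterior values always sum to 1 because each numerator is divided by the sum of all numerators.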

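An application of Bayes' theorem:

Hemophilia is an X-linked recessive disorder.

A woman is unaffected, but her brother has hemophilia.

Let $\theta=1$ denote that she carries the disease gene and $\theta=0$ that she does not.

Assuming her mother is a carrier and her father is unaffected, she inherited the affected X chromosome with probability 1/2, so for this woman
$P(\theta=1)=P(\theta=0)=\frac{1}{2}$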
The woman has two sons. Write $y_i=0$ if son $i$ is unaffected. If both sons are unaffected, and the sons' outcomes are independent given $\theta$:
$P(y_1=0,y_2=0|\theta=0)=1\times 1$
$P(y_1=0,y_2=0|\theta=1)=\frac{1}{2}\times\frac{1}{2}$
We now infer the probability that the woman is a carrier:
$P(\theta=1|y_1=0,y_2=0)=\frac{P(y_1=0,y_2=0|\theta=1)P(\theta=1)}{P(y_1=0,y_2=0)}$
From the normalizing constant
$P(X)=\sum_{Y} P(X|Y)P(Y)$
we have:
$P(y)=\sum_{\theta}P(y|\theta)P(\theta)$

$P(y_1=0,y_2=0)=P(y_1=0,y_2=0|\theta=1)P(\theta=1)+P(y_1=0,y_2=0|\theta=0)P(\theta=0)$
So the formula for the carrier probability becomes:
$P(\theta=1|y_1=0,y_2=0)=\frac{P(y_1=0,y_2=0|\theta=1)P(\theta=1)}{P(y_1=0,y_2=0|\theta=1)P(\theta=1)+P(y_1=0,y_2=0|\theta=0)P(\theta=0)}$
Given that she has two healthy sons, the probability that the woman is a carrier is:
$P(\theta=1|y_1=0,y_2=0)=\frac{\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}}{\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}+1\times 1\times\frac{1}{2}}=\frac{\frac{1}{8}}{\frac{5}{8}}=\frac{1}{5}$
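The arithmetic for the two-son case can be reproduced with exact fractions. This is a small sketch with variable names chosen here for illustration; the prior and likelihood values are the ones stated in the text.

```python
from fractions import Fraction

half = Fraction(1, 2)

# theta = 1: carrier, theta = 0: not a carrier
prior = {1: half, 0: half}
# P(both sons healthy | theta)
likelihood = {1: half * half, 0: Fraction(1)}

# Numerator of Bayes' theorem for theta = 1
numerator = likelihood[1] * prior[1]
# Normalizing constant: sum over both values of theta
evidence = sum(likelihood[t] * prior[t] for t in prior)
posterior_carrier = numerator / evidence

print(numerator, evidence, posterior_carrier)  # 1/8 5/8 1/5
```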


Now suppose the woman has three healthy sons.
Since all the sons are healthy, we abbreviate the observation as $y$:
$P(y_1=0,y_2=0,y_3=0)=P(y)$, and in general $P(y)=P(y_1=0,y_2=0,\dots,y_n=0)$
Then:
$P(y_1=0,y_2=0,y_3=0|\theta=0)=P(y|\theta=0)=1\times 1\times 1$
$P(y_1=0,y_2=0,y_3=0|\theta=1)=P(y|\theta=1)=\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}$
The probability that the woman is a carrier is:

$P(\theta=1|y)=\frac{P(y|\theta=1)P(\theta=1)}{P(y)}=\frac{P(y|\theta=1)P(\theta=1)}{P(y|\theta=0)P(\theta=0)+P(y|\theta=1)P(\theta=1)}$
$=\frac{(\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2})\times\frac{1}{2}}{1\times 1\times 1\times\frac{1}{2}+(\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2})\times\frac{1}{2}}=\frac{\frac{1}{16}}{\frac{9}{16}}=\frac{1}{9}\approx 0.111111$
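The same numbers can also be reached one son at a time, using the posterior after each healthy son as the prior for the next. This is a sketch under the assumption, already used in the products of 1/2 above, that the sons' outcomes are independent given the mother's carrier status; the function name `update` is hypothetical.

```python
from fractions import Fraction

def update(p_carrier):
    """One Bayesian update of P(theta = 1) after observing one healthy son."""
    # P(healthy son | carrier) = 1/2, P(healthy son | not carrier) = 1
    numerator = Fraction(1, 2) * p_carrier
    return numerator / (numerator + 1 * (1 - p_carrier))

p = Fraction(1, 2)          # prior before any sons are observed
history = [p]
for _ in range(3):          # three healthy sons in a row
    p = update(p)
    history.append(p)

print([str(q) for q in history])  # ['1/2', '1/3', '1/5', '1/9']
```

The entries for two and three sons match the batch calculations of 1/5 and 1/9.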


Generalizing, suppose the woman has $n$ healthy sons:
$P(y|\theta=0)=1^n$
$P(y|\theta=1)=(\frac{1}{2})^n$
Then the probability that she is a carrier is:
$P(\theta=1|y)=\frac{P(y|\theta=1)P(\theta=1)}{P(y)}=\frac{P(y|\theta=1)P(\theta=1)}{P(y|\theta=0)P(\theta=0)+P(y|\theta=1)P(\theta=1)}=\frac{(\frac{1}{2})^{n+1}}{1^n\times\frac{1}{2}+(\frac{1}{2})^{n+1}}$
From this we see that as $n\to\infty$, the probability that the woman is a carrier tends to 0.
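The limit is easier to read off after simplifying the expression. Dividing numerator and denominator by $\frac{1}{2}$ and then multiplying both by $2^n$ gives a closed form (this simplification is added here for clarity and is not in the original post):

$$
P(\theta=1|y)=\frac{(\frac{1}{2})^{n+1}}{\frac{1}{2}+(\frac{1}{2})^{n+1}}=\frac{(\frac{1}{2})^{n}}{1+(\frac{1}{2})^{n}}=\frac{1}{2^{n}+1}
$$

For $n=2$ and $n=3$ this reproduces $\frac{1}{5}$ and $\frac{1}{9}$, and the denominator grows without bound as $n$ increases.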

In fact, by $n=10$ the carrier probability is already very small ($\frac{1}{1025}\approx 0.000976$, close to 0). We can plot it with matplotlib to see this:

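import numpy as np
from matplotlib import pyplot as plt

# Number of healthy sons: n = 0, 1, ..., 10
x = np.arange(0, 11)
# Posterior P(theta=1 | n healthy sons) = (1/2)^(n+1) / ((1/2)^(n+1) + 1/2)
y = (0.5**(x+1))/((0.5**(x+1))+0.5)

plt.xlim((0,10))
plt.ylim((0,0.5))
plt.title("Bayes")
plt.xlabel("Number of healthy sons")
plt.ylabel("Probability that the woman is a carrier")
plt.plot(x, y, color='red')
plt.show()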

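[Figure: carrier probability versus number of healthy sons, n = 0 to 10]
If the x range is extended to 20:
[Figure: the same plot with the x range extended to 20]
This confirms the observation above: by $n=10$ the carrier probability is below 0.1%, so the woman is very unlikely to carry the disease gene.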
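The plots above evaluate the formula rather than simulate anything. As an independent check, the posterior can also be estimated by Monte Carlo: draw many hypothetical women from the prior, draw their sons, and keep only the cases where every son is healthy. The sketch below uses only the standard library; the function name, trial count, and seed are choices made here, not part of the original post.

```python
import random

def simulate_carrier_fraction(n_sons, trials=100_000, seed=0):
    """Estimate P(theta = 1 | all n_sons healthy) by rejection sampling."""
    rng = random.Random(seed)
    kept = 0       # simulated families in which every son is healthy
    carriers = 0   # of those, how many mothers are carriers
    for _ in range(trials):
        carrier = rng.random() < 0.5  # prior P(theta = 1) = 1/2
        # A carrier's son is healthy with probability 1/2;
        # a non-carrier's sons are always healthy.
        all_healthy = (not carrier) or all(
            rng.random() < 0.5 for _ in range(n_sons)
        )
        if all_healthy:
            kept += 1
            carriers += carrier
    return carriers / kept

# Compare the simulated estimate with the exact value 1 / (2^n + 1).
for n in (1, 2, 3):
    print(n, round(simulate_carrier_fraction(n), 3), round(1 / (2**n + 1), 3))
```

For large $n$ the estimate becomes noisy, because very few simulated carriers have only healthy sons, so the closed form is the better tool there.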


A similar chart can also be drawn with pyecharts:

from pyecharts.charts import Line
from pyecharts import options as opts
from pyecharts.render import make_snapshot
from snapshot_selenium import snapshot
from pyecharts.globals import ThemeType

# n = 0..10 healthy sons and the corresponding carrier probability
list_x = list(range(0, 11))
list_y = [(0.5 ** (n + 1)) / ((0.5 ** (n + 1)) + 0.5) for n in list_x]

line = (
    Line(init_opts=opts.InitOpts(theme=ThemeType.WALDEN))
        .add_xaxis(list_x)
        .add_yaxis("", list_y, is_smooth=True)
        .set_global_opts(title_opts=opts.TitleOpts(title="Bayes", pos_left='center',),
                         yaxis_opts=opts.AxisOpts(name="Probability that the woman is a carrier"),
                         xaxis_opts=opts.AxisOpts(name="Number of healthy sons"))
        .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
)

# Render the chart to HTML, then save a PNG snapshot of it
make_snapshot(snapshot, line.render(), "Bayes.png")


Reposted from blog.csdn.net/qq_43613793/article/details/104735147