Two basic rules of probability theory. Sum rule:

$$P(X)=\sum_Y P(X,Y)$$

Product rule:
$$P(X,Y)=P(Y)P(X|Y)=P(X)P(Y|X)$$
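The two rules can be checked numerically. The sketch below uses a small joint table with made-up probabilities (hypothetical numbers, not from the text). The sum rule recovers the marginals, and both factorizations in the product rule rebuild the joint exactly. `fractions.Fraction` keeps the arithmetic exact, so the checks use `==` rather than a tolerance.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X, Y) over X in {x1, x2}, Y in {y1, y2}
joint = {
    ("x1", "y1"): F(1, 10), ("x1", "y2"): F(3, 10),
    ("x2", "y1"): F(2, 10), ("x2", "y2"): F(4, 10),
}
xs = ["x1", "x2"]
ys = ["y1", "y2"]

# Sum rule: marginalize out the other variable
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Conditional distributions derived from the joint and the marginals
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for x in xs for y in ys}
p_y_given_x = {(x, y): joint[(x, y)] / p_x[x] for x in xs for y in ys}

# Product rule: both factorizations give back the joint
for x in xs:
    for y in ys:
        assert p_y[y] * p_x_given_y[(x, y)] == joint[(x, y)]
        assert p_x[x] * p_y_given_x[(x, y)] == joint[(x, y)]

print(p_x)  # {'x1': Fraction(2, 5), 'x2': Fraction(3, 5)}
```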
Equating the two factorizations and dividing by $P(X)$ gives Bayes' formula:
$$P(Y|X)=\frac{P(X|Y)P(Y)}{P(X)}$$
Likelihood: $P(X|Y)$

Prior: $P(Y)$

Partition function (the normalizing constant $P(X)$, also called the evidence), obtained by expanding $P(X)$ over all values of $Y$:
$$P(X)=\sum_{Y} P(X|Y)P(Y)$$

So Bayes' formula can also be written as:
$$P(Y|X)=\frac{P(X|Y)P(Y)}{\sum_{Y} P(X|Y)P(Y)}$$

or, indexing the possible values of $Y$ explicitly:
$$P(Y_i|X)=\frac{P(X|Y_i)P(Y_i)}{\sum_{j} P(X|Y_j)P(Y_j)}$$
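The indexed form translates directly into code: multiply each likelihood by its prior, sum those products to get the partition function, and divide. `bayes_posterior` below is a hypothetical helper written for illustration, and the numbers in the usage example are made up.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood):
    """Return P(Y_i | X) for every hypothesis Y_i.

    prior:      dict mapping hypothesis Y_i -> P(Y_i)
    likelihood: dict mapping hypothesis Y_i -> P(X | Y_i) for the observed X
    """
    # Numerators P(X | Y_i) P(Y_i)
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    # Denominator: the partition function P(X) = sum_j P(X | Y_j) P(Y_j)
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("P(X) is zero: the observation is impossible under every hypothesis")
    return {h: u / evidence for h, u in unnormalized.items()}


# Usage with hypothetical numbers: two hypotheses with equal priors
post = bayes_posterior(
    prior={"Y1": Fraction(1, 2), "Y2": Fraction(1, 2)},
    likelihood={"Y1": Fraction(3, 4), "Y2": Fraction(1, 4)},
)
print(post)  # {'Y1': Fraction(3, 4), 'Y2': Fraction(1, 4)}
```

Using `Fraction` inputs keeps every result exact. Plain floats work too if rounding is acceptable.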
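Application of Bayes' formula:

Hemophilia is an X-linked recessive disorder. A healthy woman has a brother with hemophilia. Let $\theta=1$ mean that she carries the disease gene and $\theta=0$ that she does not. Assuming their mother is a heterozygous carrier (the brother received his affected X chromosome from her), the woman inherited the mutant X with probability 1/2. So for this woman,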
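$$P(\theta=1)=P(\theta=0)=\frac{1}{2}$$

Now suppose she has two sons and both are healthy. Let $y_i=0$ denote that the $i$-th son is healthy. A non-carrier's sons are always healthy, while each son of a carrier is independently healthy with probability 1/2, so: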
$$P(y_1=0,y_2=0|\theta=0)=1\times 1$$
$$P(y_1=0,y_2=0|\theta=1)=\frac{1}{2}\times\frac{1}{2}$$

Working backwards with Bayes' formula, the probability that she is a carrier is:
$$P(\theta=1|y_1=0,y_2=0)=\frac{P(y_1=0,y_2=0|\theta=1)P(\theta=1)}{P(y_1=0,y_2=0)}$$

From the partition function
$$P(X)=\sum_{Y} P(X|Y)P(Y)$$

we get
$$P(y)=\sum_{\theta}P(y|\theta)P(\theta)$$

that is,
$$P(y_1=0,y_2=0)=P(y_1=0,y_2=0|\theta=1)P(\theta=1)+P(y_1=0,y_2=0|\theta=0)P(\theta=0)$$

So the formula for her carrier probability becomes:
$$P(\theta=1|y_1=0,y_2=0)=\frac{P(y_1=0,y_2=0|\theta=1)P(\theta=1)}{P(y_1=0,y_2=0|\theta=1)P(\theta=1)+P(y_1=0,y_2=0|\theta=0)P(\theta=0)}$$

Plugging in the numbers, the probability that she is a carrier given two healthy sons is:
$$P(\theta=1|y_1=0,y_2=0)=\frac{\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}}{\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}+1\times 1\times\frac{1}{2}}=\frac{\frac{1}{8}}{\frac{5}{8}}=\frac{1}{5}$$
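As an independent check of the 1/5 result, the sketch below simulates the genetic model directly instead of applying the formula. It draws $\theta$ from the prior, draws each son's status, discards draws where any son is affected, and counts how often the woman is a carrier among the draws that remain. This is plain rejection sampling, swapped in here for illustration. `simulate` and its parameters are hypothetical names, and the model assumes the sons are independent given $\theta$.

```python
import random


def simulate(n_sons=2, trials=200_000, seed=0):
    """Estimate P(theta=1 | all n_sons healthy) by rejection sampling."""
    rng = random.Random(seed)
    kept = carriers = 0
    for _ in range(trials):
        # Draw theta from the prior P(theta=1) = 1/2
        carrier = rng.random() < 0.5
        # Each son of a carrier is affected with probability 1/2;
        # a non-carrier's sons are always healthy in this model
        all_healthy = all(not (carrier and rng.random() < 0.5) for _ in range(n_sons))
        # Keep only the draws consistent with the observation
        if all_healthy:
            kept += 1
            carriers += carrier
    return carriers / kept


estimate = simulate()
print(round(estimate, 3))  # close to 0.2; the exact digits depend on the seed
```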
Now suppose she has 3 healthy sons. Since all the sons are healthy, we shorten the notation: $y$ denotes the event that every son is healthy, so for $n$ sons $P(y)=P(y_1=0,y_2=0,\dots,y_n=0)$. Here, with $n=3$,

$$P(y_1=0,y_2=0,y_3=0)=P(y)$$

Then:
$$P(y_1=0,y_2=0,y_3=0|\theta=0)=P(y|\theta=0)=1\times 1\times 1$$
$$P(y_1=0,y_2=0,y_3=0|\theta=1)=P(y|\theta=1)=\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}$$

Working backwards, the probability that she is a carrier is:
$$P(\theta=1|y)=\frac{P(y|\theta=1)P(\theta=1)}{P(y)}=\frac{P(y|\theta=1)P(\theta=1)}{P(y|\theta=0)P(\theta=0)+P(y|\theta=1)P(\theta=1)}$$
$$=\frac{(\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2})\times\frac{1}{2}}{1\times 1\times 1\times\frac{1}{2}+(\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2})\times\frac{1}{2}}=\frac{\frac{1}{16}}{\frac{9}{16}}=\frac{1}{9}\approx 0.111111$$
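The arithmetic for two and three sons can be reproduced exactly with `fractions.Fraction`. `carrier_posterior` is a hypothetical helper that plugs the likelihoods $1^n$ and $(1/2)^n$ and the prior 1/2 into the formula above.

```python
from fractions import Fraction


def carrier_posterior(n):
    """P(theta=1 | n healthy sons) with prior P(theta=1) = P(theta=0) = 1/2."""
    prior1 = prior0 = Fraction(1, 2)
    lik1 = Fraction(1, 2) ** n  # P(y | theta=1): each son healthy w.p. 1/2
    lik0 = Fraction(1) ** n     # P(y | theta=0): sons always healthy
    return lik1 * prior1 / (lik0 * prior0 + lik1 * prior1)


print(carrier_posterior(2))  # 1/5
print(carrier_posterior(3))  # 1/9
```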
Generalizing, suppose she has $n$ healthy sons:

$$P(y|\theta=0)=1^n$$
$$P(y|\theta=1)=\left(\frac{1}{2}\right)^n$$

Her carrier probability is then:
$$P(\theta=1|y)=\frac{P(y|\theta=1)P(\theta=1)}{P(y)}=\frac{P(y|\theta=1)P(\theta=1)}{P(y|\theta=0)P(\theta=0)+P(y|\theta=1)P(\theta=1)}=\frac{(\frac{1}{2})^{n+1}}{1^n\times\frac{1}{2}+(\frac{1}{2})^{n+1}}$$

From this we see that as $n\to\infty$, the probability that she is a carrier tends to 0.
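Multiplying numerator and denominator by $2^{n+1}$ turns this into a closed form. It reproduces the values 1/5 ($n=2$) and 1/9 ($n=3$) obtained above and makes the limit immediate:

$$
P(\theta=1|y)=\frac{(\frac{1}{2})^{n+1}}{\frac{1}{2}+(\frac{1}{2})^{n+1}}
=\frac{2^{n+1}\cdot(\frac{1}{2})^{n+1}}{2^{n+1}\cdot\frac{1}{2}+2^{n+1}\cdot(\frac{1}{2})^{n+1}}
=\frac{1}{2^{n}+1}\xrightarrow{\;n\to\infty\;}0
$$

For example, $n=10$ gives $\frac{1}{1025}\approx 0.000976$.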
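In fact, by $n=10$ her carrier probability is already very small, $\frac{(1/2)^{11}}{1/2+(1/2)^{11}}=\frac{1}{1025}\approx 0.000976$, essentially 0. Let's plot the curve with matplotlib to check: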
```python
import numpy as np
from matplotlib import pyplot as plt

# Number of healthy sons n = 0, 1, ..., 10
x = np.arange(0, 11)
# P(theta=1 | n healthy sons) = (1/2)^(n+1) / ((1/2)^(n+1) + 1/2)
y = (0.5 ** (x + 1)) / ((0.5 ** (x + 1)) + 0.5)

plt.xlim((0, 10))
plt.ylim((0, 0.5))
plt.title("Bayes")
plt.xlabel("Number of healthy sons")
plt.ylabel("Probability that the woman is a carrier")
plt.plot(x, y, color="red")
plt.show()
```
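If the x range is extended to 20, the curve keeps flattening toward 0. This confirms the point above: once she has 10 healthy sons, the woman can for practical purposes be regarded as not carrying the disease gene.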
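To make "can be regarded as not carrying the gene" concrete, the sketch below finds the smallest number of healthy sons for which the posterior drops below a chosen cutoff. The cutoffs (5%, 1%, 0.1%) are illustrative assumptions, not values from the original, and `min_sons` is a hypothetical helper. It uses the same posterior expression as the plotting code.

```python
def min_sons(threshold):
    """Smallest n whose posterior P(theta=1 | n healthy sons) is below threshold."""
    n = 0
    # Posterior from the formula above, with prior P(theta=1) = P(theta=0) = 1/2
    while 0.5 ** (n + 1) / (0.5 + 0.5 ** (n + 1)) >= threshold:
        n += 1
    return n


for t in (0.05, 0.01, 0.001):
    print(t, min_sons(t))
# prints: 0.05 5, then 0.01 7, then 0.001 10
```

With a 0.1% cutoff the answer is $n=10$, in line with the claim above. A stricter cutoff simply pushes $n$ higher, since the posterior halves roughly with each additional healthy son.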
A similar chart can also be drawn with pyecharts:
```python
from pyecharts.charts import Line
from pyecharts import options as opts
from pyecharts.render import make_snapshot
from pyecharts.globals import ThemeType
from snapshot_selenium import snapshot

# Number of healthy sons n = 0, 1, ..., 10
list_x = [x for x in range(0, 11)]
list_y = []
for x in range(0, 11):
    # P(theta=1 | n healthy sons)
    list_y.append((0.5 ** (x + 1)) / ((0.5 ** (x + 1)) + 0.5))

line = (
    Line(init_opts=opts.InitOpts(theme=ThemeType.WALDEN))
    .add_xaxis(list_x)
    .add_yaxis("", list_y, is_smooth=True)
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Bayes", pos_left="center"),
        yaxis_opts=opts.AxisOpts(name="Probability that the woman is a carrier"),
        xaxis_opts=opts.AxisOpts(name="Number of healthy sons"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
)
# Render to HTML, then save a PNG screenshot (requires snapshot-selenium and a browser driver)
make_snapshot(snapshot, line.render(), "Bayes.png")
```