A Worked Example of the Neural Network Backpropagation (BP) Algorithm

0. Preface

The previous post summarized the overall steps of the BP algorithm; this post walks through a concrete example to simulate how the algorithm works!

1. BP Algorithm Example

1.1 Network Structure

[Figure: a fully connected 2-3-2 network with inputs $x_1=5$, $x_2=10$, hidden units $h_1, h_2, h_3$, and output units $o_1, o_2$ with targets $0.01$ and $0.99$]

1.2 Weights and Biases

$w=(0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65)$, corresponding to $w_1$ through $w_{12}$ respectively.

$b=(0.35, 0.65)$, corresponding to $b_1$ and $b_2$.

1.3 Activation Function

Sigmoid function: $sigmoid(x)=\frac{1}{1+e^{-x}}$
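For reference, here is a minimal Python sketch of this activation (the helper name `sigmoid` is my own):

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))
```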

1.4 Forward Propagation

Input layer to hidden layer

  • The linear result for $h_1$ and its output after the activation function:
    $net_{h_1}=w_1\cdot x_1 + w_2\cdot x_2 + b_1\cdot 1 = 0.1\cdot5+0.15\cdot10+0.35\cdot1 = 2.35$
    $out_{h_1}=\frac{1}{1+e^{-net_{h_1}}}=\frac{1}{1+e^{-2.35}}=0.9129342275597286\approx0.912934$

In the same way, compute $h_2$ and $h_3$ (a code sketch follows this list):

  • The linear result for $h_2$ and its output after the activation function:
    $net_{h_2}=w_3\cdot x_1 + w_4\cdot x_2 + b_1\cdot 1 = 0.2\cdot5+0.25\cdot10+0.35\cdot1 = 3.85$
    $out_{h_2}=\frac{1}{1+e^{-net_{h_2}}}=\frac{1}{1+e^{-3.85}}=0.9791636554813196\approx0.979164$
  • The linear result for $h_3$ and its output after the activation function:
    $net_{h_3}=w_5\cdot x_1 + w_6\cdot x_2 + b_1\cdot 1 = 0.3\cdot5+0.35\cdot10+0.35\cdot1 = 5.35$
    $out_{h_3}=\frac{1}{1+e^{-net_{h_3}}}=\frac{1}{1+e^{-5.35}}=0.9952742873976046\approx0.995274$
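To make the arithmetic concrete, here is a small Python sketch of the input-to-hidden step (variable names are my own; $x_1=5$, $x_2=10$ are the inputs used in the calculations above):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs and input-to-hidden parameters from the example.
x1, x2 = 5.0, 10.0
w1, w2, w3, w4, w5, w6 = 0.1, 0.15, 0.2, 0.25, 0.3, 0.35
b1 = 0.35

net_h1 = w1 * x1 + w2 * x2 + b1  # 2.35
net_h2 = w3 * x1 + w4 * x2 + b1  # 3.85
net_h3 = w5 * x1 + w6 * x2 + b1  # 5.35

out_h1, out_h2, out_h3 = sigmoid(net_h1), sigmoid(net_h2), sigmoid(net_h3)
print(out_h1, out_h2, out_h3)  # ≈ 0.912934 0.979164 0.995274
```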

Hidden layer to output layer

Next we need to compute $o_1$ and $o_2$. For the output layer, the input is the previous layer's output, i.e. $out_{h_1}$, $out_{h_2}$, and $out_{h_3}$, which gives the following computation (a code sketch again follows the list):

  • The linear result for $o_1$ and its output after the activation function:
    $net_{o_1}=w_7\cdot out_{h_1} + w_9\cdot out_{h_2} + w_{11}\cdot out_{h_3} + b_2\cdot 1 = 0.4\cdot0.912934+0.5\cdot0.979164+0.6\cdot0.995274+0.65\cdot1 = 2.10192$
    $out_{o_1}=\frac{1}{1+e^{-net_{o_1}}}=\frac{1}{1+e^{-2.10192}}=0.8910896526253574\approx0.891090$

  • The linear result for $o_2$ and its output after the activation function:
    $net_{o_2}=w_8\cdot out_{h_1} + w_{10}\cdot out_{h_2} + w_{12}\cdot out_{h_3} + b_2\cdot 1 = 0.45\cdot0.912934+0.55\cdot0.979164+0.65\cdot0.995274+0.65\cdot1 = 2.2462886$
    $out_{o_2}=\frac{1}{1+e^{-net_{o_2}}}=\frac{1}{1+e^{-2.2462886}}=0.9043299162220731\approx0.904330$
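The corresponding hidden-to-output sketch (values carried over as literals so the snippet runs standalone):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hidden-layer outputs from the previous step and the output-layer parameters.
out_h1, out_h2, out_h3 = 0.912934, 0.979164, 0.995274
w7, w8, w9, w10, w11, w12 = 0.4, 0.45, 0.5, 0.55, 0.6, 0.65
b2 = 0.65

net_o1 = w7 * out_h1 + w9 * out_h2 + w11 * out_h3 + b2   # ≈ 2.10192
net_o2 = w8 * out_h1 + w10 * out_h2 + w12 * out_h3 + b2  # ≈ 2.2462886

out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)
print(out_o1, out_o2)  # ≈ 0.891090 0.904330
```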

Error

$E_{total}=E_{o_1}+E_{o_2}=\frac{1}{2}(0.01-0.891090)^{2}+\frac{1}{2}(0.99-0.904330)^{2}=0.39182946850000006\approx0.391829$
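The same error calculation in Python (the targets $0.01$ and $0.99$ are taken from the formula above):

```python
# Squared-error loss against the targets real_o1 = 0.01, real_o2 = 0.99.
real_o1, real_o2 = 0.01, 0.99
out_o1, out_o2 = 0.891090, 0.904330

E_o1 = 0.5 * (real_o1 - out_o1) ** 2
E_o2 = 0.5 * (real_o2 - out_o2) ** 2
E_total = E_o1 + E_o2
print(E_total)  # ≈ 0.391829
```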

1.5 Backpropagation

Between the hidden layer and the output layer

Here we take updating $w_7$ as an example; we need the partial derivative of the error with respect to it. By the chain rule:

$\frac{\partial E_{total}}{\partial w_{7}}=\frac{\partial E_{total}}{\partial out_{o_1}}\cdot\frac{\partial out_{o_1}}{\partial net_{o_1}}\cdot\frac{\partial net_{o_1}}{\partial w_{7}}$

This is because a change in $w_7$ affects $net_{o_1}$, which in turn affects $out_{o_1}$, which affects $E_{o_1}$ and ultimately $E_{total}$.
That is, $w_7\rightarrow net_{o_1}\rightarrow out_{o_1}\rightarrow E_{o_1}\rightarrow E_{total}$
PS:
The process above is just like taking the partial derivative of a composite function: for $f(g(h(x)))$, we have $\frac{\partial f}{\partial x}=\frac{\partial f}{\partial g}\cdot\frac{\partial g}{\partial h}\cdot\frac{\partial h}{\partial x}$

Also,

$E_{o_1}=\frac{1}{2}(real_{o_1}-out_{o_1})^2$

$E_{total}=E_{o_1}+E_{o_2}$

so,

$\frac{\partial E_{total}}{\partial out_{o_1}}=2\cdot\frac{1}{2}(real_{o_1}-out_{o_1})\cdot(-1)+0=-(0.01-0.891090)=0.88109$

Also,
$out_{o_1}=\frac{1}{1+e^{-net_{o_1}}}$
so,

$\frac{\partial out_{o_1}}{\partial net_{o_1}}=out_{o_1}\cdot(1-out_{o_1})=0.891090\cdot(1-0.891090)=0.09704861189999996\approx0.097049$

PS:
This partial derivative is simply the derivative of the sigmoid function:
$g(z)=\frac{1}{1+e^{-z}}$
$g'(z)=\frac{e^{-z}}{(1+e^{-z})^2}=\frac{1+e^{-z}-1}{(1+e^{-z})^2}=\frac{1}{1+e^{-z}}-\frac{1}{(1+e^{-z})^2}=g(z)\cdot(1-g(z))$
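A quick numerical sanity check of this identity (a sketch; the central finite-difference estimate should match the closed form at $z = net_{o_1}$ from above):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Compare g'(z) = g(z) * (1 - g(z)) with a finite-difference estimate.
z, eps = 2.10192, 1e-6
analytic = sigmoid(z) * (1.0 - sigmoid(z))
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)
print(analytic, numeric)  # both ≈ 0.097049
```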

Also,
$net_{o_1}=w_7\cdot out_{h_1} + w_9\cdot out_{h_2} + w_{11}\cdot out_{h_3}+b_2\cdot 1$
so,

$\frac{\partial net_{o_1}}{\partial w_{7}}=out_{h_1}+0+0+0=0.912934$

Hence,

$\frac{\partial E_{total}}{\partial w_{7}}=\frac{\partial E_{total}}{\partial out_{o_1}}\cdot\frac{\partial out_{o_1}}{\partial net_{o_1}}\cdot\frac{\partial net_{o_1}}{\partial w_{7}}=0.88109\cdot0.097049\cdot0.912934\approx0.078064$

Then, following the principle of gradient descent, update $w_7$:

$w_7'=w_7-\alpha \frac{\partial E_{total}}{\partial w_{7}}=0.4-0.5\cdot0.078064=0.360968$

Here $\alpha$ is the learning rate, which is set manually; for details on how gradient descent works, see the separate post on that topic.
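Putting the three chain-rule factors together, a sketch of the $w_7$ gradient and update ($\alpha=0.5$ as in the example; all values carried over from above):

```python
# Chain-rule gradient for w7 and its gradient-descent update.
real_o1 = 0.01
out_o1, out_h1 = 0.891090, 0.912934
w7, alpha = 0.4, 0.5

dE_dout_o1 = -(real_o1 - out_o1)           # ≈ 0.88109
dout_o1_dnet_o1 = out_o1 * (1.0 - out_o1)  # ≈ 0.097049
dnet_o1_dw7 = out_h1                       # ≈ 0.912934

grad_w7 = dE_dout_o1 * dout_o1_dnet_o1 * dnet_o1_dw7  # ≈ 0.078064
w7_new = w7 - alpha * grad_w7                         # ≈ 0.360968
print(grad_w7, w7_new)
```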

The backpropagation above uses adjusting $w_7$ as its example; the updates for the other weights follow the same procedure!

So, by the same reasoning, we obtain the following updates:

$w'_8=0.453383$, $w'_9=0.458137$, $w'_{10}=0.553629$, $w'_{11}=0.557448$, $w'_{12}=0.653688$
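All six hidden-to-output updates can also be computed in one small loop (a sketch; the indexing convention `w[j][k]`, connecting hidden unit $j$ to output unit $k$, is my own):

```python
# Updating all hidden-to-output weights at once; w[j][k] maps hidden
# unit j to output unit k, i.e. w7..w12 in the text.
alpha = 0.5
out_h = [0.912934, 0.979164, 0.995274]
out_o = [0.891090, 0.904330]
real = [0.01, 0.99]
w = [[0.4, 0.45], [0.5, 0.55], [0.6, 0.65]]

# delta_k = dE_total/dnet_ok = (out_ok - real_k) * out_ok * (1 - out_ok)
delta = [(out_o[k] - real[k]) * out_o[k] * (1.0 - out_o[k]) for k in range(2)]
w_new = [[w[j][k] - alpha * delta[k] * out_h[j] for k in range(2)]
         for j in range(3)]
print(w_new)
# ≈ [[0.360968, 0.453383], [0.458137, 0.553629], [0.557448, 0.653688]]
```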

Observing how this layer's weights changed: $w_7$, $w_9$, $w_{11}$ all decreased, while $w_8$, $w_{10}$, $w_{12}$ all increased. This matches the targets: $o_1$'s output (0.891090) is far above its target of 0.01, so the weights feeding $o_1$ shrink, while $o_2$'s output (0.904330) is below its target of 0.99, so the weights feeding $o_2$ grow.

Between the input layer and the hidden layer

Here we take updating $w_1$ as an example:

$w_1\rightarrow net_{h_1}\rightarrow out_{h_1}\rightarrow net_{o_1}\rightarrow out_{o_1}\rightarrow E_{o_1}\rightarrow E_{total}$
$w_1\rightarrow net_{h_1}\rightarrow out_{h_1}\rightarrow net_{o_2}\rightarrow out_{o_2}\rightarrow E_{o_2}\rightarrow E_{total}$

Taking the partial derivative here is slightly more involved, since $out_{h_1}$ contributes to both errors:

$\frac{\partial E_{total}}{\partial w_{1}}=\frac{\partial E_{total}}{\partial out_{h_1}}\cdot\frac{\partial out_{h_1}}{\partial net_{h_1}}\cdot\frac{\partial net_{h_1}}{\partial w_{1}}=\left(\frac{\partial E_{o_1}}{\partial out_{h_1}}+\frac{\partial E_{o_2}}{\partial out_{h_1}}\right)\cdot\frac{\partial out_{h_1}}{\partial net_{h_1}}\cdot\frac{\partial net_{h_1}}{\partial w_{1}}$

$\frac{\partial E_{o_1}}{\partial out_{h_1}}=\frac{\partial E_{o_1}}{\partial out_{o_1}}\cdot\frac{\partial out_{o_1}}{\partial net_{o_1}}\cdot\frac{\partial net_{o_1}}{\partial out_{h_1}}$

The remaining computation and update proceed just as above (a gradient sketch for $w_1$ follows)!
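A sketch of the $w_1$ gradient under this two-path chain rule. The $\delta$ values reuse the factors computed above; since the text does not work out $w_1'$, the printed results are my own arithmetic and should be treated as approximate:

```python
# Gradient for the input-to-hidden weight w1 via both output paths.
alpha, x1 = 0.5, 5.0
w1, w7, w8 = 0.1, 0.4, 0.45
out_h1 = 0.912934
delta_o1 = 0.88109 * 0.097049   # dE_o1/dnet_o1
delta_o2 = -0.08567 * 0.086517  # dE_o2/dnet_o2

dE_dout_h1 = delta_o1 * w7 + delta_o2 * w8   # sum over both paths
dout_h1_dnet_h1 = out_h1 * (1.0 - out_h1)
grad_w1 = dE_dout_h1 * dout_h1_dnet_h1 * x1  # dnet_h1/dw1 = x1
w1_new = w1 - alpha * grad_w1
print(grad_w1, w1_new)  # ≈ 0.012268, 0.093866
```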

Everything above constitutes a single update; as the number of iterations grows, the outputs move closer and closer to the true values.

This completes the worked example of the full FP and BP process; next, we will simulate the process in code!

Reposted from blog.csdn.net/weixin_41857483/article/details/110247231