Reference article
Neural Networks and Deep Learning, Michael A. Nielsen
A very detailed article for understanding the backpropagation method in neural networks: it demonstrates the weight-update process with a worked example, but it does not cover the update of the biases.
Suppose a three-layer neural network with the structure shown in the original diagram (figure omitted here).
For a single training sample $x$, its quadratic cost function can be written as

$$C = \frac{1}{2}\|y - a^L\|^2 = \frac{1}{2}\sum_j \left(y_j - a_j^L\right)^2$$

$$a_j^L = \sigma(z_j^L)$$

$$z_j^l = \sum_k \omega_{jk}^l\, a_k^{l-1} + b_j^l$$
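To make these definitions concrete, here is a minimal NumPy sketch of the forward pass; the layer sizes, the sigmoid activation, and all function names are illustrative assumptions, not something taken from the reference article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative three-layer network: 3 inputs, 4 hidden neurons, 2 outputs (assumed sizes).
sizes = [3, 4, 2]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]  # omega^l
biases = [rng.standard_normal((m, 1)) for m in sizes[1:]]                       # b^l

def feedforward(x):
    """Compute z^l = omega^l a^{l-1} + b^l and a^l = sigma(z^l), layer by layer."""
    a = x
    zs, activations = [], [a]
    for w, b in zip(weights, biases):
        z = w @ a + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)
    return zs, activations
```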
The cost function $C$ is a function of $a_j^L$; $a_j^L$ is a function of $z_j^L$; and $z_j^L$ is in turn a function of both $\omega_{jk}^L$ and $a_k^{L-1}$.
We now prove the four fundamental equations of backpropagation (BP1-BP4), all of which are corollaries of the chain rule of multivariable calculus:
$$\delta_j^L = \frac{\partial C}{\partial a_j^L}\,\sigma'(z_j^L) \qquad \text{(BP1)}$$

$$\delta_j^l = \sum_k \omega_{kj}^{l+1}\,\delta_k^{l+1}\,\sigma'(z_j^l) \qquad \text{(BP2)}$$

$$\frac{\partial C}{\partial \omega_{jk}^l} = \delta_j^l\, a_k^{l-1} \qquad \text{(BP3)}$$

$$\frac{\partial C}{\partial b_j^l} = \delta_j^l \qquad \text{(BP4)}$$
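Before the proofs, it may help to see how (BP1)-(BP4) turn into a backward pass in code. The sketch below continues the forward-pass snippet above and assumes the quadratic cost, for which $\partial C/\partial a_j^L = a_j^L - y_j$; it is an illustration of the four equations, not the reference article's implementation.

```python
def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

def backprop(x, y):
    """One application of BP1-BP4 for a single training sample (x, y)."""
    zs, activations = feedforward(x)
    nabla_w = [None] * len(weights)
    nabla_b = [None] * len(biases)
    # BP1: output error (quadratic cost, so dC/da^L = a^L - y).
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_w[-1] = delta @ activations[-2].T  # BP3 at the output layer
    nabla_b[-1] = delta                      # BP4 at the output layer
    for l in range(2, len(sizes)):
        # BP2: propagate the error one layer back.
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_w[-l] = delta @ activations[-l - 1].T  # BP3
        nabla_b[-l] = delta                          # BP4
    return nabla_w, nabla_b
```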
1. Let us start with equation (BP1), which gives an expression for the output error $\delta^L$. By definition,

$$\delta_j^L = \frac{\partial C}{\partial z_j^L}$$
Applying the chain rule, we can rewrite this partial derivative in terms of partial derivatives with respect to the output activations:

$$\delta_j^L = \sum_k \frac{\partial C}{\partial a_k^L}\,\frac{\partial a_k^L}{\partial z_j^L}$$
Here the sum runs over all neurons $k$ in the output layer. The output activation $a_k^L$ of the $k$-th neuron depends on the weighted input $z_j^L$ of the $j$-th neuron only when $k = j$, so $\partial a_k^L / \partial z_j^L = 0$ when $k \neq j$. The sum therefore collapses to a single term:

$$\delta_j^L = \frac{\partial C}{\partial a_j^L}\,\frac{\partial a_j^L}{\partial z_j^L}$$
Since $a_j^L = \sigma(z_j^L)$, the second factor on the right can be written as $\sigma'(z_j^L)$, and the equation becomes

$$\delta_j^L = \frac{\partial C}{\partial a_j^L}\,\sigma'(z_j^L),$$

which is (BP1).
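As a quick sanity check (my own specialization, not from the article): for the quadratic cost defined above, $\partial C / \partial a_j^L = a_j^L - y_j$, so (BP1) becomes

$$\delta_j^L = \left(a_j^L - y_j\right)\sigma'(z_j^L),$$

which is exactly the `delta` computed at the start of the `backprop` sketch.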
2. Proof of (BP2), which expresses the error $\delta^l$ in terms of the error $\delta^{l+1}$ in the next layer. To do this, we rewrite $\delta_j^l = \partial C / \partial z_j^l$ in terms of $\delta_k^{l+1} = \partial C / \partial z_k^{l+1}$:

$$\delta_j^l = \frac{\partial C}{\partial z_j^l} = \sum_k \frac{\partial C}{\partial z_k^{l+1}}\,\frac{\partial z_k^{l+1}}{\partial z_j^l} = \sum_k \frac{\partial z_k^{l+1}}{\partial z_j^l}\,\delta_k^{l+1}$$
In the last step we exchanged the two factors on the right and substituted the definition of $\delta_k^{l+1}$. To evaluate the first factor, note that

$$z_k^{l+1} = \sum_j \omega_{kj}^{l+1}\, a_j^l + b_k^{l+1} = \sum_j \omega_{kj}^{l+1}\, \sigma(z_j^l) + b_k^{l+1}$$
Differentiating, we obtain

$$\frac{\partial z_k^{l+1}}{\partial z_j^l} = \omega_{kj}^{l+1}\,\sigma'(z_j^l)$$
Substituting this back into the expression above gives

$$\delta_j^l = \sum_k \omega_{kj}^{l+1}\,\delta_k^{l+1}\,\sigma'(z_j^l),$$

which is (BP2).
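In matrix form (a standard restatement, consistent with the code sketch above), (BP2) reads

$$\delta^l = \left(\left(\omega^{l+1}\right)^T \delta^{l+1}\right) \odot \sigma'(z^l),$$

where $\odot$ denotes elementwise multiplication; this is the `(weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])` line in `backprop`.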
3. Proof of (BP3). First compute $\partial C / \partial \omega_{jk}^L$ for the output layer:

$$\frac{\partial C}{\partial \omega_{jk}^L} = \sum_m \frac{\partial C}{\partial a_m^L}\,\frac{\partial a_m^L}{\partial \omega_{jk}^L}$$
Here the sum runs over all neurons $m$ in the output layer. The output activation $a_m^L$ of the $m$-th neuron depends on the weight $\omega_{jk}^L$ only when $m = j$, so $\partial a_m^L / \partial \omega_{jk}^L = 0$ when $m \neq j$. The result simplifies to

$$\frac{\partial C}{\partial \omega_{jk}^L} = \frac{\partial C}{\partial a_j^L}\,\frac{\partial a_j^L}{\partial z_j^L}\,\frac{\partial z_j^L}{\partial \omega_{jk}^L} = \delta_j^L\, a_k^{L-1},$$

since $\partial z_j^L / \partial \omega_{jk}^L = a_k^{L-1}$.
Next, compute the same derivative one layer back, at layer $L-1$:

$$\frac{\partial C}{\partial \omega_{jk}^{L-1}} = \left(\sum_m \frac{\partial C}{\partial a_m^L}\,\frac{\partial a_m^L}{\partial z_m^L}\,\frac{\partial z_m^L}{\partial a_j^{L-1}}\right)\frac{\partial a_j^{L-1}}{\partial z_j^{L-1}}\,\frac{\partial z_j^{L-1}}{\partial \omega_{jk}^{L-1}}$$

$$= \left(\sum_m \delta_m^L\, \omega_{mj}^L\right)\sigma'(z_j^{L-1})\, a_k^{L-2} = \delta_j^{L-1}\, a_k^{L-2},$$

where the last equality is (BP2) applied at layer $L-1$.
For any layer $l$ other than the input layer, the same argument gives

$$\frac{\partial C}{\partial \omega_{jk}^l} = \frac{\partial C}{\partial z_j^l}\,\frac{\partial z_j^l}{\partial \omega_{jk}^l} = \delta_j^l\, a_k^{l-1}$$
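Stacked over all indices $j, k$, (BP3) says the gradient with respect to the whole weight matrix of layer $l$ is an outer product (again a standard restatement, not from the article):

$$\nabla_{\omega^l} C = \delta^l\, \left(a^{l-1}\right)^T,$$

which matches the `delta @ activations[-l - 1].T` line in `backprop`.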
4. Proof of (BP4). First compute $\partial C / \partial b_j^L$ for the output layer:

$$\frac{\partial C}{\partial b_j^L} = \sum_m \frac{\partial C}{\partial a_m^L}\,\frac{\partial a_m^L}{\partial b_j^L}$$
Here the sum again runs over all neurons $m$ in the output layer. The output activation $a_m^L$ depends on the bias $b_j^L$ only when $m = j$, so $\partial a_m^L / \partial b_j^L = 0$ when $m \neq j$. The result simplifies to

$$\frac{\partial C}{\partial b_j^L} = \frac{\partial C}{\partial a_j^L}\,\frac{\partial a_j^L}{\partial z_j^L}\,\frac{\partial z_j^L}{\partial b_j^L} = \delta_j^L,$$

since $\partial z_j^L / \partial b_j^L = 1$.
One layer back, at layer $L-1$:

$$\frac{\partial C}{\partial b_j^{L-1}} = \left(\sum_m \frac{\partial C}{\partial a_m^L}\,\frac{\partial a_m^L}{\partial z_m^L}\,\frac{\partial z_m^L}{\partial a_j^{L-1}}\right)\frac{\partial a_j^{L-1}}{\partial z_j^{L-1}}\,\frac{\partial z_j^{L-1}}{\partial b_j^{L-1}}$$

$$= \left(\sum_m \delta_m^L\, \omega_{mj}^L\right)\sigma'(z_j^{L-1}) = \delta_j^{L-1}$$
For any layer $l$ other than the input layer,

$$\frac{\partial C}{\partial b_j^l} = \frac{\partial C}{\partial z_j^l}\,\frac{\partial z_j^l}{\partial b_j^l} = \delta_j^l$$
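A minimal way to validate (BP3) and (BP4) numerically is a central finite-difference check. The sketch below continues the code above; the probe indices, step size, and tolerances are illustrative assumptions.

```python
def cost(x, y):
    """Quadratic cost C = 1/2 * ||y - a^L||^2 for a single sample."""
    _, activations = feedforward(x)
    return 0.5 * np.sum((y - activations[-1]) ** 2)

def numeric_grad(param, idx, x, y, eps=1e-5):
    """Central difference of C with respect to one parameter entry."""
    old = param[idx]
    param[idx] = old + eps
    c_plus = cost(x, y)
    param[idx] = old - eps
    c_minus = cost(x, y)
    param[idx] = old
    return (c_plus - c_minus) / (2 * eps)

x = rng.standard_normal((sizes[0], 1))
y = rng.standard_normal((sizes[-1], 1))
nabla_w, nabla_b = backprop(x, y)
# Spot-check one weight (BP3) and one bias (BP4) against finite differences.
assert np.isclose(numeric_grad(weights[0], (1, 2), x, y), nabla_w[0][1, 2])
assert np.isclose(numeric_grad(biases[1], (0, 0), x, y), nabla_b[1][0, 0])
```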