Backpropagation
Backpropagation is used to avoid recomputing the same derivative paths over and over.
For convenience, we copy over the forward-propagation steps from before:
$$Z_1 = W_1 X + b_1$$
$$H_1 = \mathrm{ReLU}(Z_1)$$
$$Z_2 = W_2 H_1 + b_2$$
$$H_2 = \mathrm{ReLU}(Z_2)$$
$$Z_3 = W_3 H_2 + b_3$$
$$\hat{y} = \mathrm{sigmoid}(Z_3)$$
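For reference, here is a minimal numpy sketch of this forward pass (a sketch, not necessarily the earlier section's exact code; the dictionary layout Weight['W1'], bias['b1'], H['H1'], ... is assumed so that it matches the backward-propagation code further below):

import numpy as np

def forward_propagation(X, Weight, bias, activation):
    # Computes Z_l = W_l H_{l-1} + b_l and H_l = f(Z_l) layer by layer;
    # the last layer uses a sigmoid to produce y_hat.
    H = {'H0': X}  # treat the input as H0 for uniform indexing
    L = len(Weight)
    for l in range(1, L + 1):
        Z = np.dot(Weight['W' + str(l)], H['H' + str(l - 1)]) + bias['b' + str(l)]
        if l == L:
            H['H' + str(l)] = 1. / (1. + np.exp(-Z))   # sigmoid output layer
        elif activation[l - 1] == 'relu':
            H['H' + str(l)] = np.maximum(0, Z)         # ReLU hidden layer
        elif activation[l - 1] == 'tanh':
            H['H' + str(l)] = np.tanh(Z)
    return H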
We also copy over the loss function:
$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log\hat{y}^{(i)} + (1-y^{(i)})\log(1-\hat{y}^{(i)}) \right] + \frac{\lambda}{2m}\|w\|_F^2$$
Note: to keep things intuitive, the transposes involved in differentiating with respect to matrices are not written out.
The first step is to differentiate with respect to $z_3$:
$$\frac{\partial J}{\partial z_3} = \frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial z_3} = \hat{y} - y = \delta_3$$
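The compact form $\hat{y} - y$ falls out of a cancellation between the cross-entropy derivative and the sigmoid derivative; written out for a single example:
$$\frac{\partial L}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}, \qquad \frac{\partial \hat{y}}{\partial z_3} = \hat{y}(1-\hat{y})$$
$$\frac{\partial L}{\partial z_3} = \left(-\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}\right)\hat{y}(1-\hat{y}) = -y(1-\hat{y}) + (1-y)\,\hat{y} = \hat{y} - y$$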
Next we differentiate with respect to the parameters $w$ and $b$:
$$\frac{\partial J}{\partial w_3} = \frac{\partial J}{\partial z_3}\frac{\partial z_3}{\partial w_3} = \delta_3 H_2 + \frac{1}{m}\lambda w_3$$
$$\frac{\partial J}{\partial b_3} = \frac{\partial J}{\partial z_3}\frac{\partial z_3}{\partial b_3} = \delta_3$$
That completes the derivatives for $w_3$ and $b_3$. The remaining layers follow the same pattern: apply the chain rule and work backwards one layer at a time.
$$\frac{\partial J}{\partial z_2} = \frac{\partial J}{\partial z_3}\frac{\partial z_3}{\partial H_2}\frac{\partial H_2}{\partial z_2} = \delta_3\, w_3\, \mathrm{relu}'(z_2) = \delta_2$$
$$\frac{\partial J}{\partial w_2} = \frac{\partial J}{\partial z_2}\frac{\partial z_2}{\partial w_2} = \delta_2 H_1 + \frac{1}{m}\lambda w_2$$
$$\frac{\partial J}{\partial b_2} = \frac{\partial J}{\partial z_2}\frac{\partial z_2}{\partial b_2} = \delta_2$$
The same goes for $w_1$ and $b_1$:
$$\frac{\partial J}{\partial z_1} = \frac{\partial J}{\partial z_2}\frac{\partial z_2}{\partial H_1}\frac{\partial H_1}{\partial z_1} = \delta_2\, w_2\, \mathrm{relu}'(z_1) = \delta_1$$
$$\frac{\partial J}{\partial w_1} = \frac{\partial J}{\partial z_1}\frac{\partial z_1}{\partial w_1} = \delta_1 x + \frac{1}{m}\lambda w_1$$
$$\frac{\partial J}{\partial b_1} = \frac{\partial J}{\partial z_1}\frac{\partial z_1}{\partial b_1} = \delta_1$$
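Summarizing the pattern (transposes still omitted, and writing $H_0 = x$ for the input), this is exactly the recursion that the code below implements for every hidden layer $l$:
$$\delta_l = \delta_{l+1}\, w_{l+1}\, f'(z_l), \qquad \frac{\partial J}{\partial w_l} = \delta_l H_{l-1} + \frac{1}{m}\lambda w_l, \qquad \frac{\partial J}{\partial b_l} = \delta_l$$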
One point to note first: when a scalar is differentiated with respect to a matrix, the result has the same dimensions as that matrix.
$$\frac{\partial J}{\partial w_3} = \frac{\partial J}{\partial z_3}\frac{\partial z_3}{\partial w_3} = \delta_3 H_2$$
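As a quick numpy sanity check of this shape rule (the sizes here are hypothetical, chosen only for illustration): once the omitted transpose is restored, $\delta_3 H_2^T$ has exactly the shape of $w_3$.

import numpy as np

m = 5
W3 = np.random.randn(1, 4)        # layer-3 weights: 1 output unit, 4 hidden units
H2 = np.random.randn(4, m)        # layer-2 activations for m examples
delta3 = np.random.randn(1, m)    # dJ/dZ3, same shape as Z3

dW3 = 1. / m * np.dot(delta3, H2.T)   # shape (1, 4), same as W3
assert dW3.shape == W3.shape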
import numpy as np

def backward_propagation(X, Y, Weight, bias, H, activation, lambd=0.0):
    m = X.shape[1]
    gradients = {}
    L = len(Weight)
    H['H0'] = X  # treat the input as H0 so every layer indexes H uniformly
    # Output layer: with sigmoid + cross-entropy, dZ_L = y_hat - Y (delta_3 above)
    gradients['dZ' + str(L)] = H['H' + str(L)] - Y
    gradients['dW' + str(L)] = 1. / m * np.dot(gradients['dZ' + str(L)], H['H' + str(L - 1)].T) \
                               + 1. / m * lambd * Weight['W' + str(L)]
    gradients['db' + str(L)] = 1. / m * np.sum(gradients['dZ' + str(L)], axis=1, keepdims=True)
    # Hidden layers: propagate delta backwards with the chain rule
    for l in range(L - 1, 0, -1):
        gradients['dH' + str(l)] = np.dot(Weight['W' + str(l + 1)].T, gradients['dZ' + str(l + 1)])
        if activation[l - 1] == 'relu':
            # relu'(z) = 1 where z > 0, and H_l > 0 exactly where Z_l > 0
            gradients['dZ' + str(l)] = np.multiply(gradients['dH' + str(l)], np.int64(H['H' + str(l)] > 0))
        elif activation[l - 1] == 'tanh':
            # tanh'(z) = 1 - tanh(z)^2 = 1 - H_l^2
            gradients['dZ' + str(l)] = np.multiply(gradients['dH' + str(l)], 1 - np.power(H['H' + str(l)], 2))
        gradients['dW' + str(l)] = 1. / m * np.dot(gradients['dZ' + str(l)], H['H' + str(l - 1)].T) \
                                   + 1. / m * lambd * Weight['W' + str(l)]
        gradients['db' + str(l)] = 1. / m * np.sum(gradients['dZ' + str(l)], axis=1, keepdims=True)
    return gradients
def update_parameters(Weight, bias, gradients, lr=0.1):
    # One gradient-descent step: parameter -= learning_rate * gradient
    for i in range(1, len(Weight) + 1):
        Weight['W' + str(i)] -= lr * gradients['dW' + str(i)]
        bias['b' + str(i)] -= lr * gradients['db' + str(i)]
    return Weight, bias
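Putting it together, a minimal end-to-end sketch of a few training steps, assuming the forward_propagation sketch above is in scope (the network sizes, labels, and hyperparameters here are made up for illustration):

np.random.seed(0)
X = np.random.randn(3, 10)                          # 3 features, 10 examples
Y = (np.random.rand(1, 10) > 0.5).astype(float)     # binary labels

Weight = {'W1': np.random.randn(4, 3) * 0.01,
          'W2': np.random.randn(1, 4) * 0.01}
bias = {'b1': np.zeros((4, 1)), 'b2': np.zeros((1, 1))}
activation = ['relu']                               # hidden-layer activations only

for step in range(100):
    H = forward_propagation(X, Weight, bias, activation)
    gradients = backward_propagation(X, Y, Weight, bias, H, activation, lambd=0.1)
    Weight, bias = update_parameters(Weight, bias, gradients, lr=0.1)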