Backpropagation (BP): Formula Derivation and Python (NumPy) Implementation

Parameter definitions:

W^l_{jk}: the weight connecting the k-th node in layer (l - 1) to the j-th node in layer l;

b^l_j: the bias of the j-th node in layer l;

Z^l_j: the weighted input of the j-th node in layer l;

a^l_j: the output (activation) of the j-th node in layer l;

C: the cost function;

\delta^l_j: the error of the j-th node in layer l;

where:

Z^l_j = \sum_k W^l_{jk} \cdot a^{l-1}_k + b^l_j 

a^l_j = \sigma (Z^l_j)

If the loss function is the mean squared error, then  C = \frac{1}{2} \sum_j (y_j - a^L_j)^2
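
In vectorized form, the forward pass of a single layer and the MSE cost can be sketched in NumPy as follows (the layer sizes and sample values are assumptions for illustration):

import numpy as np

def sigma(z):
    # Sigmoid activation: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Assumed sizes: layer l-1 has 3 nodes, layer l has 2 nodes.
a_prev = np.array([[0.5], [0.1], [0.8]])   # a^{l-1}, shape (3, 1)
W = np.random.normal(0.0, 1.0, (2, 3))     # W^l, entry [j, k]
b = np.zeros((2, 1))                       # b^l, shape (2, 1)

Z = np.dot(W, a_prev) + b                  # Z^l_j = sum_k W^l_{jk} * a^{l-1}_k + b^l_j
a = sigma(Z)                               # a^l_j = sigma(Z^l_j)

y = np.array([[1.0], [0.0]])               # target for an output layer of 2 nodes
C = 0.5 * np.sum((y - a) ** 2)             # C = 1/2 * sum_j (y_j - a^L_j)^2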


Formula Derivation

1. Error of the j-th node in layer l:

\delta^l_j = \frac{\partial C}{\partial Z^l_j} = \sum_k\frac{\partial C}{\partial Z^{l+1}_k} \cdot \frac{\partial Z^{l+1}_k}{\partial Z^l_j}

    = \sum_k \delta^{l+1}_k \cdot \frac{\partial (\sum_j W^{l+1}_{kj} \cdot a^l_j + b^{l+1}_k)}{\partial Z^l_j}

    = \sum_k \delta^{l+1}_k \cdot \frac{\partial (\sum_j W^{l+1}_{kj} \cdot \sigma(Z^l_j) + b^{l+1}_k)}{\partial Z^l_j}

    = \sum_k \delta^{l+1}_k \cdot W^{l+1}_{kj} \cdot \sigma'(Z^l_j)
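
This recursion propagates errors backward from layer l+1 to layer l; it needs a base case at the output layer L. For the MSE cost with a sigmoid output it is \delta^L_j = (a^L_j - y_j) \cdot \sigma'(Z^L_j). A minimal NumPy sketch of one backward step (layer sizes are assumptions):

import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigma(z)
    return s * (1.0 - s)

# Assumed sizes: layer l has 4 nodes, layer l+1 has 2 nodes.
Z_l = np.random.randn(4, 1)         # Z^l
W_next = np.random.randn(2, 4)      # W^{l+1}, entry [k, j]
delta_next = np.random.randn(2, 1)  # delta^{l+1}, already computed

# delta^l_j = sum_k delta^{l+1}_k * W^{l+1}_{kj} * sigma'(Z^l_j)
delta_l = np.dot(W_next.T, delta_next) * sigma_prime(Z_l)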

2. Weight gradient:

\Delta W = \frac{\partial C}{\partial W^l_{jk}} = \frac{\partial C}{\partial Z^l_j} \cdot \frac{\partial Z^l_j}{\partial W^l_{jk}} = \delta^l_j \cdot \frac{\partial (\sum_k W^l_{jk} \cdot a^{l-1}_k + b^l_j)}{\partial W^l_{jk}} = \delta^l_j \cdot a^{l-1}_k

3. Bias gradient:

\Delta b = \frac{\partial C}{\partial b^l_j} = \frac{\partial C}{\partial Z^l_j} \cdot \frac{\partial Z^l_j}{\partial b^l_j} = \delta^l_j \cdot \frac{\partial (\sum_k W^l_{jk} \cdot a^{l-1}_k + b^l_j)}{\partial b^l_j} = \delta^l_j
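
For a whole layer, both gradients can be formed at once: the weight gradient is the outer product of \delta^l and a^{l-1}, and the bias gradient is \delta^l itself. A small NumPy sketch (shapes are assumptions):

import numpy as np

# Assumed sizes: layer l-1 has 3 nodes, layer l has 2 nodes.
delta_l = np.random.randn(2, 1)     # delta^l
a_prev = np.random.randn(3, 1)      # a^{l-1}

dW = np.dot(delta_l, a_prev.T)      # dC/dW^l_{jk} = delta^l_j * a^{l-1}_k, shape (2, 3)
db = delta_l                        # dC/db^l_j   = delta^l_j, shape (2, 1)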

4. Weight and bias updates:

W = W - \alpha \cdot \Delta W

b = b - \alpha \cdot \Delta b

where \alpha is the learning rate.
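
The update is plain gradient descent: step each parameter opposite to its gradient, scaled by \alpha. A self-contained sketch (the parameter values, gradients, and \alpha are placeholders):

import numpy as np

# Placeholder parameters and gradients for one layer of shape (2, 3).
W = np.random.randn(2, 3)
b = np.zeros((2, 1))
dW = np.random.randn(2, 3)          # dC/dW, e.g. from the sketch above
db = np.random.randn(2, 1)          # dC/db
alpha = 0.1                         # learning rate (assumed value)

W = W - alpha * dW
b = b - alpha * db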

Python code (a two-layer network: sigmoid hidden layer, linear output layer):

import numpy as np

class NeuralNetwork(object):
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights from zero-mean normal distributions.
        self.weights_input_to_hidden = np.random.normal(0.0, self.hidden_nodes ** -0.5,
                                                        (self.hidden_nodes, self.input_nodes))
        self.weights_hidden_to_output = np.random.normal(0.0, self.output_nodes ** -0.5,
                                                         (self.output_nodes, self.hidden_nodes))
        self.lr = learning_rate
        # Sigmoid activation for the hidden layer.
        self.activation_function = lambda x: 1 / (1 + np.exp(-x))

    def train(self, inputs_list, targets_list):
        # Column vectors: inputs (input_nodes, 1), targets (output_nodes, 1).
        inputs = np.array(inputs_list, ndmin=2).T
        targets = np.array(targets_list, ndmin=2).T

        # Forward pass.
        hidden_inputs = np.dot(self.weights_input_to_hidden, inputs)      # Z^hidden
        hidden_outputs = self.activation_function(hidden_inputs)          # a^hidden
        final_inputs = np.dot(self.weights_hidden_to_output, hidden_outputs)
        final_outputs = final_inputs                                       # linear output layer

        # Backward pass.
        # With a linear output and MSE cost, delta^L = -(y - a^L); keeping
        # (y - a^L) here lets us add (instead of subtract) the updates below.
        output_errors = targets - final_outputs
        # delta^hidden = (W^T . delta^L) * sigma'(Z^hidden)
        hidden_errors = np.dot(self.weights_hidden_to_output.T, output_errors) \
                        * hidden_outputs * (1 - hidden_outputs)

        # Gradient-descent updates: dC/dW = delta . a_prev^T
        self.weights_hidden_to_output += self.lr * np.dot(output_errors, hidden_outputs.T)
        self.weights_input_to_hidden += self.lr * np.dot(hidden_errors, inputs.T)

    def predict(self, inputs_list):
        inputs = np.array(inputs_list, ndmin=2).T

        # Forward pass only.
        hidden_inputs = np.dot(self.weights_input_to_hidden, inputs)
        hidden_outputs = self.activation_function(hidden_inputs)

        final_inputs = np.dot(self.weights_hidden_to_output, hidden_outputs)
        final_outputs = final_inputs  # identity activation on the output layer

        return final_outputs
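
A minimal usage sketch of the class above, on a toy regression task; the node counts, learning rate, and data are assumptions for illustration:

# Toy task: learn y = x1 + x2 from random samples.
np.random.seed(0)
nn = NeuralNetwork(input_nodes=2, hidden_nodes=4, output_nodes=1, learning_rate=0.1)

X = np.random.rand(200, 2)
y = X.sum(axis=1)

for epoch in range(100):
    for xi, yi in zip(X, y):
        nn.train(xi, [yi])          # online (per-sample) updates

print(nn.predict([0.3, 0.4]))       # should move toward 0.7 as training progresses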


Reposted from blog.csdn.net/francislucien2017/article/details/86762178