Neural Network (2): Back Propagation Step (BP)

Basic concepts of the BP neural network:

1. Cost function: training a neural network means fitting the optimal parameters (weights) by minimizing the cost function J.

[The cost function of a neural network is a measure of how well the model fits the samples; it can be loosely compared to the squared difference between the model's predicted value h(x) and the sample output y_i (to be verified)]

[Minimizing the cost function involves the gradient descent method and the backpropagation algorithm; the pros and cons of the two are compared in point 4 below]

[The cost function of a neural network is non-convex, which means the optimization algorithm may get stuck in a local optimum]

[To understand the regularization term in the cost function, compare it with the regularization term in the logistic regression formula (it keeps the parameters from overfitting)]
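
To make the cost concrete, here is a minimal Matlab sketch, assuming the network outputs from forward propagation have been collected into a matrix H (m x K), the labels into a one-hot matrix Y (m x K), and the parameters into an unrolled vector thetaVec; lambda is the regularization strength. All of these names are illustrative, and for simplicity the bias weights are not excluded from the regularization here, as they would be in a careful implementation.

```matlab
% Cross-entropy cost over m samples and K output units (sketch).
m = size(Y, 1);
J = (-1 / m) * sum(sum(Y .* log(H) + (1 - Y) .* log(1 - H)));

% Regularization term, comparable to the one in logistic regression:
% it penalizes large weights to avoid overfitting.
J = J + (lambda / (2 * m)) * sum(thetaVec .^ 2);
```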

2. Backpropagation process description:

i) Take as input the neuron activations and weight values obtained from forward propagation, together with the output error of the last layer (the difference between the predicted value h_i and the sample value y_i);

ii) Calculate the error and accumulated error of each neuron in the hidden layers; [see the related formulas]

iii) Output the partial derivatives of the cost function J(θ) with respect to the weights θ_i (a sketch of all three steps follows the figures below).

[Figure: error calculation during backpropagation, from Andrew Ng's machine learning course]

[Figure: backpropagation algorithm pseudocode, from Andrew Ng's machine learning course]
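
A minimal Matlab sketch of steps i)-iii) for a network with one hidden layer, assuming sigmoid activations and a single training sample x (column vector) with one-hot label y; Theta1 and Theta2 are the weight matrices from forward propagation. The names and shapes are illustrative only.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Forward propagation (step i takes these values as input).
a1 = [1; x];                       % input activations plus bias unit
a2 = [1; sigmoid(Theta1 * a1)];    % hidden activations plus bias unit
a3 = sigmoid(Theta2 * a2);         % output layer, h(x)

% i) error of the last layer: predicted value minus sample value.
delta3 = a3 - y;

% ii) error of each hidden neuron, propagated backwards.
delta2 = (Theta2' * delta3) .* (a2 .* (1 - a2));
delta2 = delta2(2:end);            % drop the bias-unit entry

% Accumulated errors for this sample; summed over all m samples,
% iii) the partial derivatives of J are Delta / m (plus regularization).
Delta1 = delta2 * a1';
Delta2 = delta3 * a2';
```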

3. Unrolling parameters: converting between the vector and matrix forms of the parameters during neural network training. [When passed to the optimization algorithm, the parameters are unrolled into a single vector; when used as input to forward / backward propagation, the parameters are kept in matrix form; see the sketch after the figure notes below]

[Figure: Matlab implementation of the parameter vector / matrix conversion]

[Figure: where the vector conversion is applied in the algorithm]
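
A minimal Matlab sketch of the conversion, assuming two weight matrices Theta1 (25 x 401) and Theta2 (10 x 26); the sizes are illustrative only.

```matlab
% Matrices -> one long vector, as expected by the optimization algorithm.
thetaVec = [Theta1(:); Theta2(:)];

% Vector -> matrices, as needed by forward / backward propagation.
Theta1 = reshape(thetaVec(1:25*401), 25, 401);
Theta2 = reshape(thetaVec(25*401+1:end), 10, 26);
```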

4. Gradient checking: the results of the backpropagation process may look reasonable [reasonable meaning that the cost function keeps decreasing during optimization] while the implementation is actually buggy, and a buggy BP implementation can end up with a much larger error than a correct one, so a gradient checking step is required. The numerical gradient method serves the same purpose as the BP algorithm: both compute the partial derivatives of the cost function J(θ). The numerical method is hard to get wrong but slow to run; the BP algorithm is easy to get wrong but fast. So when programming the BP algorithm, compare the partial derivatives computed numerically against the derivative values produced by BP and check that they are essentially equal (differing only after a few decimal places) [the so-called gradient check; a sketch follows the notes below]. Once the BP program is confirmed to be correct, it is used to train the neural network [only the BP algorithm is used during training, not the numerical gradient method].

Numerical gradient method: a numerical approximation of the derivative at a point

[Figure: Matlab formula for the numerical gradient]

Partial derivatives with respect to θ in vector form

[Figure: remarks on gradient checking]
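
A minimal Matlab sketch of the gradient check, assuming a function costFunc(thetaVec) (illustrative name) that returns the cost J for the unrolled parameter vector thetaVec.

```matlab
EPSILON = 1e-4;
n = numel(thetaVec);
gradApprox = zeros(n, 1);
for i = 1:n
    thetaPlus = thetaVec;   thetaPlus(i)  = thetaPlus(i)  + EPSILON;
    thetaMinus = thetaVec;  thetaMinus(i) = thetaMinus(i) - EPSILON;
    % Two-sided numerical approximation of dJ/dtheta_i.
    gradApprox(i) = (costFunc(thetaPlus) - costFunc(thetaMinus)) / (2 * EPSILON);
end
% Compare gradApprox with the gradient vector produced by BP;
% the two should agree to within a few decimal places.
```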

5. Random initialization: a common choice is to initialize all parameters to 0, but zero initialization in a neural network means that the weights feeding the first neuron of the next layer are exactly the same as the weights feeding the second neuron of that layer. As a result, every neuron in a given layer takes the same value, making the features redundant; and during backpropagation the error of every neuron in the same layer is also identical, so the partial derivatives are identical as well (a Matlab sketch of the fix follows the figure notes below).

[A single neuron can be regarded as a newly constructed, more complex feature]

[Figure: with zero initialization, the weights drawn in the same color are equal]

[Figure: Matlab implementation of random initialization; epsilon can be taken as 10^-4]
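
A minimal Matlab sketch of random initialization for one weight matrix, assuming an illustrative 10 x 11 size; every weight is drawn uniformly from [-epsilon, epsilon] so that no two neurons start out identical.

```matlab
INIT_EPSILON = 1e-4;    % value suggested in the note above
Theta1 = rand(10, 11) * 2 * INIT_EPSILON - INIT_EPSILON;
```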

6. Neural network architecture: the number of input-layer units is determined by the number of features in the samples. There is generally one hidden layer, though there can also be 2, 3, 4, and so on, at greater computational cost. The number of units in a hidden layer is generally a multiple of the number of input units (1x, 2x, 3x, etc.); the more units, the more reconstructed features and the more complex the nonlinear function, but the greater the computational cost (illustrative sizes are sketched after the figure note below).

[Figure: common NN architectures]
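
As a sketch, fixing the architecture amounts to choosing a few sizes before training; the numbers below are illustrative only.

```matlab
input_layer_size  = 400;  % one unit per sample feature
hidden_layer_size = 25;   % one hidden layer is the usual default
num_labels        = 10;   % one output unit per class
```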

7. Steps of training a neural network

① Randomly initialize the weight parameters θ;

② Use the forward propagation algorithm to compute h(x_i) and the activation a_i^(l) of each unit;

③ Compute the error δ^(L) of the last layer, then use the backpropagation algorithm to compute the error δ_i^(l) of the units in each layer;

④ Accumulate the errors of each layer with the formula Δ^(l) := Δ^(l) + δ^(l+1) (a^(l))^T, obtain the partial derivatives of the cost function from them, and output DVec [the differential (gradient) vector];

⑤ Use the numerical gradient method to check the accuracy of the BP algorithm;

⑥ Combining forward propagation and backpropagation, use the optimization algorithm to minimize the cost function (a sketch of the whole procedure follows the notes below).

[Figure: output of forward_propagation]

[Figure: output of back_propagation]

[Figure: output of the optimization algorithm]

[For a first implementation, use a simple for loop over the training examples]

[The optimization algorithm can be the gradient descent method or another, more advanced optimization algorithm]

[On the relationship between the gradient descent method and the derivative-computing methods (the numerical gradient method and backpropagation): the role of backpropagation is to compute the partial derivatives required at each step of the gradient descent method. In other words, minimizing the cost function is handled by the steps of the optimization algorithm, while the partial derivatives inside those steps are computed by the numerical gradient method or by the backpropagation algorithm.]
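
Putting the steps together, here is a minimal Matlab sketch of the training procedure, assuming a helper nnCostFunction(thetaVec, X, y, lambda) (hypothetical name) that runs forward propagation and backpropagation internally and returns [J, gradVec], plus the layer sizes from point 6. fminunc is built into Octave; in MATLAB it requires the Optimization Toolbox.

```matlab
% Step 1: random initialization of the unrolled parameter vector.
n = hidden_layer_size * (input_layer_size + 1) ...
    + num_labels * (hidden_layer_size + 1);
INIT_EPSILON = 1e-4;
initialTheta = rand(n, 1) * 2 * INIT_EPSILON - INIT_EPSILON;

% Steps 2-4 happen inside the (hypothetical) cost function: forward
% propagation, the layer errors, and the gradient vector DVec.
costFunc = @(t) nnCostFunction(t, X, y, lambda);

% Step 6: hand cost and gradient to the optimization algorithm.
options = optimset('GradObj', 'on', 'MaxIter', 100);
[optTheta, Jmin] = fminunc(costFunc, initialTheta, options);
```

Step 5 (the gradient check) would be run once against costFunc before training and then switched off, since the numerical method is too slow to use inside the loop.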

 

Reference materials:

1. Andrew Ng's machine learning videos P43-P56 on Bilibili: https://www.bilibili.com/video/BV164411S78V?p=56

 
