CS231n Lecture 4 - Introduction to Neural Networks

1. Gradient

  • We want to measure how much a function's output changes as its input changes; that is what the derivative captures. When a function has several variables, there is a derivative with respect to each one, i.e., the partial derivatives. (illustration)

  • The gradient is the vector of partial derivatives: it collects the partial derivative with respect to each variable into a single vector, and this vector as a whole is called the gradient. (illustration)

  • Some functions have special gradients. Take max, for example: the partial derivative with respect to the larger input is 1, so the gradient flowing back from above is passed to it unchanged; the partial derivative with respect to the smaller input is 0, so multiplying the incoming gradient by 0 naturally gives a gradient of 0. (A small numeric sketch follows.)
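
A minimal numerical sketch of the partial-derivative idea (the function and values here are my own illustration, not from the lecture): for f(x, y) = max(x, y), finite differences show a partial derivative of about 1 for the larger input and 0 for the smaller one.

# f(x, y) = max(x, y); estimate each partial derivative with finite differences
def f(x, y):
    return max(x, y)

x, y, h = 4.0, 2.0, 1e-5
df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)  # ~1.0: x is the larger input
df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)  # ~0.0: y is the smaller input
print(df_dx, df_dy)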

2. Chain rule

  • Chain rule: the chain rule tells us that the right way to combine these gradient expressions is to multiply them together (illustration; see the sketch below).
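
A minimal sketch of this multiplication on a simple composite function (my own example, not from the lecture): with f(x) = (3x + 2)^2, write u = 3x + 2 so that f = u^2; the chain rule multiplies the two local derivatives.

# chain rule: df/dx = (df/du) * (du/dx) for u = 3x + 2, f = u**2
x = 1.5
u = 3 * x + 2
df_du = 2 * u          # derivative of the outer square
du_dx = 3.0            # derivative of the inner linear part
df_dx = df_du * du_dx  # multiply the local derivatives together

# sanity check against a finite-difference estimate
h = 1e-5
numeric = ((3 * (x + h) + 2) ** 2 - (3 * (x - h) + 2) ** 2) / (2 * h)
print(df_dx, numeric)  # both approximately 39.0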

3. Backpropagation

  • It is straightforward: first run the forward pass, then propagate backward. Step one is to receive the gradient coming from the node behind (the upstream gradient); step two is to compute the node's own local gradient; multiplying the two gives the gradient at the current node.
  • An example illustration explains this (a numeric sketch follows below).
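
A small sketch of this two-step recipe on the circuit f(x, y, z) = (x + y) * z (the specific input values are my own choice):

# forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y   # add gate, q = 3
f = q * z   # multiply gate, f = -12

# backward pass: at each node, upstream gradient times local gradient
df_df = 1.0                 # gradient of f with respect to itself
df_dq = z * df_df           # multiply gate: local gradient of q is z
df_dz = q * df_df           # multiply gate: local gradient of z is q
df_dx = 1.0 * df_dq         # add gate: local gradient of x is 1
df_dy = 1.0 * df_dq         # add gate: local gradient of y is 1
print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0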

4. Patterns in backward flow

  • add gate: the gradient arriving from behind is multiplied by a local gradient of 1, so the add gate simply passes the incoming gradient through to each of its inputs unchanged; it distributes the upstream gradient.
  • max gate: this one is special: the larger input gets a local gradient of 1 and the smaller input gets 0; after multiplying by the gradient coming from behind, the larger input receives the full upstream gradient and the smaller input receives 0.
  • multiply gate: this is easier to understand: the local gradient of each input is just the ordinary partial derivative (i.e., the value of the other input), and multiplying it by the gradient coming from behind gives each input's final gradient. (See the sketch below.)
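
A minimal sketch of these three routing patterns (the values are chosen only for illustration):

d_out = 2.0  # upstream gradient arriving at each gate from behind

# add gate: distributes the upstream gradient to both inputs unchanged
a, b = 3.0, -1.0                      # out = a + b
da, db = 1.0 * d_out, 1.0 * d_out     # 2.0, 2.0

# max gate: routes the upstream gradient to the larger input only
p, q = 4.0, 7.0                       # out = max(p, q)
dp = (1.0 if p > q else 0.0) * d_out  # 0.0
dq = (1.0 if q > p else 0.0) * d_out  # 2.0

# multiply gate: each input's gradient is the other input times the upstream gradient
u, v = 3.0, -5.0                      # out = u * v
du, dv = v * d_out, u * d_out         # -10.0, 6.0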

5. Gradients for vectorized operations

  • simple case
import numpy as np

# forward pass
W = np.random.randn(5, 10)
X = np.random.randn(10, 3)
D = W.dot(X)

# now suppose we had the gradient on D from above in the circuit
dD = np.random.randn(*D.shape) # same shape as D
dW = dD.dot(X.T) #.T gives the transpose of the matrix
dX = W.T.dot(dD)
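
A useful sanity check (my own addition): the gradient of a variable always has the same shape as the variable itself, which tells you where the transpose has to go.

# shape check: each gradient matches the shape of its variable
assert dW.shape == W.shape  # (5, 10)
assert dX.shape == X.shape  # (10, 3)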
