Deep Learning Theory (17) -- ResNet, the Classic of Depth

Scientific knowledge

ICML is the abbreviation of the International Conference on Machine Learning. It has grown into a top annual international conference on machine learning, hosted by the International Machine Learning Society (IMLS).


# Preface


In the last deep learning theory article, we studied a representative deep network, GoogLeNet, which built a deeper network out of its proposed Inception module. Today we step into the most representative of the deep networks: ResNet. How does it differ from the previous network structures, and why was it proposed? Read on for the details.


ResNet network


The paper shared here is titled Deep Residual Learning for Image Recognition. As soon as the network was proposed, it completely refreshed the computer vision field's understanding of deep networks, and residual-network variants have since been derived in many other fields; its far-reaching influence continues to this day. There is even an exaggerated saying that in image recognition people know only the residual network and no other network. Although a bit over the top, it reflects how popular the residual network is among researchers.

Screenshot of the paper:


Paper address: https://arxiv.org/pdf/1512.03385.pdf

1. Why the residual network was proposed


The derivations of backpropagation and gradient descent were covered in earlier articles; for details, see: Deep Learning Theory (5) -- Mathematical Derivation of the Gradient Descent Algorithm and Deep Learning Theory (7) -- Backpropagation.

The authors state the motivation for the residual network at the very beginning of the abstract: the deeper a neural network is, the harder it is to train. But why? Didn't we say that a deeper network extracts richer features, and that deeper layers carry more information? Yes, generally speaking that is true, but only within a certain range of depths. How large is that range? Roughly the depth of the earlier VGG and GoogLeNet networks; those networks are already deep enough, and if they were made much deeper, they might no longer be trainable.

Why? The explanation comes from the gradients. Recall from our earlier articles that network parameters are updated by the backpropagation algorithm, which in turn relies on gradient descent. What does network depth have to do with gradient descent? Backpropagation computes derivatives through the whole network by the chain rule, and this is where the problem arises. Within a certain depth the chain rule works fine, but the deeper the network, the more factors are multiplied together in the chain. These factors are often floating-point values smaller than one, so as more and more of them are multiplied, the resulting gradient becomes tiny and eventually approaches zero. This is the vanishing-gradient problem. When the gradient vanishes, the gradient-descent update leaves the parameters essentially unchanged, so the parameters stop updating and training cannot continue. Therefore, as the network gets deeper, vanishing gradients make training harder and harder, and this is usually given as the reason the residual network was proposed.
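To make this concrete, here is a minimal sketch of the chain-rule product in a plain (non-residual) network. The notation is my own illustration, not the paper's derivation: x_l denotes the activation of layer l and L the loss.

```latex
% Plain network: x_{l+1} = f_l(x_l), with loss \mathcal{L} after layer N.
% Chain rule for the gradient reaching an early activation x_1:
\frac{\partial \mathcal{L}}{\partial x_1}
  = \frac{\partial \mathcal{L}}{\partial x_N}
    \prod_{l=1}^{N-1} \frac{\partial x_{l+1}}{\partial x_l}
% If each factor has magnitude below 1, the product shrinks roughly
% geometrically with the depth N, so the gradient arriving at the early
% layers approaches 0 and their parameters stop being updated.
```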

2. Network structure

Figure: the residual learning building block from the paper.

The figure above shows the residual learning block proposed in the paper, which is also the basic module of the residual network. Look closely: what is different from the previous networks? It is actually very simple: there is one extra skip connection that links the input directly to the output, something easy to overlook. Adding the input x back to the output of an ordinary block brings two advantages:

  1. High-level information is fused with low-level information, making the feature representation richer.

  2. Because the input x appears directly in the output, backpropagation always has an extra derivative term with respect to x, so the gradient never becomes vanishingly small. This alleviates the vanishing-gradient problem and allows deeper networks to be trained (see the sketch just below).
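As a quick check on point 2, here is a minimal sketch, again in my own notation rather than the paper's, of how the shortcut changes the local gradient of one block:

```latex
% Residual block: y = F(x) + x, where F is the learned residual mapping.
\frac{\partial y}{\partial x} = \frac{\partial F(x)}{\partial x} + 1
% Even if \partial F / \partial x is tiny, the "+1" contributed by the
% identity shortcut lets the upstream gradient pass through each block,
% so stacking many blocks no longer drives the gradient toward 0.
```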

The network structure configurations given in the paper range from 18 layers to 152 layers.


One example: the structure of a 32-layer residual network.


Since the basic residual block is relatively simple, we will not explain each layer's structure in detail here; we will go through it thoroughly in the hands-on article.
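For readers who want to see the block in code before the hands-on article, here is a minimal sketch of a basic residual block, assuming PyTorch is available. This is my own illustrative implementation, not the paper's code; the channel sizes in the quick check are arbitrary.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """A minimal two-layer residual block: output = F(x) + shortcut(x)."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # F(x): two 3x3 convolutions with batch normalization
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

        # Identity shortcut, or a 1x1 convolution when the shape changes
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)  # the skip connection
        return self.relu(out)

# Quick shape check on a dummy input
if __name__ == "__main__":
    block = BasicResidualBlock(64, 128, stride=2)
    x = torch.randn(1, 64, 32, 32)
    print(block(x).shape)  # torch.Size([1, 128, 16, 16])
```

The 1x1 convolution in the shortcut is only there so that the two tensors being added have the same shape when the number of channels or the spatial size changes; when the shapes already match, the shortcut is a pure identity.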


END

Epilogue

This is the end of this issue's sharing. The emergence of the residual network has led the progress of deep learning, and it also offers us some guidance: a better solution is not necessarily a more complicated one. Sometimes a small change brings a large improvement. We need to start from the basic principles, so that we can go further.

See you in the next issue!

Editor: Layman Yueyi|Review: Layman Xiaoquanquan


Advanced IT Tour

Past review

Deep Learning Theory (16) -- GoogLeNet's Re-exploration of the Mystery of Depth

Deep Learning Theory (15) -- VGG's initial exploration of the mystery of depth

Deep Learning Theory (14) -- AlexNet's next level

What have we done in the past year:

[Year-end Summary] Saying goodbye to the old and welcoming the new, 2020, let's start again

[Year-end summary] 2021, bid farewell to the old and welcome the new


Click "Like" and let's go~


Origin blog.csdn.net/xyl666666/article/details/121896561