Deep Learning Theory (13) -- The Rise of LeNet-5

Scientific knowledge

For a deep learning or machine learning model, we require not only a good fit to the training set (low training error), but also a good fit to unseen data (the test set), that is, good generalization ability; the error measured on the test set is called the generalization error. The most intuitive manifestations of a model's generalization ability are overfitting and underfitting.
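As a concrete illustration, comparing the training error against the test (generalization) error is the usual way to tell the two cases apart. The sketch below is my own, with illustrative thresholds rather than standard values:

```python
# A minimal sketch (not from the article): diagnose a model's fit from the
# gap between training error and test error. Thresholds are illustrative.

def diagnose_fit(train_error: float, test_error: float, gap_tol: float = 0.05) -> str:
    """Classify a model's fit from its training and test (generalization) error."""
    if train_error > 0.2:                   # poor fit even on the training data
        return "underfitting"
    if test_error - train_error > gap_tol:  # fits training data, fails on unseen data
        return "overfitting"
    return "good generalization"

print(diagnose_fit(train_error=0.01, test_error=0.15))  # -> overfitting
print(diagnose_fit(train_error=0.30, test_error=0.32))  # -> underfitting
print(diagnose_fit(train_error=0.02, test_error=0.04))  # -> good generalization
```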

Review

You may recall that a recent Deep Learning Theory article covered pooling for dimensionality reduction. After that we moved on to the TensorFlow deep learning practice series: once pooling wrapped up the basic building blocks of convolutional neural networks, we went hands-on with constructing and training a complete network, so that everyone could form an overall picture from the basics to the whole, from the bottom up. Starting today, we officially return to the theory series. As we have said before, deep learning theory keeps evolving, so the theory and practice articles may never truly end. With the basic modules covered, we will first share some classic convolutional neural network papers and then move on to hands-on practice. We hope you find the trip worthwhile.

1. LeNet-5

In this article we look at LeNet-5, the pioneering work among classic convolutional neural networks. This architecture was proposed by LeCun in 1998 to recognize handwritten digits. At a time when few people had heard of anything called deep learning, it had already been proposed, and its results were excellent compared with traditional methods. Since then one great mind after another has emerged, convolutional neural networks have surged, and the field has gradually evolved into today's situation where everything is AI.

LeNet-5 paper title: Gradient-Based Learning Applied to Document Recognition.

A few figures from the paper:

1. LeNet-5 network structure diagram

The figure above shows the network structure of LeNet-5. As the figure shows, the input is a handwritten letter A, which then passes through convolution layer - downsampling - convolution layer - downsampling - fully connected layer - fully connected layer. The network finally outputs the probability that the input image belongs to each class, and at test time the index of the maximum probability is taken as the final prediction.

The final recognition results are shown below:

Network analysis

1. Input layer: an original image of shape 1*32*32, i.e., a single-channel (grayscale) image of size 32*32.

2. Convolution layer 1:

Input: 1*32*32

Convolution kernel size: 5*5

Number of convolution kernels: 6

Stride: 1 (default)

Output feature map size: 32-5+1 = 28, i.e., 28*28 (a quick check of this formula follows this list)

Output feature map shape: 6*28*28, i.e., 6 feature maps of size 28*28; equivalently, a 28*28 feature map with 6 channels.
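As a sanity check of the size arithmetic used here and in the later convolution layers, here is a small helper of my own (not from the paper) implementing the valid-convolution formula out = (in - kernel) / stride + 1:

```python
# Spatial size of a valid (unpadded) convolution output.
def conv_output_size(in_size: int, kernel: int, stride: int = 1) -> int:
    return (in_size - kernel) // stride + 1

print(conv_output_size(32, 5))  # conv layer 1: 32-5+1 = 28
print(conv_output_size(14, 5))  # conv layer 2: 14-5+1 = 10
print(conv_output_size(5, 5))   # conv layer 3: 5-5+1  = 1
```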

3. Downsampling layer 1:

Sampling method: average pooling

Input: 6*28*28

Sampling area: 2*2

Stride: 2 (equal to the pooling window, so each spatial dimension is halved)

Output feature map size: 28/2 = 14, i.e., 14*14

Output feature map shape: 6*14*14
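To see why a 2*2 window with stride 2 halves each dimension, here is a minimal numpy sketch of this average-pooling step (my own illustration, not the article's code):

```python
import numpy as np

def avg_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2*2 average pooling with stride 2 on a (H, W) feature map."""
    h, w = x.shape
    # Group pixels into non-overlapping 2*2 blocks, then average each block.
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

fmap = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
print(avg_pool_2x2(fmap).shape)  # (14, 14)
```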

4. Convolution layer 2:

Input: 6*14*14

Convolution kernel size: 5*5

Number of convolution kernels: 16

Stride: 1

Output feature map size: 14-5+1 = 10, i.e., 10*10

Output feature map shape: 16*10*10

5. Downsampling layer 2:

Sampling method: average pooling

Input: 16*10*10

Sampling area: 2*2

Stride: 2 (equal to the pooling window)

Output feature map size: 10/2 = 5, i.e., 5*5

Output feature map shape: 16*5*5

6. Convolution layer 3:

Input: 16*5*5

Convolution kernel size: 5*5

Number of convolution kernels: 120

Stride: 1

Output feature map size: 5-5+1 = 1, i.e., 1*1

Output feature map shape: 120*1*1. Since each 5*5 kernel covers the entire 5*5 input, this layer effectively behaves like a fully connected layer over the 16 input feature maps.

7. Fully connected layer 1:

Input: 120*1*1, flattened into a vector of length 120

Output neurons: 84

Output feature map shape: 84

8. Fully connected layer 2:

Input: 84

Output neurons: 10

Output feature map shape: 10

At this point, an input image has become a vector of length 10 after passing through LeNet-5. During training, these 10 values are fed into the loss function to compute the current loss, and backpropagation is then performed.
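To tie the walkthrough together, below is a minimal tf.keras sketch of the architecture. It is my own reconstruction under modern conventions, not the article's or the paper's code: tanh and a plain 10-way Dense output stand in for the 1998 squashing activations and RBF output layer, and convolution layer 2 is fully connected across channels rather than using the paper's sparse connection table.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 1))                   # 32*32 grayscale image
x = tf.keras.layers.Conv2D(6, 5, activation="tanh")(inputs)  # 6 feature maps of 28*28
x = tf.keras.layers.AveragePooling2D(2, strides=2)(x)        # 6 feature maps of 14*14
x = tf.keras.layers.Conv2D(16, 5, activation="tanh")(x)      # 16 feature maps of 10*10
x = tf.keras.layers.AveragePooling2D(2, strides=2)(x)        # 16 feature maps of 5*5
x = tf.keras.layers.Conv2D(120, 5, activation="tanh")(x)     # 120 feature maps of 1*1
x = tf.keras.layers.Flatten()(x)                             # vector of length 120
x = tf.keras.layers.Dense(84, activation="tanh")(x)          # 84 neurons
outputs = tf.keras.layers.Dense(10)(x)                       # 10 raw scores (logits)

model = tf.keras.Model(inputs, outputs)
model.summary()  # check each output shape against the analysis above
```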

At test time, this vector of ten values is passed through a softmax function to obtain a probability for each class: the larger the value, the greater the probability, and the probabilities sum to 1.
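For concreteness, here is a small numpy sketch (my own) of that softmax step:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()  # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Ten illustrative raw scores, one per digit class.
scores = np.array([1.2, -0.3, 0.5, 3.1, 0.0, -1.0, 0.7, 2.2, -0.5, 0.1])
probs = softmax(scores)
print(probs.sum())          # 1.0 (up to float rounding)
print(int(probs.argmax()))  # index of the predicted class (here 3)
```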

Summary

LeNet-5 is the first true convolutional neural network and the progenitor of today's networks. Because computing hardware was limited at the time, parameter size was a design consideration from the very start, so the whole network is very concise, with few parameters and fast training.
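To make "few parameters" concrete, here is a rough back-of-the-envelope count (my own arithmetic, assuming full channel connectivity in convolution layer 2 rather than the paper's sparse C3 connection table, which trims the true figure slightly):

```python
# Weights + biases per layer: out_channels * (in_channels * k * k + 1)
layers = {
    "conv1": 6 * (1 * 5 * 5 + 1),     # 156
    "conv2": 16 * (6 * 5 * 5 + 1),    # 2,416
    "conv3": 120 * (16 * 5 * 5 + 1),  # 48,120
    "fc1":   84 * (120 + 1),          # 10,164
    "fc2":   10 * (84 + 1),           # 850
}
print(sum(layers.values()))  # 61,706 parameters -- tiny by modern standards
```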

Epilogue

That is all for this share. Today's material is quite simple: it mainly analyzed the structure of the network and the dimensions of each layer. The next article will build the LeNet-5 network in the practice series to recognize handwritten digits, so stay tuned; veterans with energy to spare can try it in advance.

Have a great weekend and see you next time!

What we have done recently:

  1. [Year-end summary] 2021, bid farewell to the old and welcome the new

  2. [Year-end Summary] Saying goodbye to the old and welcoming the new, 2020, let's start again

Editor: Layman Yueyi | Reviewer: Layman Xiaoquanquan

Past recommendations

01  Deep Learning Theory (12) -- Pooling of Dimensionality Reduction

02  Deep Learning Theory (11) -- The Flourishing Age of Convolutional Neural Networks (3)

03  Deep Learning Theory (10) -- The Flourishing Age of Convolutional Neural Networks (2)



Origin blog.csdn.net/xyl666666/article/details/118886361