[Andrew Ng CNN] Week 1: Convolution and Pooling

Notes on Week 1 of Andrew Ng's (Wu Enda's) deep learning course on convolutional neural networks.

1. Convolution

In fully connected networks, and especially in computer vision applications, the sheer amount of image data leads to a sharp increase in the number of parameters, which makes the model very difficult to train.
The convolution operation itself helps reduce the size of the data.

1. Filters (kernels) and convolution

  • Input: a 6x6 grayscale image (so the size of the image is 6x6x1)
    (figure)
  • Given a convolution kernel (also called a filter): a 3x3 square matrix

Its entries can be filled with different values as needed.

(figure)

  • Calculation process and output
    (figure)
    Slide the convolution kernel one unit at a time, from left to right and top to bottom, repeating the "multiply corresponding elements and add" operation at each position to obtain the output matrix (a minimal NumPy sketch of this computation follows).
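A minimal NumPy sketch of this sliding-window computation (the function name `conv2d` and the loop-based implementation are illustrative, not from the course):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' convolution as described above: slide the kernel left-to-right,
    top-to-bottom, and at each position sum the element-wise products."""
    H, W = image.shape
    f, _ = kernel.shape
    out = np.zeros((H - f + 1, W - f + 1))
    for i in range(H - f + 1):
        for j in range(W - f + 1):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.random.rand(6, 6)        # the 6x6 grayscale image
kernel = np.ones((3, 3))            # any 3x3 filter
print(conv2d(image, kernel).shape)  # (4, 4)
```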

[ Edge detection ]
When introducing the 3x3 kernel above, we mentioned that its values (and its dimensions) can be chosen according to the actual problem. With the values used in the example above, the kernel in fact detects vertical edges (specifically, vertical edges that transition from light to dark).

  • From the perspective of matrix operations
    (figure)
  • From the perspective of image visualization
    (figure)
    So we can understand convolving the original matrix with the kernel as essentially searching the original image for the local pattern shown by the kernel's visualization; in the result matrix, the extremely bright areas indicate where that particular feature pattern appears.

    Later on, we will see that all 9 numbers of a 3x3 convolution kernel can be treated as parameters, and gradient descent with backpropagation can then be used to learn the best convolution kernel.
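As a small illustration of the hand-designed vertical-edge kernel (the exact pixel values are mine, chosen to mimic the light-to-dark example; `conv2d` is reused from the sketch above):

```python
import numpy as np

# Left half bright (10), right half dark (0): a vertical light-to-dark edge.
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])

# A classic vertical-edge kernel: positive column, zero column, negative column.
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]], dtype=float)

result = conv2d(image, vertical_edge)   # conv2d defined in the sketch above
print(result)
# [[ 0. 30. 30.  0.]
#  [ 0. 30. 30.  0.]
#  [ 0. 30. 30.  0.]
#  [ 0. 30. 30.  0.]]   <- the bright 30s mark the detected vertical edge
```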

2. Padding

(1) Problems with the unpadded convolution operation
(figure)
① The image output by the convolution operation shrinks

When the network has many layers, this repeated shrinking of the image greatly affects the training effect.

② Information at the edges of the image is lost or under-used

As shown in the figure above, the pixel in the green box participates in the convolution only once, while the pixel in the red box participates many times.

(2) Solution: padding
(figure)
From the size formula, it follows that adding a layer of padding around the original image keeps the output image the same size as the original:
W' = floor((W - F + 2P) / S) + 1
In most cases we pad with zeros.

With padding, the pixel in the green box now affects four pixels of the output (that is, it participates in four convolution operations), so the drawback of losing edge information is mitigated.
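A small sketch of the size formula and of zero padding (the helper name `conv_output_size` is illustrative; `np.pad` is standard NumPy):

```python
import numpy as np

def conv_output_size(W, F, P=0, S=1):
    """W' = floor((W - F + 2P) / S) + 1"""
    return (W - F + 2 * P) // S + 1

print(conv_output_size(6, 3))        # 4 -> without padding, the image shrinks
print(conv_output_size(6, 3, P=1))   # 6 -> one layer of padding keeps the size

image = np.random.rand(6, 6)
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)  # pad with zeros
print(padded.shape)                  # (8, 8)
```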

(3) Choosing the amount of padding
Valid convolution and same convolution
(figure)
In the same-padding strategy, from the relationship p = (f - 1)/2 it is not hard to see why we usually use odd-sized convolution kernels.
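A brief check of the p = (f - 1)/2 relationship (illustrative helper, reusing `conv_output_size` from the sketch above):

```python
def same_padding(f):
    """For stride 1, p = (f - 1) / 2 keeps the output the same size as the input;
    this is an integer only when f is odd, which is one reason odd kernel sizes
    (3x3, 5x5, 7x7, ...) are the usual choice."""
    assert f % 2 == 1, "same padding needs an odd kernel size"
    return (f - 1) // 2

for f in (3, 5, 7):
    p = same_padding(f)
    print(f, p, conv_output_size(6, f, P=p))   # the output width stays 6
```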

3. Strided Convolution

(1) Convolution with a specified stride
As before, the kernel is laid over the input image and the "multiply corresponding elements and sum" operation is performed; but each time the kernel moves to the right (or down), it no longer moves by the default of one pixel, but by s pixels according to the stride setting.
(figure)
(2) The output-size formula for strided convolution
(figure)
The "round down" (floor) operation means that when the convolution kernel cannot lie completely inside the input data, that position is skipped and no output is produced for it.
That is, the filter must lie entirely within the image (or within the padded image area) to produce a corresponding output value.
(figure)
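A minimal sketch of strided convolution with the floor behaviour described above (illustrative implementation, not the course's code):

```python
import numpy as np

def strided_conv2d(image, kernel, s=1):
    """Cross-correlation with stride s: an output value is produced only where
    the kernel fits entirely inside the image, hence the floor in the formula."""
    H, W = image.shape
    f, _ = kernel.shape
    out_h = (H - f) // s + 1
    out_w = (W - f) // s + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i * s:i * s + f, j * s:j * s + f] * kernel)
    return out

image = np.random.rand(7, 7)
print(strided_conv2d(image, np.ones((3, 3)), s=2).shape)  # (3, 3)
```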

[Note] Regarding convolution and cross-correlation
Under the strict mathematical definition, the operation described above should be called "cross-correlation"; in deep learning, however, it is conventionally called "convolution". So in the rest of the course, when we speak of the convolution operation on an image, we mean the operation described above.

The true mathematical "convolution" operation additionally flips the kernel before sliding it, as follows:
(figure)
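A quick check of the difference (assuming SciPy is available; true convolution equals cross-correlation with the kernel flipped both horizontally and vertically):

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

image = np.random.rand(6, 6)
kernel = np.random.rand(3, 3)

cross_corr = correlate2d(image, kernel, mode='valid')           # what deep learning calls "convolution"
true_conv = convolve2d(image, kernel, mode='valid')             # the strict mathematical convolution
flipped_cc = correlate2d(image, np.flip(kernel), mode='valid')  # cross-correlation with a flipped kernel

print(np.allclose(true_conv, flipped_cc))   # True: convolution = cross-correlation + kernel flip
```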

4. Three-dimensional convolution

(1) The operation process of three-dimensional convolution

The most common example of three-dimensional convolution is to perform convolution operations on RGB three-channel images, as follows:

(figure)
The kernel moves just as in the two-dimensional case:
(figure)

  • The size (length and width) of the convolution kernel can be set as needed, but the depth of the kernel must match the depth of the input image; as shown in the figure above, each step performs the "multiply and sum" operation over all n channels at once (see the sketch after this list).
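A minimal sketch of a single filter applied to a multi-channel input (illustrative function name and shapes):

```python
import numpy as np

def conv3d_single_filter(image, kernel):
    """image: (H, W, C), kernel: (f, f, C). The kernel depth must equal the
    number of input channels; all channels are multiplied and summed together,
    so one filter yields a single 2-D output slice."""
    H, W, C = image.shape
    f, _, kC = kernel.shape
    assert kC == C, "kernel depth must match input depth"
    out = np.zeros((H - f + 1, W - f + 1))
    for i in range(H - f + 1):
        for j in range(W - f + 1):
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * kernel)
    return out

rgb = np.random.rand(6, 6, 3)       # 6x6 RGB image
kernel = np.random.rand(3, 3, 3)    # 3x3x3 filter
print(conv3d_single_filter(rgb, kernel).shape)  # (4, 4)
```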

(2) The convolution kernel as a feature extractor

In the two-dimensional convolution example, we mentioned that by designing different convolution kernels, different features can be extracted.
(figure)
(3) Depth of the convolution output
(figure)

5. Convolutional layers in convolutional networks

(1) The forward-propagation process of a convolutional layer
(figure)
(2) Summary of the notation for a convolutional layer
(figure)
The following points should be noted:

  • The activation output of the previous layer is the input to the next layer
  • The depth of the convolution kernel must match the depth of the input data
  • The depth of the output data is determined by the number of convolution kernels used (a small code sketch follows this list)
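A hedged sketch of one convolutional layer's forward pass following these points (stride 1, no padding, ReLU activation; the shapes and names are illustrative):

```python
import numpy as np

def conv_layer_forward(A_prev, W, b):
    """A_prev: (H, W, n_c_prev) activation from the previous layer.
    W: (f, f, n_c_prev, n_c) stacks n_c filters; b: (n_c,) one bias per filter.
    The output depth equals the number of filters n_c."""
    f, _, n_c_prev, n_c = W.shape
    H, Wd, _ = A_prev.shape
    Z = np.zeros((H - f + 1, Wd - f + 1, n_c))
    for c in range(n_c):                           # one output slice per filter
        for i in range(H - f + 1):
            for j in range(Wd - f + 1):
                Z[i, j, c] = np.sum(A_prev[i:i + f, j:j + f, :] * W[:, :, :, c]) + b[c]
    return np.maximum(Z, 0)                        # A = g(Z) with a ReLU activation

A_prev = np.random.rand(6, 6, 3)
W = np.random.rand(3, 3, 3, 2)                     # two 3x3x3 filters
b = np.zeros(2)
print(conv_layer_forward(A_prev, W, b).shape)      # (4, 4, 2): depth = number of filters
```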

(3) An example of a simple convolutional neural network
(figure)
From the figure above, you can see the general pattern of convolutional neural network computations:
as the number of layers increases, the size (height and width) of the data tends to decrease gradually, while the depth of the data tends to increase.


2. Pooling

In addition to convolutional layers, convolutional neural networks often use pooling layers, in order to:

  • Reduce the size of the model
  • Increase calculation speed
  • Improve the robustness of extracted features

1. Example of pooling operation: Max Pooling

Take a 4x4 input and apply max pooling with parameters f = 2, s = 2:
intuitively, the original array is divided into four sub-matrix blocks, and the maximum value of each block is taken as one element of the output.

In essence, a window of size f x f slides with stride s, and at each position it takes the maximum value of the f x f sub-matrix it covers.
(figure)

If the input and output are both understood as features of the image, then the effect of the max operation is: as long as a feature is detected anywhere in a quadrant, it is preserved in the output of max pooling.

The essence of the max operation is: if the filter detects a feature, its large value is retained; if the feature is not detected, then it probably does not exist in that region, and the resulting value will not be very large.
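A minimal sketch of this pooling computation (the input values are my own illustration, not necessarily the course's example):

```python
import numpy as np

def pool2d(A, f=2, s=2, mode='max'):
    """Slide an f x f window with stride s over a single channel and take the
    max (or the average) of each block."""
    H, W = A.shape
    out = np.zeros(((H - f) // s + 1, (W - f) // s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = A[i * s:i * s + f, j * s:j * s + f]
            out[i, j] = block.max() if mode == 'max' else block.mean()
    return out

A = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]], dtype=float)
print(pool2d(A, f=2, s=2))   # [[9. 2.]
                             #  [6. 3.]]
```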

  • The pooling operation has proved very effective in practice. It has a set of hyperparameters, but once chosen they are fixed, so nothing needs to be learned during backpropagation
  • The hyperparameters of the pooling operation, and the formula relating input and output dimensions, are the same as for convolution
  • The above uses two-dimensional max pooling as an example. If the input is multi-channel data (say of depth n), then the output of the pooling operation also has depth n; that is, the pooling operation is applied to each channel independently according to the definition above.

[Another pooling operation] Average pooling
As the name implies, for each sub-region the average value is taken instead of the maximum.
(figure)
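With the `pool2d` sketch above, average pooling is just the averaging branch:

```python
print(pool2d(A, f=2, s=2, mode='avg'))   # [[3.75 1.25]
                                         #  [3.75 2.  ]]
```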

2. Summary of Pooling

(figure)

  • Max pooling simply computes a fixed (static) function of a layer of the network; it has no parameters of its own to learn
  • The hyperparameters of pooling do not need to be learned; they may be set manually or chosen through cross-validation

3. Examples of Convolutional Neural Networks

1. A complete convolutional neural network

Below, we take handwritten digit recognition as the application scenario, construct a complete convolutional neural network, and walk through its forward-propagation computation.
(figure)

  • Pay attention to how the data size changes after each convolution and pooling operation
  • As in the earlier convolutional network, as the number of layers increases, the height and width of the data decrease while its depth increases
  • The bottom part of the figure above shows the composition of a typical convolutional network (a code sketch of such a network follows this list)
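As a hedged sketch of this kind of network (a LeNet-5-style layout built with the Keras API; the exact filter counts and layer sizes are illustrative, not necessarily those in the figure):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, kernel_size=5, activation='relu',
                           input_shape=(32, 32, 1)),             # CONV1 -> 28x28x6
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),        # POOL1 -> 14x14x6
    tf.keras.layers.Conv2D(16, kernel_size=5, activation='relu'),  # CONV2 -> 10x10x16
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),        # POOL2 -> 5x5x16
    tf.keras.layers.Flatten(),                                   # 400 values
    tf.keras.layers.Dense(120, activation='relu'),               # FC3
    tf.keras.layers.Dense(84, activation='relu'),                # FC4
    tf.keras.layers.Dense(10, activation='softmax'),             # one output per digit
])
model.summary()   # height/width shrink while depth grows, as noted above
```

The summary shows the same trend as the figure: CONV and POOL layers shrink the spatial size while increasing the depth, and fully connected layers finish the classification.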

2. Comparison of the network structure of each layer

(figure)

3. Advantages of Convolutional Neural Networks

(1) Fewer parameters, so the network can be trained with a smaller training set, which also effectively helps prevent overfitting.

Convolutional neural networks reduce the number of parameters mainly through [weight sharing] and [sparse connections].

(figure)
① Weight sharing
A filter corresponds to a feature extractor. If a filter is useful in one area of the picture, it can be applied to any other area of the picture.
Therefore, each feature detector (and its output) can use the same parameters across different areas of the input picture.

A filter can extract not only low-level features such as edges, but also high-level features such as an eye or part of a particular animal.

② Sparse connections
For example, with a filter of size f x f, each value in an output cell depends only on f x f units of the source data (regardless of how large the source data itself is); the other cells of the source data have no effect on that output value.
(figure)
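A small arithmetic sketch of why weight sharing and sparse connections help (the 32x32x3 -> 28x28x6 sizes are illustrative, in the spirit of the course's comparison):

```python
n_in = 32 * 32 * 3      # 3,072 input values
n_out = 28 * 28 * 6     # 4,704 output values

fc_weights = n_in * n_out               # a fully connected layer: ~14.5 million weights
conv_params = (5 * 5 * 3 + 1) * 6       # six shared 5x5x3 filters plus biases: 456 parameters

print(fc_weights)    # 14450688
print(conv_params)   # 456
```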
(2) Convolutional neural networks are good at capturing translation invariance: even if the image is shifted by a few pixels, the network can still clearly capture the characteristic patterns in it.

In fact, because the same filter is used to generate all the output values at each layer, the network tends, through automatic learning, to become more robust and thus better acquire the desired translation-invariance property.

4. Training of Convolutional Neural Networks

(figure)
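Continuing the Keras sketch above, training follows the usual pattern: define a cost averaged over the training examples and minimize it with gradient descent (the optimizer and loss below are illustrative choices):

```python
model.compile(optimizer='sgd',                         # plain gradient descent
              loss='sparse_categorical_crossentropy',  # cost averaged over the m examples
              metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=5)   # x_train: (m, 32, 32, 1), y_train: (m,) digit labels
```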

Source: blog.csdn.net/kodoshinichi/article/details/109706431