3D Convolution & Dilated Convolution

Three-dimensional convolution

1. General structure

The figure below shows a 3D convolution in which the filter depth is smaller than the depth of the input layer (kernel size < channel size). The 3D filter can therefore move in all three directions (image height, width, channel). At each position, element-wise multiplication and addition produce one value. Because the filter slides through a 3D space, the output values are also arranged in 3D space. In other words, the output is 3D data.
[Figure: a 3D filter sliding along the height, width, and depth of the input volume]
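To make the shapes concrete, here is a minimal sketch in PyTorch (the framework is our assumption; the original post shows no code): a single 3×3×3 filter slides through a single-channel volume and produces a 3D output.

```python
import torch
import torch.nn as nn

# One input channel, one 3x3x3 filter: the kernel slides along depth,
# height and width, so the output is itself a 3-D volume.
conv3d = nn.Conv3d(in_channels=1, out_channels=1, kernel_size=3)
x = torch.randn(1, 1, 16, 32, 32)  # (batch, channels, depth, height, width)
y = conv3d(x)
print(y.shape)  # torch.Size([1, 1, 14, 30, 30]) -> values arranged in 3-D space
```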

2. 2D and 3D comparison

2D convolution is often used in computer vision and image processing. The 2D convolution operation is shown in Figure 1. To make the explanation clearer, the single-channel and multi-channel cases are shown separately, and for ease of drawing it is assumed that there is only one filter, i.e. the output image has only one channel.

For a single channel, the input image has 1 channel, i.e. the input size is (1, height, width); the convolution kernel has size (1, k_h, k_w). The kernel slides over the spatial dimensions of the input image (i.e. a sliding-window operation over the (height, width) dimensions), and at each position it is convolved with the values in the (k_h, k_w) window (in modern frameworks this is actually a cross-correlation) to produce one value of the output image.

For multiple channels, assume the input image has 3 channels, i.e. the input size is (3, height, width); the convolution kernel has size (3, k_h, k_w). The kernel again slides over the (height, width) dimensions, but at each position it is correlated with all the values in the (k_h, k_w) windows of the 3 channels to produce one value of the output image.
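The shape bookkeeping in both cases can be checked with a short PyTorch sketch (a sketch under our assumptions, not code from the original post); note that the multi-channel kernel spans all 3 input channels, yet still produces a single output channel.

```python
import torch
import torch.nn as nn

# Single channel: input (1, H, W), kernel (1, k_h, k_w) -> one output channel.
single = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
print(single(torch.randn(1, 1, 8, 8)).shape)  # torch.Size([1, 1, 6, 6])

# Multi-channel: input (3, H, W), kernel (3, k_h, k_w). Each window covers
# all 3 channels at once, so the output is still one channel.
multi = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3)
print(multi(torch.randn(1, 3, 8, 8)).shape)  # torch.Size([1, 1, 6, 6])
print(multi.weight.shape)                    # torch.Size([1, 3, 3, 3]) = (out, in, k_h, k_w)
```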

[Figure 1: 2D convolution, single-channel and multi-channel cases]
3D convolution is commonly used in video processing (e.g. detecting actions and human behavior).

The 3D convolution operation is shown in Figure 2. It is likewise split into single-channel and multi-channel cases, and again only one filter is used, so the output has one channel.

For a single channel, the difference from 2D convolution is that the input image gains a depth dimension, so the input size is (1, depth, height, width) and the convolution kernel gains a k_d dimension, giving (1, k_d, k_h, k_w). The kernel slides over both the spatial dimensions (height and width) and the depth dimension of the 3D input, and at each position it is correlated with the values in the (k_d, k_h, k_w) window to produce one value of the output 3D image.

For multiple channels, the input size is (3, depth, height, width) and the operation parallels multi-channel 2D convolution: at each position the kernel is correlated with all the values in the (k_d, k_h, k_w) windows of the 3 channels to produce one value of the output 3D image.
[Figure 2: 3D convolution, single-channel and multi-channel cases]
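Again as a hedged PyTorch sketch (our own example sizes): the 3D kernel gains the extra k_d dimension, and the multi-channel weight spans all input channels.

```python
import torch
import torch.nn as nn

# Multi-channel 3-D convolution: input (3, depth, H, W), kernel (3, k_d, k_h, k_w).
conv = nn.Conv3d(in_channels=3, out_channels=1, kernel_size=(3, 3, 3))
x = torch.randn(1, 3, 10, 16, 16)
print(conv(x).shape)      # torch.Size([1, 1, 8, 14, 14]) -> one 3-D output volume
print(conv.weight.shape)  # torch.Size([1, 3, 3, 3, 3]) = (out, in, k_d, k_h, k_w)
```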
References:
https://blog.csdn.net/weixin_36836622/article/details/90355877
https://www.zhihu.com/question/266352189

Dilated convolution (atrous convolution; literally "hole" convolution)


1. Introduction to dilated convolution

The purpose of this structure is to provide a larger receptive field without pooling (pooling layers cause information loss) while keeping the amount of computation comparable. As an aside, the main problems of the conventional convolution-plus-pooling structure are as follows:

  • Pooling layers are not learnable.
  • Internal data structure is lost; spatial hierarchy information is lost.
  • Small-object information cannot be reconstructed (with four pooling layers, any object smaller than 2^4 = 16 pixels theoretically cannot be reconstructed).

Dilated convolution retains the internal data structure and avoids downsampling, which gives it clear advantages.

2. Structure

The receptive field is the size of the region in the original image to which a pixel on a feature map of a given layer of a convolutional neural network maps; equivalently, it measures how large an area of the original image affects a pixel of a high-level feature map.
The larger the receptive field, the wider the range of information each convolution output contains.

This is crucial for vision tasks! Therefore, when designing a neural network, the receptive field must be considered carefully so that the model is more accurate and robust.

The dilation rate indicates the degree of dilation: this parameter defines the spacing between the values of the convolution kernel as it processes the data.
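A quick PyTorch sketch of this parameter (an illustration with made-up sizes): with dilation=2, one gap is inserted between the kernel taps, so a 3×3 kernel covers a 5×5 window.

```python
import torch
import torch.nn as nn

# dilation=2 spreads the 3x3 taps over a 5x5 window: a larger receptive
# field with the same number of parameters and no pooling.
dense   = nn.Conv2d(1, 1, kernel_size=3, dilation=1)
dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2)
x = torch.randn(1, 1, 16, 16)
print(dense(x).shape)    # torch.Size([1, 1, 14, 14])
print(dilated(x).shape)  # torch.Size([1, 1, 12, 12]) -> effective kernel is 5x5
```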

[Figure: 3×3 kernels with dilation rates 1, 2, and 4]
All kernels are 3×3:

  • Figure (a) is a 1-dilated conv; the receptive field is 3×3.
  • Figure (b) is a 2-dilated conv; stacked after the 1-dilated conv, the receptive field expands to 7×7.
  • Figure (c) is a 4-dilated conv; stacked after the 1-dilated and 2-dilated convs, the receptive field expands to 15×15.
  • In contrast, with ordinary stride-1 convolutions, the receptive field after three layers is only 7×7 (the small calculator after this list reproduces these numbers).
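These figures follow from the stride-1 stacking rule: each k×k layer with dilation d adds (k − 1)·d to the receptive field. A tiny calculator (a sketch assuming stride 1 throughout) reproduces them:

```python
# Receptive field of stacked k x k convs with stride 1: each layer with
# dilation d adds (k - 1) * d on top of the previous receptive field.
def receptive_field(dilations, k=3):
    rf = 1
    for d in dilations:
        rf += (k - 1) * d
    return rf

print(receptive_field([1]))        # 3   (Figure (a))
print(receptive_field([1, 2]))     # 7   (Figure (b))
print(receptive_field([1, 2, 4]))  # 15  (Figure (c))
print(receptive_field([1, 1, 1]))  # 7   (three ordinary 3x3 convs)
```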

3. The role of dilated convolution

(1) Expanding the receptive field

In deep networks, downsampling (pooling or stride-2 convolution) is routinely used to increase the receptive field and reduce computation; this does enlarge the receptive field, but it lowers the spatial resolution. **To enlarge the receptive field without losing resolution, dilated convolution can be used.** This is very useful in detection and segmentation tasks: on the one hand, the large receptive field helps detect and segment large targets; on the other hand, the preserved resolution allows targets to be located accurately.
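A minimal sketch of this point (assuming PyTorch and a 3×3 kernel): setting the padding equal to the dilation rate enlarges the receptive field while keeping the feature map at full resolution.

```python
import torch
import torch.nn as nn

# For a 3x3 kernel, padding = dilation keeps the spatial size unchanged,
# so the receptive field grows without any downsampling.
conv = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)
x = torch.randn(1, 64, 32, 32)
print(conv(x).shape)  # torch.Size([1, 64, 32, 32]) -> no loss of resolution
```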

(2) Capturing multi-scale context information

Dilated convolution has a dilation-rate parameter; concretely, (dilation rate − 1) zeros are inserted between adjacent values of the convolution kernel. Setting different dilation rates therefore yields different receptive fields, i.e. multi-scale information is obtained (see the multi-branch sketch below).
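As a loose, ASPP-style sketch (our own module and names, not the exact design of any paper): parallel 3×3 convolutions with different dilation rates look at the same input over different receptive fields, and their outputs are concatenated as multi-scale context.

```python
import torch
import torch.nn as nn

class MultiScaleContext(nn.Module):
    """Parallel 3x3 convs with different dilation rates (hypothetical module)."""
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        # padding = rate keeps every branch at the input's spatial size
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, dilation=r, padding=r) for r in rates
        )

    def forward(self, x):
        # Each branch sees a different receptive field; concatenate along channels.
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(1, 32, 64, 64)
print(MultiScaleContext(32)(x).shape)  # torch.Size([1, 96, 64, 64])
```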

4. Problems

(1) Gridding Effect

Although dilated convolution enlarges the receptive field without shrinking the feature map, it brings a new problem that shows up on the convolution's input: because the kernel taps are spaced apart, not all inputs participate in the computation, and the feature map as a whole shows a discontinuity of convolution center points. This is especially pronounced when stacked convolution layers all use the same dilation rate:
[Figure: three stacked dilated conv layers, all with dilation rate 2; blue marks the convolution centers involved in the computation, with darker color meaning more frequent use]
The example in the figure above shows the result of three consecutive dilated convolution layers with dilation rate 2. Because all three layers use the same rate, the computation centers spread outward in a grid pattern, and some points never become a center of computation at all.
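The gridding effect can be reproduced numerically. The sketch below (function name and sizes are our own) traces which input pixels can influence a single output pixel through three stacked 3×3 convolutions with dilation rate 2: only 49 of the 169 pixels in the 13×13 receptive field are ever used.

```python
import numpy as np

def influence_mask(dilations, size=15):
    """Count how often each input pixel feeds one output pixel (hypothetical helper)."""
    mask = np.zeros((size, size), dtype=int)
    mask[size // 2, size // 2] = 1  # start from a single output pixel
    for d in reversed(dilations):   # walk back through the stack of 3x3 layers
        new = np.zeros_like(mask)
        for y, x in zip(*np.nonzero(mask)):
            for dy in (-d, 0, d):   # 3 taps per axis, spaced by the dilation rate
                for dx in (-d, 0, d):
                    new[y + dy, x + dx] += mask[y, x]
        mask = new
    return mask

m = influence_mask([2, 2, 2])
print((m > 0).sum(), "of", 13 * 13)  # 49 of 169 -> most of the field is holes
```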

(2) Large dilation rates may harm small objects

Information captured with a large dilation rate may only be effective for segmenting large objects, while it may be harmful for small objects.

5. Solution

The most straightforward fix is, of course, not to stack dilated convolutions that all use the same dilation rate. But that alone is not enough: if the dilation rates are multiples of one another, the problem persists. The best approach is to give consecutively stacked dilated convolutions a "sawtooth" pattern of dilation rates, such as [1, 2, 3].

A simple example: dilation rates [1, 2, 5] with a 3×3 kernel (a feasible solution):
[Figure: input coverage of stacked 3×3 dilated convs with dilation rates 1, 2, 5]
The sawtooth structure is inherently better at satisfying the segmentation requirements of small and large objects at the same time (a small dilation rate attends to short-range information, a large dilation rate to long-range information).
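A one-dimensional view makes the comparison easy to verify (a small self-contained sketch; `reachable` is our own helper): with rates [2, 2, 2] only 7 distinct offsets out of a 13-wide receptive field are reachable, while the sawtooth [1, 2, 5] covers all 17 offsets of its 17-wide receptive field.

```python
from itertools import product

def reachable(dilations):
    """Offsets reachable by stacking 3-tap (a 3x3 kernel, seen in 1-D) dilated convs."""
    taps = [(-d, 0, d) for d in dilations]
    return sorted({sum(combo) for combo in product(*taps)})

print(len(reachable([2, 2, 2])))  # 7  distinct offsets in a 13-wide field -> holes
print(len(reachable([1, 2, 5])))  # 17 distinct offsets in a 17-wide field -> full coverage
```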

Dilated convolution is mainly applied to semantic segmentation.

References:
https://blog.csdn.net/chaipp0607/article/details/99671483
Receptive field calculation:
https://www.jianshu.com/p/f743bd9041b3
https://www.cnblogs.com/hellcat/p/9687624.html

Source: https://blog.csdn.net/weixin_45019830/article/details/107454022