A detailed explanation of the DPN (Dual Path Networks) architecture

Paper: Dual Path Networks
Paper link: https://arxiv.org/abs/1707.01629
Code: https://github.com/cypw/DPNs
DPN code with trainable models under the MXNet framework: https://github.com/miraclewkf/DPN

We know that ResNet, ResNeXt, DenseNet, and similar networks have proven highly effective for image classification, and DPN can be seen as integrating the core ideas of ResNeXt and DenseNet: the Dual Path Network (DPN) uses ResNet as its main framework to keep feature redundancy low, and adds a very thin DenseNet branch to it to generate new features.

So what are the advantages of DPN? Two points stand out:
1. Model complexity. In the authors' words: "The DPN-92 costs about 15% fewer parameters than ResNeXt-101 (32×4d), while the DPN-98 costs about 26% fewer parameters than ResNeXt-101 (64×4d)."
2. Computational complexity. In the authors' words: "DPN-92 consumes about 19% less FLOPs than ResNeXt-101 (32×4d), and the DPN-98 consumes about 25% less FLOPs than ResNeXt-101 (64×4d)."
[Table 1: DPN network architectures]

Table 1 (above) shows the network structure first, to give an intuitive overview.

In fact, the structure of DPN is very similar to that of ResNeXt (and ResNet): a 7×7 convolutional layer and a max-pooling layer at the start, then four stages, each containing several sub-stages (described later), followed by global average pooling, a fully connected layer, and finally a softmax layer. The focus is on what happens inside the stages, which is the core of the DPN algorithm.
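The macro-structure just described can be sketched as a shape-tracking walk-through. The kernel sizes and strides below follow typical ResNet-style settings and are illustrative, not the exact Table 1 values:

```python
# Shape-tracking sketch of the DPN macro-structure (stem conv + max pool,
# four stages, global average pooling, FC, softmax). Illustrative only.

def conv_out_hw(h, w, k, stride, pad):
    """Spatial size after a convolution or pooling layer."""
    return (h + 2 * pad - k) // stride + 1, (w + 2 * pad - k) // stride + 1

h, w = 224, 224                     # typical ImageNet input size
h, w = conv_out_hw(h, w, 7, 2, 3)   # 7x7 stem convolution, stride 2 -> 112x112
h, w = conv_out_hw(h, w, 3, 2, 1)   # 3x3 max pooling, stride 2      -> 56x56
# stage 1 keeps the resolution; stages 2-4 each downsample by 2
for _ in range(3):
    h, w = conv_out_hw(h, w, 3, 2, 1)
print(h, w)  # global average pooling then reduces this 7x7 map to 1x1
             # before the fully connected layer and softmax
```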

Because DPN essentially integrates ResNeXt and DenseNet into one network, before introducing the structure inside each DPN stage, let's briefly review the core ideas of ResNet (ResNeXt shares the same macroscopic substructure as ResNet) and DenseNet.

(a) in the figure below is part of one ResNet stage. The large rectangular box on the left of (a) represents the accumulated input/output features. An input x is split into two paths: one path is x itself (the identity shortcut), and the other path passes x through a 1×1 convolution, a 3×3 convolution, and another 1×1 convolution (this three-layer combination is called a bottleneck). The outputs of the two paths are then combined by element-wise addition, i.e. corresponding values are added; this is the "+" in (a). The result becomes the input of the next identical module, and several such modules stacked together form a stage (such as conv3 in Table 1).
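The residual path in (a) can be sketched in a few lines of numpy. For brevity the 3×3 convolution in the middle of the bottleneck is omitted and only the two 1×1 convolutions are kept; the channel sizes are illustrative:

```python
import numpy as np

# Minimal sketch of a ResNet bottleneck block: two paths, combined by
# element-wise addition. The 3x3 middle convolution is omitted for brevity.

def conv1x1(x, w):
    """1x1 convolution as a per-pixel matrix multiply; x is (C, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))  # -> (C_out, H, W)

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 8, 8))            # input feature map

# bottleneck: 1x1 reduce channels, then 1x1 restore them
w_reduce = rng.standard_normal((64, 256)) * 0.01
w_restore = rng.standard_normal((256, 64)) * 0.01
residual = conv1x1(conv1x1(x, w_reduce), w_restore)

out = x + residual                              # element-wise addition
assert out.shape == x.shape                     # shape preserved, so blocks stack
```

Because the output shape equals the input shape, the block's output can directly feed the next identical block, which is exactly why these modules stack into a stage.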

(b) shows the core idea of DenseNet. The vertical polygonal box on the left of (b) represents the accumulated input/output features. The input x takes only one path: it passes through several convolution layers and is then channel-concatenated (concat) with x itself, and the result becomes the input of the next small module. In this way the input of each small module keeps accumulating: for example, the input of the second module includes both the output and the input of the first module, and so on.
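The dense connectivity in (b) can be sketched as repeated channel concatenation. The convolution layers are replaced by a random feature generator, and the growth of 32 channels per step is illustrative:

```python
import numpy as np

# Sketch of DenseNet-style connectivity: each step's new features are
# concatenated onto its input along the channel axis, so channels grow
# and every later module sees all earlier features.

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 8))  # initial feature map

def dense_step(x, growth=32):
    # stand-in for "several convolution layers" producing new features
    new_features = rng.standard_normal((growth,) + x.shape[1:])
    return np.concatenate([x, new_features], axis=0)  # channel concat

x1 = dense_step(x)   # 64 + 32 = 96 channels
x2 = dense_step(x1)  # 96 + 32 = 128 channels: input includes x AND x1's output
assert x1.shape[0] == 96 and x2.shape[0] == 128
```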
[Figure: block structures of (a) ResNet, (b) DenseNet, and (d)/(e) DPN]
How does DPN do it? Simply put, it fuses the residual network and the densely connected network. (d) and (e) in the figure below are equivalent, so let's focus on (e). The vertical rectangle and polygon in (e) have the same meaning as before. Concretely, in the code, the input x arrives in one of two forms:
1. If x is the output of the network's first convolutional layer or of a previous stage, a convolution is first applied to x, and the result is sliced along the channel dimension into two parts, data_o1 and data_o2, which correspond to the vertical rectangular box and polygonal box in (e).
2. If x is the output of a sub-stage within the current stage, it already consists of the two parts data_o1 and data_o2.

The block then takes two paths. One path keeps data_o1 and data_o2 as they are, similar to the shortcut in ResNet. The other path applies a 1×1 convolution, a 3×3 convolution, and a 1×1 convolution to x, then slices the result into two parts, c1 and c2. Finally, c1 is added element-wise to data_o1 to produce sum, mirroring the ResNet operation, while c2 is channel-concatenated (concat) with data_o2 to produce dense (so that the next layer receives both this layer's output and this layer's input). The sub-stage thus returns two values: sum and dense.
The process above is one sub-stage within a DPN stage. Two details are worth noting: first, the 3×3 convolution uses a grouped convolution, as in ResNeXt; second, a channel-widening operation is performed on the dense part at the beginning and end of each sub-stage.
The authors implemented the DPN algorithm under the MXNet framework. The detailed symbol definitions can be found at https://github.com/cypw/DPNs/tree/master/settings, which are very detailed and easy to follow.

Experimental results:
Table 2 compares DPN with the current best algorithms (ResNet, ResNeXt, DenseNet) on the ImageNet-1k dataset. DPN comes out ahead in model size, GFLOPs, and accuracy. Note, however, that DenseNet's results in this comparison are not as strong as those reported in the DenseNet paper, probably because DenseNet requires more training tricks to reach them.
[Table 2: comparison with ResNet, ResNeXt, and DenseNet on ImageNet-1k]
Figure 3 compares training speed and memory consumption. For model improvements nowadays, a large jump in accuracy is hard to claim as the main innovation, because the gains tend to be small; most work therefore also optimizes model size and computational complexity, while still pushing accuracy up a little where possible.
[Figure 3: comparison of training speed and memory consumption]

Summary:
The DPN network proposed by the authors can be understood as introducing the core idea of DenseNet on top of ResNeXt, allowing the model to reuse features more fully. The principle is not hard to understand, and the model is relatively easy to train in practice. The experiments in the paper also show that the model performs well on both classification and detection datasets.

Origin: blog.csdn.net/weixin_44025103/article/details/132012194