Paper notes: RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

1 Summary

As CNNs have developed, much deeper networks such as the ResNet family have emerged, and they are well suited to dense prediction tasks such as semantic segmentation. However, a CNN downsamples repeatedly, which steadily reduces the resolution and loses some of the image's spatial information, so these networks handle high-resolution images poorly. To address this, the authors propose RefineNet, which introduces the Residual Convolution Unit (RCU), Multi-Resolution Fusion (MRF), and Chained Residual Pooling (CRP). These structures recover spatial resolution effectively, and the network reached state-of-the-art results on 7 datasets.

2 Highlights

At the time, DeepLab was the best-performing network, but the authors point out two flaws:
① On high-resolution images, the high-dimensional features make DeepLab consume large amounts of computing resources.
② Although the dilated (atrous) convolution used by DeepLab gives a larger receptive field, it loses some of the spatial information of a high-resolution image, so the output becomes coarse.
RefineNet avoids these problems mainly through three modules.

2.1 Residual Convolution Unit (RCU)

The RCU is modeled on the ResNet residual block and splits into two paths:
[Figure: structure of the RCU]
The trunk passes the input feature map through unchanged. The branch applies ReLU, a 3x3 convolution, ReLU, and a second 3x3 convolution, and its output is then added back onto the trunk. The residual branch can be understood as supplying supplementary information to the feature map, so that the features become richer.
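The two paths can be sketched in a few lines. This is a minimal single-channel illustration using plain Python lists, not the authors' implementation: the function names, the zero padding, and the example kernels are all assumptions made for the sketch, and a real RCU would use learned multi-channel convolutions.

```python
def relu(fmap):
    """Element-wise ReLU on a 2D list."""
    return [[max(0.0, v) for v in row] for row in fmap]


def conv3x3(fmap, kernel):
    """Single-channel 3x3 convolution, stride 1, zero padding (size-preserving)."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # outside the map counts as zero
                        acc += kernel[di + 1][dj + 1] * fmap[y][x]
            out[i][j] = acc
    return out


def rcu(fmap, k1, k2):
    """Branch: ReLU -> conv3x3 -> ReLU -> conv3x3, then added to the trunk."""
    branch = conv3x3(relu(fmap), k1)
    branch = conv3x3(relu(branch), k2)
    return [[a + b for a, b in zip(r_in, r_br)] for r_in, r_br in zip(fmap, branch)]


# Hypothetical kernels chosen only to make the behaviour easy to check by hand.
ZERO = [[0.0] * 3 for _ in range(3)]
IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
x = [[1.0, -2.0], [3.0, -4.0]]
print(rcu(x, ZERO, ZERO))          # branch contributes nothing: output equals x
print(rcu(x, IDENTITY, IDENTITY))  # branch reduces to relu(x): output is x + relu(x)
```

With all-zero kernels the unit is an identity mapping, which is the property that makes residual blocks easy to stack.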

2.2 Multi-Resolution Fusion (MRF)

After the RCUs, the feature maps enter the MRF module, which processes features at different scales, upsamples them to a common resolution, and fuses them:
[Figure: structure of the MRF module]
Each scale passes through its own 3x3 convolution and is then upsampled by bilinear interpolation, so that every path reaches the same resolution. The paths are then summed and the result is passed on to the next stage.
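The upsample-and-sum step can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the per-path 3x3 convolution is left out, the maps are single-channel Python lists, and the corner-aligned interpolation convention and the names `bilinear_resize` and `fuse` are choices made for this sketch.

```python
def bilinear_resize(fmap, out_h, out_w):
    """Bilinear interpolation of a 2D list to (out_h, out_w), corners aligned."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_h):
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
            bot = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out


def fuse(paths):
    """Upsample every path to the largest resolution present, then sum them."""
    out_h = max(len(p) for p in paths)
    out_w = max(len(p[0]) for p in paths)
    resized = [bilinear_resize(p, out_h, out_w) for p in paths]
    return [[sum(r[i][j] for r in resized) for j in range(out_w)]
            for i in range(out_h)]


high = [[1.0] * 3 for _ in range(3)]   # finer path, 3x3
low = [[0.0, 2.0], [4.0, 6.0]]         # coarser path, 2x2
print(bilinear_resize(low, 3, 3))      # the coarse map brought up to 3x3
print(fuse([high, low]))               # element-wise sum at the finer resolution
```

Summation requires every path to have the same shape, which is why the coarser paths are resized first.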

2.3 Chained Residual Pooling (CRP)

The CRP module follows the MRF. Where the first two modules mainly fuse information across resolutions, the CRP module mainly captures background context and enriches the contextual information in the feature map:
[Figure: structure of the CRP module]
The input first passes through a ReLU. It then goes through a pooling-plus-convolution block, and that block's output is summed with the ReLU output. The later pooling-convolution blocks work the same way, except that each one takes the previous block's output as its input instead of restarting from the running sum. The authors suggest that chaining the blocks like this helps the module learn deeper context, which in turn helps correct and supplement the overall result.
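A small sketch of the chaining, again with plain Python lists. Everything here is illustrative: the pooling window size is a parameter of the sketch, `halve` is a hypothetical stand-in for a learned convolution, and the function names are not taken from the paper.

```python
def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_same(fmap, size=3):
    """Max pooling with stride 1; windows are clipped at the borders,
    so the output has the same shape as the input."""
    h, w = len(fmap), len(fmap[0])
    r = size // 2
    return [[max(fmap[y][x]
                 for y in range(max(0, i - r), min(h, i + r + 1))
                 for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)] for i in range(h)]


def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def chained_residual_pooling(fmap, convs, pool_size=3):
    x = relu(fmap)
    total = x   # running sum starts from the ReLU output
    path = x    # what the next pooling block receives
    for conv in convs:
        path = conv(max_pool_same(path, pool_size))  # each block feeds the next one
        total = add(total, path)                     # and also joins the sum
    return total


def halve(fmap):
    """Hypothetical stand-in for a learned convolution."""
    return [[0.5 * v for v in row] for row in fmap]


x = [[-1.0, 0.0, 4.0, 0.0, 0.0]]
print(chained_residual_pooling(x, [halve]))         # one block
print(chained_residual_pooling(x, [halve, halve]))  # two chained blocks
```

In the two-block run the single activation at the center influences every position of the output, while one block only reaches its immediate neighbours: chaining pooling blocks widens the region each output position draws context from.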

2.4 RefineNet module structure

This section is titled "RefineNet module structure" because the overall RefineNet architecture is itself built from the module shown here.
[Figure: structure of a RefineNet module]
The module chains the three components described above: the residual convolution unit (RCU), multi-resolution fusion (MRF), and chained residual pooling (CRP). A final RCU then balances the weights of the preceding components, producing an output with the same spatial dimensions as the input. A RefineNet module can therefore be thought of as a feature extractor applied to its inputs, comparable to a dense block (or to one large padded convolution, except that it extracts far more complete feature information).

2.5 RefineNet overall network structure

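Before introducing the overall RefineNet structure, the authors first analyze the ResNet and dilated-convolution network structures:
[Figure: (a) ResNet and (b) dilated-convolution structures]
(a) is the ResNet structure. Because of the repeated downsampling, the feature map shrinks from 1/4 of the input size to 1/32 at the final output, and much of the important image information is lost. (b) replaces the downsampling with dilated convolution, but since dilated convolution does not shrink the feature maps, they stay large and cost a great deal of memory and computation. The authors therefore propose (c):
[Figure: (c) overall RefineNet structure]
Feature maps of different sizes each pass through a RefineNet module (which plays a role similar to a convolution that extracts high-resolution features) and are then fused level by level. This keeps the image information fairly complete while keeping the cost low, without consuming large amounts of computing resources. The overall structure closely resembles U-Net, with RefineNet modules added for high-resolution feature extraction. The U-Net structure is:
[Figure: U-Net structure]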
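To make the level-by-level fusion concrete, the sketch below walks a cascade from the coarsest ResNet level to the finest and records what each RefineNet module receives and the size of its output. It is a shape-only illustration: the labels such as `RefineNet@1/32`, the function names, and the 512x512 input are assumptions made for this example, and no actual features are computed.

```python
def feature_sizes(height, width, strides=(4, 8, 16, 32)):
    """Spatial size of the backbone feature map at each downsampling ratio."""
    return {s: (height // s, width // s) for s in strides}


def cascade(height, width):
    """Walk from the coarsest level (1/32) to the finest (1/4).

    Each RefineNet module takes the backbone features at its own level plus,
    except at the coarsest level, the output of the previous module."""
    sizes = feature_sizes(height, width)
    steps = []
    previous = None
    for s in (32, 16, 8, 4):
        inputs = [f"ResNet 1/{s}"]
        if previous is not None:
            inputs.append(previous)
        previous = f"RefineNet@1/{s}"
        steps.append((previous, inputs, sizes[s]))  # output size = finest input
    return steps


for name, inputs, size in cascade(512, 512):
    print(name, "<-", inputs, "output size", size)
```

Each halving of the stride quadruples the number of spatial positions, which is why holding every layer at a fine resolution is expensive, whereas the cascade only reaches 1/4 resolution in its last module.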

2.6 RefineNet variants

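The authors note that RefineNet modules can be arranged in different ways to suit different settings, for example:
① Variant 1:
[Figure: variant with a single RefineNet module]
This structure uses only one RefineNet module.
② Variant 2:
[Figure: variant with two RefineNet modules]
This structure uses two RefineNet modules.
③ Variant 3:
[Figure: two-scale variant]
This structure takes the image at two scales, one at 1.2x the input size and the other at 0.6x. The result from the 0.6x image is passed as auxiliary information to be fused into the 1.2x branch. This forms a two-scale RefineNet, which is slightly more accurate than the structure with a single input scale.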

3 Selected results

[Figure: IoU of each network on the Cityscapes test set]
The figure shows the IoU of each network on the Cityscapes test set.
[Figure: comparison of RefineNet variants on NYUDv2]
The figure compares the different network variants on the NYUDv2 dataset. The more RefineNet modules and the more input scales, the higher the accuracy.
[Figure: RefineNet results on the Person-Parts dataset]
The network also performs very well on object parsing. The figure shows RefineNet's results on the Person-Parts dataset, which look very good.

4 Conclusion

By fusing features at multiple resolutions, RefineNet efficiently extracts rich feature information from the image, and it offers a way of handling the multi-scale problem that differs from methods such as SPP and ASPP.

