【Super Resolution】Deep Back-Projection Networks For Super-Resolution

Paper: "Deep Back-Projection Networks for Super-Resolution"
Authors: Muhammad Haris, Greg Shakhnarovich, and Norimichi Ukita

This paper is from CVPR 2018. Compared with "Residual Dense Network for Image Super-Resolution", accepted at the same conference, the idea here is genuinely creative rather than just another way of stacking building blocks. Below I explain my take on the paper, combining my own understanding with its general line of argument.

Summary

The author observes that previous deep super-resolution networks typically extract features first and then map the LR space to the HR space through the network's powerful non-linear mapping. This paper argues that such a one-shot mapping may not effectively exploit the mutual dependencies between LR and HR images. The author therefore builds a network of alternating up-sampling and down-sampling stages with error feedback, similar in spirit to the classical iterative back-projection algorithm, and achieves state-of-the-art results.

Related work

So far, super-resolution networks roughly fall into the following three categories:

1) Predefined upsampling


(figure: predefined upsampling architecture)

This class of methods interpolates the LR input in advance (e.g., bicubic) so that input and output have the same size.
It originates from SRCNN (Dong et al.), who argued that keeping the input and output feature maps the same size simplifies the non-linear mapping; otherwise a fractional stride would be needed, which is inconvenient.
The drawback is that operating at full HR resolution throughout substantially increases the amount of computation.

2) Single upsampling


(figure: single upsampling architecture)

To avoid the extra computation of the structure above, single upsampling processes the LR image directly instead of a bicubic pre-upsampled input, which greatly reduces computation. The upsampling happens once, at the end, via one of two typical operators:
- deconvolution (transposed convolution) layer: representative work FSRCNN (Dong et al.)
- sub-pixel convolution layer: representative work ESPCN (Twitter)

3) Progressive upsampling


(figure: progressive upsampling architecture)

This structure combines the ideas above and upsamples step by step, one stage at a time. It pays off especially at large upscaling factors such as x4 and x8.
Representative work: LapSRN

Structure of this paper

This paper proposes a fourth architecture:


(figure: DBPN architecture with alternating up- and down-projection units)

The idea is to correct the reconstruction error through repeated up- and down-sampling with error feedback.

1) Projection units

up-projection unit:

(figure: up-projection unit)

The algorithm: up-sample the LR features to a first HR estimate, project that estimate back down to LR, compute the LR residual against the input, up-sample the residual, and add the resulting correction to the first HR estimate.
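To make the data flow concrete, here is a minimal pure-Python sketch of an up-projection unit on 1-D signals. The nearest-neighbour `up` and pair-averaging `down` are stand-ins for the paper's learned deconvolution and strided-convolution filters, so this only illustrates the wiring, not the learned behaviour.

```python
def up(x):
    # Stand-in for a learned deconvolution: nearest-neighbour 2x upsampling.
    return [v for v in x for _ in range(2)]

def down(x):
    # Stand-in for a learned strided convolution: 2x average pooling.
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]

def up_projection(l_in):
    h0 = up(l_in)                               # first HR estimate
    l0 = down(h0)                               # project back to LR
    err = [a - b for a, b in zip(l0, l_in)]     # LR residual e = L0 - L
    h1 = up(err)                                # map the residual to HR
    return [a + b for a, b in zip(h0, h1)]      # corrected HR output

print(up_projection([1.0, 2.0]))  # → [1.0, 1.0, 2.0, 2.0]
```

With these idealized operators `down(up(x)) == x`, so the residual vanishes; with real learned filters the residual is non-zero and the second branch supplies the correction.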

Comparison with iterative back-projection:
Let X be the high-resolution image, Y the low-resolution image, D a downsampling operator, and H a blur operator. Iterative back-projection solves

X* = argmin_X || D H X - Y ||^2

with the gradient update

X(t+1) = X(t) - H^T D^T (D H X(t) - Y)

In the traditional algorithm, the blur operator H usually has to be estimated.
Structurally, the projection units above perform essentially the same computation, but with learned up- and down-sampling operators.
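A minimal pure-Python sketch of the classical update, under the simplifying assumptions that the blur H is the identity and D is 2x average pooling (so D^T spreads half of each residual value back to the two positions it came from); `lam` is a hypothetical step size:

```python
def downsample(x):
    # D: 2x average pooling over adjacent pairs.
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]

def upsample_T(r):
    # D^T: the transpose of pair-averaging spreads r/2 to both positions.
    return [v / 2 for v in r for _ in range(2)]

def back_project(y, n_iters=20, lam=1.0):
    # Start from a zero HR estimate and iterate
    # X <- X - lam * D^T (D X - Y).
    x = [0.0] * (2 * len(y))
    for _ in range(n_iters):
        residual = [d - t for d, t in zip(downsample(x), y)]  # D X - Y
        step = upsample_T(residual)                           # D^T (D X - Y)
        x = [xi - lam * s for xi, s in zip(x, step)]
    return x

x = back_project([1.0, 3.0])
print(downsample(x))  # re-projecting the estimate nearly reproduces Y
```

Each iteration halves the residual under these operators, so the down-projected estimate converges to Y; DBPN replaces the fixed operators with learned ones and applies the correction in feature space.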

down-projection unit:

(figure: down-projection unit)

The algorithm is the exact mirror of the up-projection unit: down-sample the HR features to a first LR estimate, project it back up to HR, compute the HR residual, down-sample the residual, and add the correction to the first LR estimate.

2) Dense projection units

D-DBPN adds DenseNet-style dense connections, so that each projection unit takes the concatenated outputs of all previous units as input and features are shared across stages:


(figure: dense projection units)
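A toy sketch of the dense connectivity pattern (not the actual D-DBPN layers): every unit consumes a fusion of all earlier outputs rather than just the previous one. For simplicity all features share one size here, and an elementwise mean stands in for the 1x1 convolution that merges the concatenated features in the paper.

```python
def dense_chain(units, x0):
    # DenseNet-style wiring: each unit sees ALL earlier outputs,
    # fused by an elementwise mean (stand-in for a 1x1 conv).
    outputs = [x0]
    for unit in units:
        fused = [sum(vals) / len(vals) for vals in zip(*outputs)]
        outputs.append(unit(fused))
    return outputs[-1]

double = lambda x: [2.0 * v for v in x]
print(dense_chain([double, double], [1.0, 2.0]))  # → [3.0, 6.0]
```

The second unit's input mixes the original features with the first unit's output, which is exactly the feature-reuse effect the dense connections are meant to provide.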

The experimental setup of this paper

In this paper's network, the author states that the filter size in the projection units varies with the scaling factor.
For 2x enlargement: a 6x6 convolution layer with stride 2 and padding 2.
For 4x enlargement: an 8x8 convolution layer with stride 4 and padding 2.
For 8x enlargement: a 12x12 convolution layer with stride 8 and padding 2.

So why these settings?
The size of the feature map after a convolution layer is determined by the layer's settings. Assume the input image is H x W, with filter size F, padding P, and stride S. The output is:
H' = (H - F + 2P)/S + 1
Substituting the 2x parameters:
H' = (H - 6 + 2*2)/2 + 1 = H/2
The 4x and 8x settings likewise divide the spatial size by exactly 4 and 8.
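The arithmetic can be checked directly; all three settings divide the spatial size by exactly the scale factor:

```python
def conv_out(h, f, p, s):
    # H' = (H - F + 2P) / S + 1
    return (h - f + 2 * p) // s + 1

# scale -> (filter size, stride), padding is 2 in all three cases
for scale, (f, s) in {2: (6, 2), 4: (8, 4), 8: (12, 8)}.items():
    print(scale, conv_out(64, f, 2, s))  # → 2 32 / 4 16 / 8 8
```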

About the training set:

DIV2K, Flickr, and the ImageNet dataset.
The training set is quite large, much larger than the 291-image set used by earlier work.

The experimental results of this paper

1) Quantitative metrics:
(figure: quantitative comparison)

2) Visual quality:
(figure: visual comparison)
