Super-resolution paper reading notes: CSNL

Cross-Scale Non-Local Attention paper notes

Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

Problem addressed: most existing work ignores the long-range feature correlations present in natural images. This correlation can be captured by a non-local attention module, but none of the current deep models exploits another inherent image property: cross-scale feature correlation.
Motivation: cross-scale patch similarity is common in natural images. Non-local self-similarity search can be extended from pixel-to-pixel matching to pixel-to-patch matching: besides matching non-local pixels, a pixel can also be matched against larger image patches. This cross-scale self-similarity of natural images lets us search for high-frequency details directly within the LR image itself, yielding more accurate, higher-quality reconstructions.
Contribution of this paper: it proposes the first cross-scale non-local attention module for the deep-learning SISR task. It explicitly expresses pixel-to-patch and patch-to-patch similarity within an image, and demonstrates that mining cross-scale self-similarity greatly improves SISR performance.
Proposed method (model): CS-NL
CS-NL model

Principle of In-Scale NL Attention:

In-scale non-local attention can be written as

z_{i,j} = \sum_{g,h} \frac{\exp(\phi(x_{i,j}, x_{g,h}))}{\sum_{u,v} \exp(\phi(x_{i,j}, x_{u,v}))} \, \psi(x_{g,h})

Here z_{i,j} is the value at position (i, j) after the NL module, x_{i,j} is the input feature at position (i, j), x_{g,h} is the input feature at position (g, h), and x_{u,v} ranges over every position. φ measures the relationship between x_{i,j} and each position of the feature map; dividing by the sum of these relationships over all positions normalizes them into weights, so each weight expresses how strongly position (g, h) influences (i, j). The mapping ψ(x_{g,h}) is then weighted by this amount, and the weighted sum over all positions gives the new representation of position (i, j).
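A minimal numpy sketch of this formula, with the similarity φ taken as a plain dot product and the mapping ψ as the identity (in the paper both are learned 1×1-convolution embeddings; the function name here is illustrative):

```python
import numpy as np

def in_scale_nonlocal(x):
    """In-scale non-local attention over an (H, W, C) feature map.

    Every position (i, j) is re-expressed as a softmax-weighted sum of
    the features at all positions, weighted by dot-product similarity.
    """
    h, w, c = x.shape
    feats = x.reshape(h * w, c)                    # flatten positions
    sim = feats @ feats.T                          # phi(x_ij, x_gh): dot product
    sim -= sim.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over all (g, h)
    z = weights @ feats                            # weighted sum of psi(x_gh)
    return z.reshape(h, w, c)

x = np.random.rand(4, 4, 8).astype(np.float32)
z = in_scale_nonlocal(x)
print(z.shape)  # (4, 4, 8)
```

Because the weights form a convex combination, every output value stays within the range of the input features, which is a quick sanity check on an implementation.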
Principle of Cross-scale NL Attention:
Cross-scale non-local attention can be written as

z^{s\times s}_{si,sj} = \sum_{g,h} \frac{\exp(\phi(x_{i,j}, y_{g,h}))}{\sum_{u,v} \exp(\phi(x_{i,j}, y_{u,v}))} \, \psi\left(X^{s\times s}_{sg,sh}\right)

Here Y is the image downsampled s times, y_{g,h} is its feature at position (g, h), y_{u,v} is its feature at an arbitrary position, and X^{s\times s}_{sg,sh} is the s×s patch of the original feature map corresponding to position (g, h) of Y. The whole process is shown in the figure below: after the original feature map is reduced s times, each pixel of the small-scale map stands for an s×s patch of the original map, so measuring the relationship between a pixel of the original map and the pixels of the small map is equivalent to measuring its relationship with the corresponding s×s patches of the original map. The weighted sum over pixels therefore becomes a weighted combination of image patches, and replacing each pixel of the original map with its weighted patch yields a final feature map enlarged s times.
The paper ultimately adopts a patch-to-patch formulation; the principle is the same as above, except that the pixel-to-patch relationship is replaced by a relationship between patches.
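The pixel-to-patch process just described can be sketched in numpy as follows. The dot-product similarity, identity mapping ψ, and average-pooling downsampler are illustrative stand-ins for the paper's learned embeddings:

```python
import numpy as np

def cross_scale_nonlocal(x, s=2):
    """Cross-scale non-local attention sketch over an (H, W, C) map.

    Each pixel of x is matched against the pixels of the s-times
    downsampled map y; every y pixel stands for an s x s patch of x,
    so the attention-weighted mixture of those patches replaces the
    query pixel, enlarging the output to (s*H, s*W, C).
    """
    h, w, c = x.shape
    # s-times downsampling by average pooling (a stand-in choice)
    y = x.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3))
    q = x.reshape(h * w, c)                    # queries: pixels of x
    k = y.reshape(-1, c)                       # keys: pixels of y
    sim = q @ k.T
    sim -= sim.max(axis=1, keepdims=True)
    att = np.exp(sim)
    att /= att.sum(axis=1, keepdims=True)      # softmax over y positions
    # values: the s x s patch of x behind each y pixel, flattened
    patches = x.reshape(h // s, s, w // s, s, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, s * s * c)
    out = att @ patches                        # weighted patch mixture
    # place each mixed patch at its query position -> s-times larger map
    out = out.reshape(h, w, s, s, c)
    return out.transpose(0, 2, 1, 3, 4).reshape(h * s, w * s, c)

x = np.random.rand(4, 4, 3).astype(np.float32)
z = cross_scale_nonlocal(x, s=2)
print(z.shape)  # (8, 8, 3)
```

Note how the output is s times larger than the input, exactly as the formula's z^{s×s}_{si,sj} indexing implies: each query pixel is replaced by an s×s block.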
Self-Exemplars Mining

The Self-Exemplars Mining (SEM) cell exhausts the image's internal information by repeatedly mining and fusing it. Inside the module, local, in-scale non-local, and the proposed cross-scale non-local feature correlations are combined, so that all possible internal priors are thoroughly explored.
Cross-projection fusion

The Self-Exemplars Mining module uses three feature-extraction paths: CS-NL, in-scale NL, and ordinary convolution. Following the approach of DBPN, their outputs are fused together, as shown in the figure below:
Cross-projection method
Fc denotes the output of the CS-NL module, Fi the output of the in-scale NL module, and Fl the output of the plain convolutional path.
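The exact projection layers are given in the paper; the general DBPN-style back-projection idea behind the fusion can be sketched as follows. Everything here is a hypothetical stand-in: the pooling/upsampling functions replace the paper's learned (de)convolutions, and the residual wiring is only one plausible arrangement:

```python
import numpy as np

def downscale(f, s):
    """s-times average pooling: stand-in for a learned down-projection."""
    h, w, c = f.shape
    return f.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3))

def upscale(f, s):
    """Nearest-neighbour upsampling: stand-in for a learned up-projection."""
    return f.repeat(s, axis=0).repeat(s, axis=1)

def mutual_projected_fusion(f_c, f_i, f_l, s=2):
    """Back-projection-style fusion sketch (illustrative wiring only).

    f_c: CS-NL output, s-times larger spatially than f_i and f_l.
    The upscaled branch is corrected by the back-projected residual
    against the in-scale branch, then folded in with the local branch.
    """
    err = f_i - downscale(f_c, s)      # disagreement between branches
    f_c = f_c + upscale(err, s)        # back-project the residual
    return downscale(f_c, s) + f_l     # fold in the local features

f_c = np.random.rand(8, 8, 4)
f_i = np.random.rand(4, 4, 4)
f_l = np.random.rand(4, 4, 4)
fused = mutual_projected_fusion(f_c, f_i, f_l, s=2)
print(fused.shape)  # (4, 4, 4)
```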
**Summary:** The network uses SEM to recursively mine the image's feature information, passes the result to a final concatenation stage, and then generates a high-quality SR image through convolution. The CS-NL module exploits the correlation of features across scales: through CS-NL the reconstructed details become richer, since the information carried by a single pixel is re-expressed through many matched positions, which further eases the fusion of information during convolution.


Origin blog.csdn.net/weixin_44712669/article/details/109378495