Paper notes SR - SFT

Recovering Realistic Texture in Image Super-resolution by

Deep Spatial Feature Transform

Abstract

    For single image super-resolution (SR), CNN-based methods achieve good reconstruction quality, but restoring natural and realistic textures remains a challenging problem. In this paper, the network is conditioned on a semantic segmentation probability map: intermediate-layer features are transformed by a Spatial Feature Transform (SFT) layer, and super-resolution reconstruction is achieved end to end.

Introduction

    Single image super-resolution aims at restoring a high-resolution (HR) image from a single low-resolution (LR) image. Conventional SR methods are generally optimized with a pixel-space MSE loss, which produces blurry, over-smoothed images. SRGAN proposed a perceptual loss that optimizes in feature space instead of pixel space, and further proposed an adversarial loss to generate more natural images. Using perceptual and adversarial losses greatly improves the perceived quality of the reconstructed image; however, the resulting textures still tend to be monotonous and unnatural.

    The authors identify an important reason for the unnatural textures: an LR patch may correspond to several quite different HR patches, so at upsampling time the model has difficulty distinguishing which category the current image patch belongs to, and as a result synthesizes false textures.

    This paper uses a semantic segmentation probability map as prior information to guide the recovery of textures in regions of different categories. A Spatial Feature Transform (SFT) layer transforms some intermediate-layer features of the network, changing the behavior of the SR network. Conditioned on the segmentation probability map, the SFT layer generates a pair of modulation parameters that apply a spatial affine transformation to the network's feature maps.

By transforming intermediate-layer features, a single network with a single forward pass can reconstruct an HR image with semantically rich regions. Meanwhile, the SFT layer can easily be introduced into existing SR network structures and trained end to end with the SR network.

Methodology

3.0 Definitions and assumptions

Given a low-resolution image x, the estimated super-resolution result ŷ, and the ground-truth HR image y, a CNN-based mapping G with parameters θ is assumed:

    ŷ = G_θ(x)

The parameters are estimated by minimizing a loss over the training set:

    θ̂ = arg min_θ (1/N) Σ_i L(G_θ(x_i), y_i)

The semantic segmentation probability maps Ψ serve as a categorical prior:

In this case, the mapping is conditioned on the prior:

    ŷ = G_θ(x | Ψ)

3.1 Spatial Feature Transform (SFT) Layer

① Modulation parameters

The Spatial Feature Transform (SFT) layer learns a mapping function M that, conditioned on the prior Ψ, outputs a pair of modulation parameters: (γ, β) = M(Ψ).

In theory M can be any function, but in practice a convolutional neural network is used for the mapping, and the process that produces the SFT parameters is shared across all conditioned layers.

The parameters (γ, β) adaptively influence the output by applying a spatial affine transformation to each intermediate feature map F of the SR network, i.e. SFT(F | γ, β) = γ ⊙ F + β, where ⊙ denotes element-wise multiplication.

② SFT layer structure

A small condition network generates the shared condition information used by SFT layers at different depths; the transformation works as follows:

An SFT layer has two inputs: the Conditions output by the condition network, and the feature map F output by the previous layer. Two convolutions on the conditions produce γ and β, which then transform the feature map F; the transformed features form the SFT layer's output and serve as the input F to the next layer.

③ SFT implementation

After two convolutions on the conditions, the results are applied to F directly by element-wise multiplication (γ) and addition (β).
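As a minimal sketch of this idea (PyTorch; the layer names, channel sizes, and activation choices here are my own assumptions, not the authors' code), an SFT layer can be written as:

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial Feature Transform: SFT(F | gamma, beta) = gamma * F + beta."""
    def __init__(self, cond_ch=32, feat_ch=64):
        super().__init__()
        # Two small convolution branches map the shared conditions
        # to the modulation parameters gamma and beta.
        self.gamma_conv = nn.Sequential(
            nn.Conv2d(cond_ch, feat_ch, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(feat_ch, feat_ch, 1))
        self.beta_conv = nn.Sequential(
            nn.Conv2d(cond_ch, feat_ch, 1), nn.LeakyReLU(0.1),
            nn.Conv2d(feat_ch, feat_ch, 1))

    def forward(self, feat, cond):
        gamma = self.gamma_conv(cond)   # (N, feat_ch, H, W)
        beta = self.beta_conv(cond)
        return gamma * feat + beta      # element-wise spatial affine transform

feat = torch.randn(1, 64, 24, 24)   # intermediate SR feature map F
cond = torch.randn(1, 32, 24, 24)   # shared conditions from the condition network
out = SFTLayer()(feat, cond)
print(out.shape)  # torch.Size([1, 64, 24, 24])
```

Because γ and β have the same spatial size as F, the modulation can differ at every pixel, which is what lets different semantic regions receive different treatment.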

3.2 Network Structure

① network structure

The network body is divided into three parts: the condition network, the SR network, and the upsampling part.

a) The condition network takes the segmentation probability map as input and processes it with four convolutions, generating the intermediate conditions shared by all SFT layers. To avoid mutual interference between image regions of different classes, all convolutions in the condition network use 1 × 1 kernels, limiting its receptive field.

b) The SR network consists of 16 residual blocks, each containing two SFT + Conv pairs. All SFT layers share the Conditions as input to learn (γ, β), modulating the feature maps by applying the affine transformation.

c) The upsampling part uses nearest-neighbor upsampling followed by a convolution layer.

Nearest-neighbor interpolation: among the four neighboring pixels of the target pixel, the gray value of the nearest neighbor is assigned to the target pixel. It is computationally cheap, but can produce discontinuities in the interpolated image's gray levels, with visible jagged edges where intensity changes.
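Putting the three parts together, here is a simplified sketch of the structure described above (PyTorch; the channel counts, activation choices, and class names are assumptions for illustration, not the authors' exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SFT(nn.Module):
    """Spatially modulate a feature map: gamma * feat + beta."""
    def __init__(self, cond_ch, feat_ch):
        super().__init__()
        self.gamma = nn.Conv2d(cond_ch, feat_ch, 1)
        self.beta = nn.Conv2d(cond_ch, feat_ch, 1)

    def forward(self, feat, cond):
        return self.gamma(cond) * feat + self.beta(cond)

class SFTResBlock(nn.Module):
    """Residual block with two SFT + Conv pairs, as described above."""
    def __init__(self, cond_ch=32, feat_ch=64):
        super().__init__()
        self.sft1 = SFT(cond_ch, feat_ch)
        self.conv1 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
        self.sft2 = SFT(cond_ch, feat_ch)
        self.conv2 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)

    def forward(self, feat, cond):
        out = F.relu(self.conv1(self.sft1(feat, cond)))
        out = self.conv2(self.sft2(out, cond))
        return feat + out   # residual connection

class ConditionNet(nn.Module):
    """Four 1x1 convolutions: the 1x1 kernels limit the receptive field so
    regions of different classes do not interfere with each other."""
    def __init__(self, num_classes=8, cond_ch=32):
        super().__init__()
        layers, ch = [], num_classes
        for _ in range(4):
            layers += [nn.Conv2d(ch, cond_ch, 1), nn.LeakyReLU(0.1)]
            ch = cond_ch
        self.net = nn.Sequential(*layers)

    def forward(self, seg_prob):
        return self.net(seg_prob)

def upsample(feat, conv):
    """Upsampling part: nearest-neighbor interpolation, then a convolution."""
    return conv(F.interpolate(feat, scale_factor=2, mode="nearest"))

seg = torch.rand(1, 8, 24, 24)        # 7 classes + background, probability maps
lr_feat = torch.randn(1, 64, 24, 24)  # features extracted from the LR input
cond = ConditionNet()(seg)
hr_feat = SFTResBlock()(lr_feat, cond)
up = upsample(hr_feat, nn.Conv2d(64, 64, 3, padding=1))
print(up.shape)  # torch.Size([1, 64, 48, 48])
```

Note how the same `cond` tensor feeds every SFT layer: the condition network runs once, and its output is shared across all 16 residual blocks in the full model.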

3.3 Loss Function

Perceptual loss

Features from a pre-trained VGG-19 are used.

Adversarial loss

The conventional GAN loss, which drives D to judge G's outputs as accurately as possible.
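The two losses can be sketched as follows (PyTorch). To keep the sketch self-contained, a small frozen conv stack stands in for the pre-trained VGG-19 feature extractor that the paper actually uses; the 1e-3 weighting is an assumed value, not taken from this paper:

```python
import torch
import torch.nn as nn

# Stand-in for the pre-trained VGG-19 feature extractor (the paper uses deep
# VGG-19 features; a small frozen conv stack is substituted here so the
# sketch runs without downloading weights).
feature_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()).eval()
for p in feature_net.parameters():
    p.requires_grad = False

def perceptual_loss(sr, hr):
    """MSE in deep feature space instead of pixel space."""
    return nn.functional.mse_loss(feature_net(sr), feature_net(hr))

def adversarial_loss(d_fake_logits):
    """GAN loss for G: push D to judge the SR outputs as real."""
    return nn.functional.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))

sr, hr = torch.rand(1, 3, 96, 96), torch.rand(1, 3, 96, 96)
total = perceptual_loss(sr, hr) + 1e-3 * adversarial_loss(torch.randn(1, 1))
```

Optimizing in feature space rather than pixel space is what avoids the over-smoothing of plain MSE: two images with slightly shifted textures have large pixel error but similar deep features.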

Experiments

① Seven categories are defined; anything outside them is treated as background. During training, images are cropped so that each crop contains only one category, to train D's recognition ability.

② Training uses images containing a single category; testing uses images containing multiple categories.

PS: with single-category losses at training time but multiple categories at test time, is the network still effective?

— At test time, the authors feed G the LR image together with k segmentation probability maps, each giving the probability that a particular object class is present at each location. The inputs enter G along two paths and produce the SR output. A trained G plus the segmentation probability maps gives G the ability to reconstruct each region according to its map (the probability maps reduce the multi-category problem to single-category sub-problems).

— Training is not performed on whole images; the originals are cropped to 98×98 patches, ensuring each patch contains only one category. But since probability maps are used, although each sample covers a single category, the data as a whole covers all categories, which ensures the robustness of the result.

③ Besides the usual GAN real/fake branch loss, D includes a category loss so that it can recognize different categories; concretely, a multi-class cross-entropy (CrossEntropyLoss) constrains the categories.
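A sketch of this two-headed discriminator (the trunk architecture, layer sizes, and names here are my own assumptions; only the idea of a real/fake branch plus a cross-entropy category branch comes from the notes above):

```python
import torch
import torch.nn as nn

class AuxDiscriminator(nn.Module):
    """D with two heads: the usual real/fake branch, plus a category branch
    trained with multi-class cross-entropy so D learns to tell classes apart."""
    def __init__(self, num_classes=8):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.real_fake = nn.Linear(64, 1)           # GAN branch
        self.category = nn.Linear(64, num_classes)  # class branch

    def forward(self, img):
        h = self.trunk(img)
        return self.real_fake(h), self.category(h)

d = AuxDiscriminator()
img = torch.rand(4, 3, 98, 98)       # single-category training crops
labels = torch.randint(0, 8, (4,))   # ground-truth category of each crop
rf_logit, cls_logit = d(img)
gan_loss = nn.functional.binary_cross_entropy_with_logits(
    rf_logit, torch.ones(4, 1))      # real/fake branch (real targets here)
cls_loss = nn.functional.cross_entropy(cls_logit, labels)  # category branch
total = gan_loss + cls_loss
```

The category head is what forces D to distinguish, say, wall from grass, so G can no longer pass with generic textures.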

Summary

① Using perceptual loss and adversarial loss greatly improves the perceived quality of reconstructed images. However, without prior information, the generated textures tend to be monotonous and unnatural.

② A semantic segmentation probability map is used as a conditional prior. Traditional SR algorithms focus on PSNR, and their MSE-based losses blur the image. SRGAN introduces a perceptual loss, training on deep features extracted by a VGG network, but texture information is still lacking: because features are extracted generically, different objects with similar textures often cause misjudgment. That is, D is not doing its job; it is poor at distinguishing walls from grass, so an underperforming G easily passes the discriminator. SFT uses the seg map for guidance: on one hand it explicitly tells G the object's category (seg provides the information, SFT processes it), and on the other hand it tells D how to recognize the object's category.

③ SFT finds generic enhancement information for each category. Since the seg map is a probability map, besides spatial information it also carries some texture-detail information, which influences the SFT. In effect, SFT adds category-specific texture details to the LR features; this is the role of (β, γ). The overall process adds category-based texture detail while preserving the spatial information of the LR input.


Origin www.cnblogs.com/mjhr/p/11233583.html