Paper notes SR - GWAInet & GFRnet

Learning Warped Guidance for Blind Face Restoration

&

Exemplar Guided Face Image Super-Resolution without Facial Landmarks

Introduction

Face image super-resolution aims to reconstruct the missing detail of a low-resolution (LR) facial image. Given only a single LR image of a person, the SR result is likely to be over-smoothed and lacking in detail. One can instead find another HR face image of the same person, taken at a different time under different conditions, and use it to guide the generation of detail during the SR process.

Because the two images generally differ in pose, expression, and lighting, simply feeding the degraded image and the guidance image together into a CNN does not solve guided face restoration well.

GFRNet proposes WarpNet and RecNet. WarpNet corrects the alignment of the guidance image, warping it to the same pose and expression as the low-quality input; RecNet then uses the warped guidance for feature extraction and reconstruction. However, training WarpNet directly on the guidance image does not yield good restoration when the images are misaligned, so the method introduces face alignment based on detected landmarks, using a landmark loss and total variation regularization to guide the learning of WarpNet.

GWAInet uses a warping subnetwork Wnet to align the guidance image with the content of the input image, and uses a feature-extraction network Gnet to fuse features from the guidance image and the input image. An identity loss is used during training to preserve identity features.

 

Proposed Method

2.1 GFRNet

The image size is 256 × 256. The guidance image is assumed to be of the same person as the target image, of high quality, frontal, unoccluded, and with eyes open. The low-quality image and the guidance image have the same size; if not, the low-quality image is resized to match before being fed into GFRNet.

2.1.1 Guided Face Restoration Network

GFRNet is divided into two parts, WarpNet and RecNet. WarpNet takes the degraded image and the guidance image as network input and predicts a flow field for warping the guidance image, producing the corrected (warped) guidance.

Using the flow field, the warped guidance is obtained as follows:
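A standard bilinear-sampling formulation of this warp (a sketch; the paper's exact notation may differ) is:

$$I_w(x, y) = \sum_{(u, v)\,\in\,\mathcal{N}(\Phi(x, y))} I_g(u, v)\,\big(1 - |\Phi_x(x, y) - u|\big)\big(1 - |\Phi_y(x, y) - v|\big)$$

where Φ(x, y) = (Φ_x(x, y), Φ_y(x, y)) is the predicted sampling position in the guidance image I_g and N(·) denotes its four nearest integer pixel locations.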

The warped guidance obtained this way has the same pose, expression, and so on as the ground truth.

The degraded image and the warped guidance are then fed together as input to RecNet for reconstruction.

2.1.2 Warping Subnetwork (WarpNet)

① The network input is the two images stacked along the channel dimension: ⟨degraded image Id, guidance image Ig⟩. The network uses an auto-encoder structure, and training is guided by facial key points: the key points of the warped guidance should stay aligned with those of the degraded face, which is the landmark loss described below.

② The auto-encoder structure is used without cross-layer (skip) connections. Since the two inputs are RGB images while the output is a flow field, the heterogeneity between input and output makes U-Net-style skip connections in the decoder unsuitable. Encoder: features are extracted from Id and Ig by eight convolution layers, each with a 4 × 4 kernel and stride 2. Decoder: eight layers upsample back to produce the flow field used to warp Ig to the desired pose and expression.

③ Except for the first layer of the encoder and the last layer of the decoder, all layers take the Conv-BatchNorm-ReLU form.
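A minimal PyTorch sketch of this encoder-decoder, assuming 4 × 4 stride-2 convolutions in the encoder, transposed convolutions in the decoder, a 2-channel flow-field output, and illustrative channel widths (the paper's exact widths may differ):

```python
import torch
import torch.nn as nn

def enc(cin, cout, bn=True):
    # 4x4 conv, stride 2: halves the spatial size
    layers = [nn.Conv2d(cin, cout, 4, stride=2, padding=1)]
    if bn:
        layers.append(nn.BatchNorm2d(cout))
    layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

def dec(cin, cout):
    # 4x4 transposed conv, stride 2: doubles the spatial size
    return nn.Sequential(
        nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class WarpNetSketch(nn.Module):
    """Encoder-decoder without skip connections; outputs a 2-channel flow field."""
    def __init__(self):
        super().__init__()
        widths = [64, 128, 256, 512, 512, 512, 512, 512]
        self.encoder = nn.Sequential(
            enc(6, widths[0], bn=False),                              # first layer: no BatchNorm
            *[enc(widths[i - 1], widths[i]) for i in range(1, 8)],    # 8 layers: 256 -> 1
        )
        self.decoder = nn.Sequential(
            *[dec(widths[7 - i], widths[6 - i]) for i in range(7)],
            nn.ConvTranspose2d(widths[0], 2, 4, stride=2, padding=1), # last layer: raw flow, no BN/ReLU
        )

    def forward(self, degraded, guidance):
        x = torch.cat([degraded, guidance], dim=1)  # stack along channels: 3 + 3 = 6
        return self.decoder(self.encoder(x))        # (B, 2, 256, 256) flow field
```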

 2.1.3 Reconstruction Subnetwork (RecNet)

① For RecNet, the input (Id and the warped guidance Iw) and the output (Î) have the same pose and expression, so a U-Net is used to produce the final restoration result Î.

② RecNet adopts a U-Net structure. Its input is similar to that of WarpNet: the warped image and the degraded image are stacked together. Skip connections ensure that details are not lost.
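A toy PyTorch sketch of the skip-connection idea (only three levels here, much shallower than the actual RecNet; widths and activations are illustrative):

```python
import torch
import torch.nn as nn

class RecNetSketch(nn.Module):
    """Toy 3-level U-Net illustrating RecNet's skip connections (the real RecNet is deeper)."""

    def __init__(self, ch=64):
        super().__init__()
        self.e1 = nn.Sequential(nn.Conv2d(6, ch, 4, 2, 1), nn.LeakyReLU(0.2, True))
        self.e2 = nn.Sequential(nn.Conv2d(ch, 2 * ch, 4, 2, 1), nn.LeakyReLU(0.2, True))
        self.e3 = nn.Sequential(nn.Conv2d(2 * ch, 4 * ch, 4, 2, 1), nn.LeakyReLU(0.2, True))
        self.d3 = nn.Sequential(nn.ConvTranspose2d(4 * ch, 2 * ch, 4, 2, 1), nn.ReLU(True))
        self.d2 = nn.Sequential(nn.ConvTranspose2d(4 * ch, ch, 4, 2, 1), nn.ReLU(True))
        self.d1 = nn.ConvTranspose2d(2 * ch, 3, 4, 2, 1)

    def forward(self, degraded, warped_guidance):
        x = torch.cat([degraded, warped_guidance], dim=1)  # stack warped guidance and degraded image
        f1 = self.e1(x)
        f2 = self.e2(f1)
        f3 = self.e3(f2)
        y = self.d3(f3)
        y = self.d2(torch.cat([y, f2], dim=1))  # skip connection keeps mid-level detail
        y = self.d1(torch.cat([y, f1], dim=1))  # skip connection keeps fine detail
        return torch.sigmoid(y)                 # restored image in [0, 1]
```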

2.1.4 Losses

Losses on Restoration Result Î

    ① Reconstruction loss

MSE loss between the restored image and the ground-truth image + perceptual loss.

Perceptual loss (VGG loss):

Here ψ denotes the VGG-Face network, ψl(I) denotes the feature map of the l-th convolution layer, and Cl, Hl, Wl denote the number of channels, height, and width of that feature map, respectively.
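With these symbols, the perceptual loss takes the usual form (a sketch; the paper may restrict the sum to specific layers):

$$\ell_{\text{perc}} = \sum_{l} \frac{1}{C_l H_l W_l}\,\big\|\psi_l(\hat{I}) - \psi_l(I)\big\|_2^2$$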

reconstruction loss = MSE loss + perceptual loss

    ② Adversarial loss

Relying only on MSE loss + VGG loss achieves ordinary restoration or super-resolution, but finer textures cannot be recovered (mean squared error fits an overall average, and high-frequency information such as texture tends to be treated as outliers, so it is fitted poorly). Since facial detail is supposed to come from the guidance image, if the guidance image contains a texture it should be reproduced in the result, and vice versa. This resembles a conditional probability, so a cGAN (Conditional GAN) is used. To improve training stability, the 0/1 labels are replaced with smoothed 0/0.9 labels to reduce the effect of adversarial examples.

The overall adversarial loss is:
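Schematically, this presumably sums the global and the local (per-region) GAN losses described in the Q&A below, each a cGAN objective trained with the smoothed 0/0.9 labels:

$$\ell_{\text{adv}} = \lambda_{\text{global}}\,\ell_{a,\text{global}} + \lambda_{\text{local}}\,\ell_{a,\text{local}}$$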

PS: I do not fully understand the explanation of the GAN here. Presumably RecNet acts as the generator G and the GAN is used to generate texture. The paper then proposes a global GAN and a local GAN; according to the blog, the global GAN discriminates the whole image against the ground truth and the local GAN works on the face regions. In that case, would a single GAN not suffice? And does the GAN here take a whole image as input? Since we want to recover texture information, wouldn't feeding patches be better?

—— A part-based discriminator is actually used: the global GAN operates on the whole image, while the local GAN is split into left-eye, right-eye, nose, and mouth regions, with a GAN loss applied to each region separately. The global GAN enhances the whole image, and the local GAN enhances each region with its own discriminator D, producing better texture details.

Losses on Flow Field Φ

Using an MSE loss alone cannot achieve alignment of the images.

① Landmark loss

68 key points are detected on the guidance image and on the ground-truth image, and the goal is to make the key-point positions of the warped guidance Iw as close as possible to those of the ground truth I, which defines the landmark loss:
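A sketch of this loss, assuming the flow field at the i-th ground-truth landmark (x_i, y_i) should point to the matching landmark (x_i^g, y_i^g) in the guidance image:

$$\ell_{\text{lm}} = \sum_{i=1}^{68} \Big[\big(\Phi_x(x_i, y_i) - x_i^{g}\big)^2 + \big(\Phi_y(x_i, y_i) - y_i^{g}\big)^2\Big]$$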

Here, all coordinates are normalized to the range [-1, 1].

② TV regularization

The number of key points is limited, so a model trained only on them may align the key-point positions well while other positions drift considerably. A TV loss is therefore added to make the result smoother: the key points may no longer be perfectly aligned, but the other positions will not drift much either.
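In its usual form, the TV term penalizes the spatial gradients of the flow field:

$$\ell_{\text{TV}} = \big\|\nabla_x \Phi\big\|_2^2 + \big\|\nabla_y \Phi\big\|_2^2$$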

where ∇x and ∇y denote the horizontal and vertical gradients of the flow field.

The overall flow loss for this part is flow loss = λ1 * landmark loss + λ2 * TV loss.

 

2.2 GWAInet

2.2.1 Network Architecture

GWAInet consists mainly of four parts: Gnet, Wnet, Cnet, and Inet.

Warper (Wnet)

    The goal of Wnet is to produce a flow field that warps the guidance image so that its content is aligned with the input LR image, eliminating differences in facial pose and size between the two images.

    Before being passed to Wnet, the degraded image ILR is upsampled by bicubic interpolation and then concatenated with the guidance image IGI before being fed into Wnet.

PS: "At each pixel location, the first value determines the sampling motion horizontally and the second value determines the sampling motion vertically. It should be noted that flow field values are not scaled into a specific range, meaning that no constraints are applied at the output." I do not quite understand how Wnet actually works; the paper does not seem to specify Wnet's structure either.

—— The first channel gives the horizontal coordinate and the second the vertical coordinate; Wnet adopts the same WarpNet as GFRNet to obtain the corrective flow field.

    The obtained flow field is applied to IGI via bilinear interpolation to produce the warped guidance IGW.
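A minimal sketch of this warping step using torch.nn.functional.grid_sample, assuming the flow field stores absolute sampling coordinates in pixel units (a hypothetical helper, not the authors' code):

```python
import torch
import torch.nn.functional as F

def warp_guidance(guidance, flow):
    """Bilinearly sample `guidance` (B,3,H,W) at the positions given by `flow` (B,2,H,W).

    flow[:, 0] = horizontal sampling coordinate, flow[:, 1] = vertical coordinate,
    both in pixel units (the network output is unconstrained).
    """
    b, _, h, w = guidance.shape
    # grid_sample expects coordinates normalized to [-1, 1], ordered (x, y)
    gx = 2.0 * flow[:, 0] / (w - 1) - 1.0
    gy = 2.0 * flow[:, 1] / (h - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)              # (B, H, W, 2)
    return F.grid_sample(guidance, grid, mode="bilinear", align_corners=True)
```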

Generator (Gnet)

Gnet consists of two parts: SRnet, which takes ILR as input, and GFEnet, which takes IGW as input.

SRnet contains 16 residual blocks. After every 4 residual blocks, the features are fused with the features from GFEnet; after all 16 residual blocks, the result is upsampled to the same size as the ground truth.

GFEnet contains 3 downsampling blocks and 12 residual blocks. Each downsampling block halves the image size. After every four residual blocks, the resulting features are fused with the features from the corresponding SRnet residual blocks and, after one convolution layer, serve as the input to the next SRnet residual block (see the sketch below).
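A rough PyTorch sketch of this fusion scheme, assuming 8× super-resolution (so the three 2× downsampling blocks bring the HR-sized guidance to the LR feature resolution), concatenation followed by a 1×1 fusion convolution, and illustrative channel widths:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1),
        )

    def forward(self, x):
        return x + self.body(x)

class GNetSketch(nn.Module):
    """SRnet (16 residual blocks) fused with GFEnet features every 4 blocks; 8x SR assumed."""

    def __init__(self, ch=64):
        super().__init__()
        self.sr_head = nn.Conv2d(3, ch, 3, 1, 1)
        # 3 downsampling blocks: bring the HR-sized guidance down to the LR feature resolution
        self.gfe_head = nn.Sequential(
            nn.Conv2d(3, ch, 3, 2, 1), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, 2, 1), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, 2, 1), nn.ReLU(True),
        )
        # SRnet: 4 stages x 4 residual blocks = 16;  GFEnet: 4 stages x 3 residual blocks = 12
        self.sr_stages = nn.ModuleList(
            [nn.Sequential(*[ResBlock(ch) for _ in range(4)]) for _ in range(4)])
        self.gfe_stages = nn.ModuleList(
            [nn.Sequential(*[ResBlock(ch) for _ in range(3)]) for _ in range(4)])
        # one conv after each concatenation, feeding the next SRnet stage
        self.fuse = nn.ModuleList([nn.Conv2d(2 * ch, ch, 1) for _ in range(4)])
        # 3 x PixelShuffle(2) = 8x upsampling back to the ground-truth size
        self.upsample = nn.Sequential(
            nn.Conv2d(ch, 4 * ch, 3, 1, 1), nn.PixelShuffle(2), nn.ReLU(True),
            nn.Conv2d(ch, 4 * ch, 3, 1, 1), nn.PixelShuffle(2), nn.ReLU(True),
            nn.Conv2d(ch, 4 * ch, 3, 1, 1), nn.PixelShuffle(2), nn.ReLU(True),
            nn.Conv2d(ch, 3, 3, 1, 1),
        )

    def forward(self, lr, warped_guidance):
        x = self.sr_head(lr)                      # LR branch (SRnet)
        g = self.gfe_head(warped_guidance)        # guidance branch (GFEnet), now at LR resolution
        for sr_blocks, gfe_blocks, fuse in zip(self.sr_stages, self.gfe_stages, self.fuse):
            x = sr_blocks(x)
            g = gfe_blocks(g)
            x = fuse(torch.cat([x, g], dim=1))    # fuse guidance features into the SR branch
        return self.upsample(x)
```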

Critic (Cnet)

Cnet uses a GAN discriminator structure (with the BN layers removed); Gnet acts as the generator G, and Cnet judges its output against the ground truth.

Identity Encoder (Inet)

Inet uses a VGG-16 structure. Given the facial image ISR output by Gnet and the corresponding HR ground-truth image IGT, Inet evaluates their similarity. This similarity information is used to penalize features of ISR that differ from the ground truth; that is, Inet judges whether ISR and the ground truth depict the same person, turning the problem into binary classification judged with a cross-entropy loss:
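As described, this amounts to a standard binary cross-entropy: letting p be Inet's predicted probability that the pair depicts the same person and y ∈ {0, 1} the label (a sketch of the idea, not the paper's exact notation):

$$\ell = -\big[\,y \log p + (1 - y)\log(1 - p)\,\big]$$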

2.2.2 Loss Functions

① Content loss

An L1 loss is used.

It keeps the content of ISR close to the ground truth. Since the comparison is pixel-wise, the L1 loss is less sensitive to outlier differences and is more stable.
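In its usual per-pixel form:

$$\ell_{\text{content}} = \frac{1}{N}\sum_{i=1}^{N} \big|I^{SR}_i - I^{GT}_i\big|$$

where N is the number of pixel values.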

② Adversarial loss

The WGAN-GP adversarial loss for G is used.

The WGAN-GP adversarial loss for G is L_adv = -E[ C(ISR) ], i.e. the negative of the critic D's output on ISR.

③ Identity loss

My current understanding is that Inet has a VGG-16 structure, and the identity loss is an L2 loss between the features Inet extracts from ISR and IGT.

④ Critic loss

The WGAN-GP loss for D is used.

It takes the standard WGAN-GP form L_critic = E[ C(ISR) ] - E[ C(IGT) ] + λ·Lgp, where E[ C(IGT) ] is D's output on the ground truth.

PS: What does Lgp refer to here?

—— It is the gradient penalty term in WGAN-GP. WGAN-GP introduces the gradient penalty to better address the vanishing and exploding gradient problems of WGAN.
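For reference, the standard WGAN-GP gradient penalty is:

$$L_{gp} = \mathbb{E}_{\hat{x}}\Big[\big(\|\nabla_{\hat{x}} C(\hat{x})\|_2 - 1\big)^2\Big], \qquad \hat{x} = \epsilon\, I^{GT} + (1 - \epsilon)\, I^{SR},\ \ \epsilon \sim U[0, 1]$$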

Overall loss
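Presumably the generator's total loss combines the three terms above with weighting coefficients (weights not reproduced here):

$$L_G = \ell_{\text{content}} + \alpha\,\ell_{\text{adv}} + \beta\,\ell_{\text{id}}$$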

 

Summary

 ① The high-quality guidance image and the low-quality degraded image are processed together as input.

② GFRNet is divided into two subnets: WarpNet aligns the guidance image, and RecNet reconstructs the image.

③ GFRNet uses facial landmarks to guide the alignment, uses a TV loss to smooth the result, and adds a conditional GAN to RecNet to generate more texture information.

④ GWAInet improves the RecNet part of GFRNet: instead of a U-Net structure it uses a residual SR structure.

⑤ GWAInet proposes an identity loss to keep the identity consistent between the SR result and the GT.

⑥ Compared with GFRNet, GWAInet changes the RecNet part and elaborates on the GAN part of GFRNet; on that reading, it removes GFRNet's local GAN and keeps only a global GAN. The identity loss feels like an extra add-on with little connection to the other components, which seems a bit odd.

⑦ GWAInet introduces the identity loss. Inet is a separately trained binary-classification network; when the other parts such as Wnet and Gnet are updated, Inet's parameters are fixed and it only provides the loss. One way to evaluate face restoration is to compare face-recognition accuracy before and after restoration; the identity loss presumably exploits this to strengthen the SR task.


Origin www.cnblogs.com/mjhr/p/11233664.html