MVF-Net: Multi-View 3D Face Morphable Model Regression (CVPR 2019)

Link: https://blog.csdn.net/qq_35045096/article/details/94409625
1. Problem addressed
The paper focuses on recovering 3D face geometry from a set of multi-view images of a human face. Taking a set of multi-view face images as input, it studies a 3DMM-based method for shape recovery.
2. Method
An end-to-end trainable convolutional neural network (CNN) is used to regress the 3DMM parameters from the multi-view inputs. Multi-view geometric constraints are incorporated into the network through a novel self-supervised view-alignment loss that establishes dense correspondences between the different views. The main component of the view-alignment loss is a differentiable dense optical-flow estimator, which can back-propagate alignment errors between an input view and a projection rendered from another view. By minimizing the view-alignment loss, a better 3D shape can be recovered, so that the projection synthesized from one view aligns better with the image observed in the other view.
3. Model
3.1 Overview
An end-to-end trainable CNN is used to regress 3DMM parameters from multiple face images of the same person taken from different views. To build multi-view geometric constraints, as traditional multi-view 3D reconstruction methods do, the face images are assumed to be captured at the same time under the same illumination. In practice, however, the method can still handle changes in illumination. For simplicity, a three-view setting is used to describe the approach; the model is equally applicable to other numbers of input views.
Features are first extracted from each input image by a weight-shared CNN, and these features are concatenated to regress a single set of 3DMM parameters. In contrast, the pose parameters of each view are regressed independently from that view's own features. Given the pose parameters and the 3DMM parameters, a textured 3D face model can be rendered from each input image. In the three-view setting, this yields three textured 3D face models that share the same underlying 3D shape but carry different textures. After obtaining the textured 3D face models in the different views, the texture sampled in one view is projected into another view. For example, the 3D model carrying the texture sampled in view A is projected into view B. A loss can then be computed between the projected image and the input image of the target view. Notably, the rendering layer is non-parametric but differentiable, so the gradients can be back-propagated to the trainable layers.
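To make this structure concrete, here is a minimal PyTorch sketch of such a regressor; the backbone, feature size, and parameter dimensions are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MultiViewRegressor(nn.Module):
    """Shared-weight feature extractor; one joint 3DMM head, per-view pose heads."""
    def __init__(self, n_3dmm=199 + 29, n_pose=6, n_views=3):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()                           # keep the 512-d features
        self.backbone = backbone                              # weights shared across views
        self.shape_head = nn.Linear(512 * n_views, n_3dmm)    # one set of 3DMM parameters
        self.pose_head = nn.Linear(512, n_pose)               # regressed independently per view
        self.n_views = n_views

    def forward(self, views):                                 # views: list of (B, 3, H, W) tensors
        assert len(views) == self.n_views
        feats = [self.backbone(v) for v in views]
        alpha = self.shape_head(torch.cat(feats, dim=1))      # shared shape/expression parameters
        poses = [self.pose_head(f) for f in feats]            # one pose per view
        return alpha, poses
```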
3.2 Model
3.2.1 Face model
The face model is a 3DMM; the 3D face shape can be expressed as S = S_mean + A_id * α_id + A_exp * α_exp, where S_mean is the mean shape, A_id and A_exp are the identity and expression bases, and α_id, α_exp are the 3DMM parameters to be regressed.
3.2.2 Differentiable projection model
To project the 3D model onto the 2D image plane, a weak perspective projection model is applied: v_2d = f * P * R * v_3d + t_2d, where f is a scale factor, P is the orthographic projection matrix, R is the rotation matrix, and t_2d is the 2D translation; f, R, and t_2d together form the pose parameters of a view.
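As a small numerical illustration of these two equations, the following sketch builds a shape from placeholder bases and projects it with a weak perspective camera (the basis matrices, dimensions, and values are toy assumptions, not actual 3DMM data):

```python
import numpy as np

def compute_shape(s_mean, a_id, a_exp, alpha_id, alpha_exp):
    """3DMM: S = S_mean + A_id @ alpha_id + A_exp @ alpha_exp, reshaped to (n_vertices, 3)."""
    return (s_mean + a_id @ alpha_id + a_exp @ alpha_exp).reshape(-1, 3)

def weak_perspective(vertices, f, R, t2d):
    """Weak perspective projection: v_2d = f * P * R * v_3d + t_2d."""
    P = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])            # orthographic projection (drop depth)
    return f * (vertices @ R.T) @ P.T + t2d    # (n_vertices, 2)

# toy example with random bases, just to show the shapes involved
n_v, n_id, n_exp = 1000, 199, 29
S = compute_shape(np.zeros(3 * n_v),
                  0.01 * np.random.randn(3 * n_v, n_id),
                  0.01 * np.random.randn(3 * n_v, n_exp),
                  np.random.randn(n_id), np.random.randn(n_exp))
uv = weak_perspective(S, f=1.5, R=np.eye(3), t2d=np.array([112.0, 112.0]))
```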
3.3 Network regressor
Input: three views Ia, Ib, Ic
Output: one set of 3DMM parameters + three sets of pose parameters (one per view)
3.4 Texture sampling
Given the predicted 3DMM parameters (the identity and expression coefficients), the corresponding 3D face model is computed with the face model described above. A texture is then sampled from each input image using the corresponding regressed pose parameters, yielding three texture maps. For each vertex of the 3D model, the differentiable projection model above projects the vertex onto the image plane, and a differentiable sampling mechanism extracts the vertex's texture color from each input image. For points inside a triangle of the 3D mesh, the texture color is obtained by barycentric interpolation from the surrounding vertices. This texture sampling mechanism cannot handle occlusion by itself: the texture sampled from an input image is incorrect in occluded regions, so visibility masks are used to deal with this problem.
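A hedged sketch of the differentiable per-vertex color sampling, using PyTorch's grid_sample for the bilinear lookup (the tensor shapes are assumptions, and the barycentric interpolation inside triangles is omitted):

```python
import torch
import torch.nn.functional as F

def sample_vertex_colors(image, verts_2d):
    """Bilinearly sample a color for each projected vertex.

    image:    (1, 3, H, W) input view
    verts_2d: (N, 2) projected vertex positions in pixel coordinates
    returns:  (N, 3) per-vertex RGB colors, differentiable w.r.t. verts_2d
    """
    _, _, H, W = image.shape
    # normalize pixel coordinates to [-1, 1] as grid_sample expects
    gx = 2.0 * verts_2d[:, 0] / (W - 1) - 1.0
    gy = 2.0 * verts_2d[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(1, 1, -1, 2)        # (1, 1, N, 2)
    colors = F.grid_sample(image, grid, align_corners=True)       # (1, 3, 1, N)
    return colors.squeeze(0).squeeze(1).t()                       # (N, 3)
```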
3.5 Rendered projection and visibility masks
3.5.1 Projection rendering
The textured 3D model can be rendered into a 2D image in any target view by a differentiable rendering layer. For example, given the 3D model with the texture sampled from image Ia, it can be rendered into the view of Ib using the pose parameters Pb; the result is denoted Iab. In general, for any point v on the surface of the 3D mesh, the pixel it projects to in the rendered image takes its color from the texture sampled at v; equivalently, each pixel of the target image plane covered by the mesh takes the texture color of the surface point that projects onto it.
Ideally, with the best 3D model and pose parameters, the rendered image Iab and the observed image Ib should be identical in the non-occluded regions.
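The cross-view comparison could look roughly like the sketch below; render_textured_mesh is a hypothetical placeholder for whatever differentiable rasterizer is used, not an API from the paper:

```python
import torch

def cross_view_photometric(render_textured_mesh, shape, tex_a, pose_b, image_b, mask_b):
    """Render the shape with the texture sampled in view A under pose B and compare with I_b.

    render_textured_mesh: hypothetical differentiable renderer,
                          (shape, per-vertex texture, pose) -> (3, H, W) image
    image_b:              (3, H, W) observed image of view B
    mask_b:               (1, H, W) visibility mask, 1 where the comparison is valid
    """
    i_ab = render_textured_mesh(shape, tex_a, pose_b)          # I_ab
    diff = torch.abs(i_ab - image_b) * mask_b                   # masked L1 difference
    return diff.sum() / (3.0 * mask_b.sum().clamp(min=1.0))
```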
3.5.2 Visibility masks
Visibility masks are needed to enforce photometric consistency between the observed images and the rendered images only where both are valid. For a rendered image, a mask covering only the visible region is extracted, and areas that may be occluded in the other views are excluded using the 2D face landmarks and their corresponding 3D vertices. For an observed real image, an initial mask is obtained from the texture-sampling region and a joint edge-preserving filter is applied to it, so that the mask aligns well with the face edges in the real input image. Finally, 2D detected landmarks are used to exclude areas that may be occluded in the other views, similar to the mask processing of the rendered image. For the frontal-view image, two different visibility masks are obtained, one used with the left view and one with the right view.
Visibility masks for the rendered images:
Mask processing procedure for the observed images:
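The paper builds its masks from the texture-sampling region, landmarks, and an edge-preserving filter as described above; purely as a generic illustration of visibility reasoning, a per-vertex back-face test is sketched below (a common approximation, not the paper's landmark-based procedure):

```python
import numpy as np

def backface_visibility(vertices, faces):
    """Approximate per-vertex visibility for a camera looking along -z.

    vertices: (N, 3) mesh vertices; faces: (F, 3) triangle vertex indices.
    A vertex is marked visible if at least one triangle containing it faces
    the camera, i.e. its normal has a positive z component.
    """
    v0, v1, v2 = vertices[faces[:, 0]], vertices[faces[:, 1]], vertices[faces[:, 2]]
    normals = np.cross(v1 - v0, v2 - v0)        # per-triangle normals
    front_facing = normals[:, 2] > 0            # z component toward the camera
    visible = np.zeros(len(vertices), dtype=bool)
    visible[faces[front_facing].ravel()] = True
    return visible
```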
3.6 Losses and training
The CNN is first pre-trained with supervision using the labels of the 300W-LP dataset, in which the ground-truth 3DMM and pose parameters were obtained by a 3DMM-fitting algorithm and the multi-view face images were synthesized by face profiling techniques. Self-supervised training is then performed on the Multi-PIE dataset, whose multi-view images were captured in a controlled indoor environment.
3.6.1 Supervised training

Two L2 losses are used between the ground-truth and predicted values: one on the 3DMM parameters and one on the pose parameters.
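A minimal sketch of these supervised parameter losses; the relative weighting and the exact parameterization are assumptions:

```python
import torch.nn.functional as F

def supervised_loss(alpha_pred, alpha_gt, poses_pred, poses_gt, w_pose=1.0):
    """L2 losses between predicted and ground-truth 3DMM and per-view pose parameters."""
    loss_3dmm = F.mse_loss(alpha_pred, alpha_gt)
    loss_pose = sum(F.mse_loss(p, g) for p, g in zip(poses_pred, poses_gt)) / len(poses_pred)
    return loss_3dmm + w_pose * loss_pose
```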
3.6.2 Self-supervised training

In the self-supervised stage, no parameter labels are used. The network is trained with the view-alignment loss described above: the cross-view rendered projections are compared against the observed images inside the visibility masks, and the remaining misalignment is measured by the differentiable dense optical-flow module.
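A rough sketch of how such a self-supervised objective could be assembled; flow_net stands for a hypothetical differentiable optical-flow module, and the loss weights are placeholders:

```python
import torch

def self_supervised_loss(rendered, observed, mask, flow_net, w_flow=1.0):
    """Masked photometric term plus a dense view-alignment term.

    rendered, observed: (1, 3, H, W) cross-view projection I_ab and input image I_b
    mask:               (1, 1, H, W) visibility mask
    flow_net:           hypothetical differentiable flow estimator,
                        (img1, img2) -> (1, 2, H, W) flow field
    """
    valid = mask.sum().clamp(min=1.0)
    photometric = (torch.abs(rendered - observed) * mask).sum() / (3.0 * valid)
    flow = flow_net(rendered, observed)                          # dense correspondences
    alignment = (flow.norm(dim=1, keepdim=True) * mask).sum() / valid
    return photometric + w_flow * alignment
```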
4. Experiments
Input images:
Reconstruction results:
----------------
Disclaimer: This article is an original article by CSDN blogger "qq_35045096", licensed under the CC 4.0 BY-SA copyright agreement; please attach the original source link and this statement when reposting.
Original link: https://blog.csdn.net/qq_35045096/article/details/94409625

Origin www.cnblogs.com/skydaddy/p/11935002.html