Paper: "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold"
github: https://github.com/XingangPan/DragGAN
Summary
Users want flexible control over the pose, shape, expression, and layout of generated objects. Existing methods achieve controllability in GANs either through manually annotated training data or through prior 3D models, and therefore lack flexibility, precision, and generality. This paper proposes DragGAN, which consists of two parts:
1. Feature-based motion supervision, which drives the handle point toward its target position;
2. A point tracking method that uses generator features to localize the handle point.
DragGAN remains effective even in challenging scenes, for example under occlusion.
Problem
DragGAN mainly solves two problems:
- moving the handle point to the target position;
- tracking the handle point's new location.
DragGAN is based on the idea that the feature space of a GAN is sufficiently discriminative for both motion supervision and precise point tracking.
3. Algorithm:
3.1 Point-based interactive manipulation
The image editing process is shown in Figure 2. Given a latent vector $w$ and the image $I$ generated by the GAN, the user inputs a set of handle points $p_i$ with corresponding target points $t_i$; the goal is to move the object in the image so that the semantic position of each handle point reaches its corresponding target point.
As shown in Figure 2, the optimization alternates between two steps: motion supervision and point tracking. A loss that forces the handle points toward the target points is used to optimize the latent vector $w$, yielding a new latent vector $w'$ and a new image $I'$. Each optimization step moves the points only a small, unknown distance, so the handle point positions must be updated by the tracking module. This process runs for 30-200 iterations.
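The alternating loop can be sketched in a few lines. This is a toy stand-in, not the real method: the "generator" is an identity map and the handle point is moved directly, so point tracking is trivial; the actual method optimizes $w$ through StyleGAN2 and re-locates the point via feature search.

```python
# Toy sketch of DragGAN's alternating edit loop (assumption: the real
# method optimizes the latent w; here we move the handle point directly).
def edit_loop(handle, target, step=0.1, max_iters=200, tol=1e-3):
    p = list(handle)
    for it in range(max_iters):
        dx, dy = target[0] - p[0], target[1] - p[1]
        dist = (dx * dx + dy * dy) ** 0.5
        if dist < tol:                       # handle has reached the target
            return p, it
        # --- motion supervision: one small step along the normalized
        # direction d = (t - p) / ||t - p||
        d = (dx / dist, dy / dist)
        p = [p[0] + step * d[0], p[1] + step * d[1]]
        # --- point tracking: in the real method, p would now be
        # re-located by a nearest-neighbor search in the new feature map
        # F'; this stand-in "generator" is exact, so p is already correct.
    return p, max_iters

new_p, iters = edit_loop(handle=(0.0, 0.0), target=(3.0, 4.0))
```

Moving a handle from (0, 0) to (3, 4) with step 0.1 takes 50 small steps, mirroring the paper's 30-200 iteration budget.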
3.2 Motion Supervision
The motion supervision loss the author proposes does not depend on an additional network. Since the generator's intermediate features are already discriminative, the author takes the features of the sixth block of StyleGAN2 and resizes them to the output image resolution. As shown in Figure 3, to move the handle point $p$ toward $t$, the loss function is given by formula 1.
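For reference, the motion supervision loss (formula 1 in the paper) can be written as follows; this is reconstructed from the paper's notation and is not part of the original post:

$$
\mathcal{L} = \sum_{i=0}^{n} \sum_{q \in \Omega_1(p_i, r_1)} \left\lVert F(q) - F(q + d_i) \right\rVert_1 + \lambda \left\lVert (F - F_0) \cdot (1 - M) \right\rVert_1,
\qquad d_i = \frac{t_i - p_i}{\lVert t_i - p_i \rVert_2},
$$

where $\Omega_1(p_i, r_1)$ is the set of pixels within radius $r_1$ of $p_i$, $F_0$ is the feature map of the initial image, and $F(q)$ is detached from the gradient so that optimization pulls the feature at $q + d_i$ toward it, nudging the point one step along $d_i$.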
The binary mask $M$ keeps the region outside the mask unchanged. The latent vector $w$ can be optimized either in $W$ space or in $W+$ space; $W+$ space makes it easier to manipulate out-of-distribution data. $W+$ means each layer of StyleGAN2 uses a different latent vector $w$, while $W$ means every layer shares the same $w$. Experiments show that the spatial attributes of the image are mainly affected by the first six layers of $w$, so only the first six layers are optimized.
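The difference between $W$ and $W+$ space, and the restriction to the first six layers, can be sketched as follows (the 18-layer, 512-dimensional shape is an assumption typical of StyleGAN2 at 1024px, not stated in this post):

```python
# Sketch of W vs. W+ latent spaces in StyleGAN2 (assumed shapes: 18
# style layers, 512-dim latents).
NUM_LAYERS, DIM = 18, 512

w = [0.0] * DIM                                  # W: one latent shared by all layers
w_plus = [list(w) for _ in range(NUM_LAYERS)]    # W+: one latent per layer

# DragGAN optimizes only the first six layers, which govern spatial
# attributes; the remaining layers (appearance) stay fixed.
trainable = [layer < 6 for layer in range(NUM_LAYERS)]
```

In a real implementation the `trainable` mask would decide which rows of the $W+$ tensor receive gradient updates.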
3.3 Point Tracking
The motion supervision module updates $w$ to $w'$, yielding a new feature map $F'$ and a new image $I'$, but it does not give the handle points' positions in $I'$; point tracking is used to update each handle point $p$. Conventional point tracking relies on optical flow or particle-video methods, but these are either inefficient or accumulate errors, especially when the GAN produces artifacts.
The author argues that GAN features capture dense correspondence information, so the handle point can be relocated by a nearest-neighbor search, as in formula 2.
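The nearest-neighbor search of formula 2 can be sketched as follows: find the pixel $q$ in a window $\Omega_2(p, r_2)$ around the old handle point whose feature is closest (in $L_1$ distance) to the handle's initial feature $f_i$. This is a minimal pure-Python stand-in; the real method searches the resized sixth-block features of StyleGAN2.

```python
# Minimal point-tracking sketch: nearest-neighbor feature search in a
# (2*r2+1)-sized window. feat_map is a plain nested list (H x W x C).
def track_point(feat_map, f_i, p, r2):
    H, W = len(feat_map), len(feat_map[0])
    best, best_dist = p, float("inf")
    for y in range(max(0, p[0] - r2), min(H, p[0] + r2 + 1)):
        for x in range(max(0, p[1] - r2), min(W, p[1] + r2 + 1)):
            dist = sum(abs(a - b) for a, b in zip(feat_map[y][x], f_i))
            if dist < best_dist:
                best, best_dist = (y, x), dist
    return best

# Toy usage: a 5x5 map of 2-d features where pixel (3, 4) matches f_i.
fmap = [[[float(y), float(x)] for x in range(5)] for y in range(5)]
print(track_point(fmap, f_i=[3.0, 4.0], p=(2, 3), r2=2))  # -> (3, 4)
```

Searching only a small window keeps tracking cheap and avoids the accumulated drift of flow-based trackers, since each search compares against the *initial* feature $f_i$ rather than the previous frame.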
4. Experiment
4.1 Quality assessment
In Figure 4, the author compares DragGAN with UserControllableLT; DragGAN's results are more natural and its motion is more accurate.
In Figure 6, the author compares the point tracking method with PIPs and RAFT; the proposed method is more accurate.
**Real image manipulation.** By using GAN inversion to encode real images into StyleGAN's latent space, real images can also be manipulated, as shown in Figures 5 and 13.
4.2 Quantitative evaluation
**Face manipulation.** The author uses StyleGAN to generate two faces, predicts facial landmarks with an off-the-shelf tool, uses DragGAN to move the landmarks of the first face to the landmark positions of the second face, and computes the distance between the landmarks of the edited image and the target landmarks as the evaluation metric. The results are shown in Table 1, and the visualization in Figure 7.
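The landmark-distance metric can be sketched as a mean Euclidean distance between predicted and target landmark sets (the landmark detector itself is an external tool and not reproduced here; this averaging scheme is an assumption about the metric's exact form):

```python
# Hedged sketch of the face-landmark evaluation metric: mean Euclidean
# distance between edited-image landmarks and target landmarks.
def mean_landmark_distance(pred, target):
    assert len(pred) == len(target)
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += ((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
    return total / len(pred)

# Toy usage with three landmarks (distances 1, 0, and 2):
print(mean_landmark_distance([(0, 0), (1, 1), (2, 2)],
                             [(0, 1), (1, 1), (2, 0)]))  # -> 1.0
```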
**Paired image reconstruction.** The author uses StyleGAN to generate images $I_1$ and $I_2$, and randomly samples 32 points in the optical-flow region as the user input $U$; the goal is to reconstruct $I_2$ from $I_1$ and $U$. Quantitative results are shown in Table 2.
Ablation experiment
The author compares the effect of using features from different layers for motion supervision and point tracking. As shown in Table 3, the sixth-block features of StyleGAN perform best.
4.3 Discussion
Figure 8 shows the effect of the movable region mask.
Figure 9 shows image manipulation of OOD data.
Limitations:
Figure 14a shows some limitations: poses that deviate from the training distribution are prone to artifacts.
As shown in Figure 14b and c, handle points that lack texture or structural information may drift during tracking.
Conclusion
The authors propose DragGAN, an interactive point-based image editing method that manipulates images according to user input. This rests on two components:
a. a latent-vector optimization module, which moves the handle points toward the target points;
b. a point tracking module, which accurately follows the handle points' trajectories.
DragGAN surpasses existing GAN-based image manipulation methods and opens a new direction: exploiting generative priors for image manipulation.