DragGAN paper reading


Paper: Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
GitHub: https://github.com/XingangPan/DragGAN

1. Summary

Users want flexible control over the pose, shape, expression, and layout of generated objects. Existing approaches obtain controllability of GANs through manually annotated training data or prior 3D models, and therefore lack flexibility, precision, and generality. This paper proposes DragGAN, which consists of two parts:
1. feature-based motion supervision, which drives each handle point toward its target position;
2. a point tracking method that uses generator features to localize the handle points.
DragGAN remains effective even in challenging cases such as occlusion.

2. Problem

DragGAN mainly solves two sub-problems:

  1. moving each handle point to its target position;
  2. tracking the new location of each handle point after every update.

DragGAN builds on the observation that the feature space of a GAN is discriminative enough to support both motion supervision and precise point tracking.

3. Algorithm

3.1 Point-based interactive manipulation

The image control process is shown in Figure 2. Given a latent code $w$ and the image $I$ it generates, the user specifies a set of handle points $p_i$ and corresponding target points $t_i$; the goal is to move the object in the image so that the semantic content at each handle point reaches its target point.
As shown in Figure 2, the optimization alternates between two steps: motion supervision and point tracking. A loss that pushes the handle points toward the target points is used to optimize the latent code $w$, yielding a new latent code $w'$ and a new image $I'$. Each optimization step moves the points only a small, a-priori-unknown distance, so the handle point positions must be updated by the tracking module. The whole process runs for 30-200 iterations.
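To make the two-step loop concrete, below is a minimal PyTorch-style sketch of the outer optimization loop. The interfaces are assumptions for illustration: `generator(w)` is a hypothetical wrapper that returns both the image and an intermediate StyleGAN2 feature map, and `motion_supervision_loss`, `track_points`, and `bilinear_sample` are sketched in Sections 3.2 and 3.3 below; none of this is the official DragGAN code.

```python
import torch

def drag_edit(generator, w, handle_points, target_points, mask,
              n_iters=200, lr=2e-3):
    # Optimize the latent code so the handle points move toward the targets.
    w = w.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)

    _, F0 = generator(w)                    # features of the initial image
    F0 = F0.detach()
    # f_i = F0(p_i): fixed feature templates, used later for point tracking
    init_feats = bilinear_sample(F0, handle_points)

    for _ in range(n_iters):                # the paper runs 30-200 iterations
        # Step 1: motion supervision -- one small step of the handle points
        _, F_map = generator(w)
        loss = motion_supervision_loss(F_map, F0, handle_points,
                                       target_points, mask)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Step 2: point tracking -- re-locate the handle points in the new image
        with torch.no_grad():
            _, F_new = generator(w)
            handle_points = track_points(F_new, init_feats, handle_points)

    return w.detach(), handle_points
```

Note that the paper optimizes only the first six layers of $w$ (see Section 3.2); this sketch optimizes the whole code for simplicity.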

3.2 Motion Supervision


The proposed motion supervision loss does not depend on any additional network. Since the generator's intermediate features are already sufficiently discriminative, the authors take the feature map after the 6th block of StyleGAN2 and bilinearly resize it to the resolution of the output image. As shown in Figure 3, to move a handle point $p$ toward a target $t$, the loss function is given by Equation 1:
$$\mathcal{L} = \sum_{i=0}^{n} \sum_{q_i \in \Omega_1(p_i, r_1)} \left\| F(q_i) - F(q_i + d_i) \right\|_1 + \lambda \left\| (F - F_0) \cdot (1 - M) \right\|_1, \qquad d_i = \frac{t_i - p_i}{\| t_i - p_i \|_2}$$

where $\Omega_1(p_i, r_1)$ denotes the pixels within radius $r_1$ of $p_i$, $F(q)$ is the feature at pixel $q$ (bilinearly interpolated for non-integer positions), $F_0$ is the feature map of the initial image, and $F(q_i)$ is detached from the gradient so that the patch around $p_i$ is pulled toward $t_i$ and not vice versa.

The binary mask $M$ keeps the region outside the mask unchanged. The latent code $w$ can be optimized either in $W$ space or in $W^+$ space; $W^+$ space, in which each layer of StyleGAN2 uses a different latent code, makes out-of-distribution manipulation easier, whereas $W$ space means all layers share the same $w$. Experiments show that the spatial attributes of the image are mainly governed by the first six layers of $w$, so only those six layers are optimized.
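Below is a minimal sketch of such a loss under the same assumed interfaces as above: `F_map` is a `(1, C, H, W)` feature map already resized to image resolution, handle and target points are `(N, 2)` pixel coordinates, and `mask` is a `(1, 1, H, W)` binary tensor. The values of `r1` and `lam` are illustrative, not necessarily the paper's exact hyperparameters.

```python
import torch
import torch.nn.functional as nnF

def bilinear_sample(F_map, points):
    # Sample (N, C) feature vectors at sub-pixel (x, y) pixel coordinates.
    _, _, H, W = F_map.shape
    grid = points.new_zeros(1, points.shape[0], 1, 2)
    grid[0, :, 0, 0] = 2 * points[:, 0] / (W - 1) - 1       # x -> [-1, 1]
    grid[0, :, 0, 1] = 2 * points[:, 1] / (H - 1) - 1       # y -> [-1, 1]
    out = nnF.grid_sample(F_map, grid, align_corners=True)  # (1, C, N, 1)
    return out[0, :, :, 0].t()

def _disk_offsets(radius):
    # Integer (dx, dy) offsets within a disk of the given radius.
    r = torch.arange(-radius, radius + 1).float()
    dy, dx = torch.meshgrid(r, r, indexing="ij")
    offs = torch.stack([dx.flatten(), dy.flatten()], dim=1)
    return offs[offs.norm(dim=1) <= radius]

def motion_supervision_loss(F_map, F0, handles, targets, mask, r1=3, lam=20.0):
    loss = 0.0
    for p, t in zip(handles, targets):
        d = (t - p) / (torch.norm(t - p) + 1e-8)      # unit direction p -> t
        q = p.unsqueeze(0) + _disk_offsets(r1).to(p)  # pixels near p
        f_q = bilinear_sample(F_map, q).detach()      # detached: patch is pulled toward t
        f_q_shifted = bilinear_sample(F_map, q + d)   # features one unit step ahead
        loss = loss + (f_q_shifted - f_q).abs().sum()
    # penalize feature changes outside the user-specified editable region
    loss = loss + lam * ((F_map - F0) * (1 - mask)).abs().sum()
    return loss
```

Detaching $F(q_i)$ is the key design choice: gradients flow only through the shifted term, so the optimizer moves the image content around $p$ one small step toward $t$ rather than pulling the target patch backward.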

3.3 Point Tracking

The motion supervision module updates $w$ to $w'$, producing a new feature map $F'$ and a new image $I'$, but it does not tell us where the handle points now lie in $I'$; point tracking is used to update each handle point $p$. Conventional point tracking relies on optical flow or particle-video methods, which are inefficient and accumulate errors, especially when the GAN produces artifacts.
The authors argue that GAN features capture dense correspondence information, so each handle point can be relocated by a nearest-neighbor search in feature space, as in Equation 2:
$$p_i := \arg\min_{q_i \in \Omega_2(p_i, r_2)} \left\| F'(q_i) - f_i \right\|_1$$

where $f_i = F_0(p_i)$ is the feature of the initial handle point and $\Omega_2(p_i, r_2)$ is a square patch of radius $r_2$ around the current position of $p_i$.
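A matching sketch of this nearest-neighbor search, reusing the hypothetical `bilinear_sample` helper from Section 3.2; `init_feats` holds the fixed templates $f_i = F_0(p_i)$ computed once from the initial image:

```python
def track_points(F_new, init_feats, handles, r2=12):
    # For each handle point, pick the position in a local square patch whose
    # feature is closest (L1 distance) to the point's initial feature template.
    new_handles = []
    r = torch.arange(-r2, r2 + 1).float()
    dy, dx = torch.meshgrid(r, r, indexing="ij")
    offsets = torch.stack([dx.flatten(), dy.flatten()], dim=1)  # square patch
    for p, f_i in zip(handles, init_feats):
        candidates = p.unsqueeze(0) + offsets.to(p)   # (K, 2) candidate positions
        f_cand = bilinear_sample(F_new, candidates)   # (K, C) candidate features
        dist = (f_cand - f_i.unsqueeze(0)).abs().sum(dim=1)
        new_handles.append(candidates[dist.argmin()])
    return torch.stack(new_handles)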

4. Experiment

4.1 Qualitative evaluation

In Figure 4, the authors compare DragGAN with UserControllableLT; DragGAN's results look more natural and its point movement is more accurate.

In Figure 6, the authors compare their point tracking against PIPs and RAFT; the proposed method is more accurate.

**Real image manipulation.** With GAN inversion, a real photo is encoded into StyleGAN's latent space, so real images can be manipulated as well, as shown in Figures 5 and 13.
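For intuition, here is a naive latent-optimization inversion sketch; the paper itself relies on an existing inversion method (PTI) rather than this bare-bones loss, and `generator` is the same hypothetical wrapper as above.

```python
import torch

def invert_image(generator, target_image, w_init, n_iters=500, lr=1e-2):
    # Optimize a latent code so the generated image reproduces a real photo;
    # the result can then be edited with drag_edit() like any generated sample.
    w = w_init.clone().requires_grad_(True)   # e.g. start from the mean latent
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(n_iters):
        image, _ = generator(w)
        loss = torch.nn.functional.mse_loss(image, target_image)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```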

4.2 Quantitative evaluation

**Face manipulation.**
The authors use StyleGAN to generate two face images, predict facial landmarks with an off-the-shelf detector, and use DragGAN to drag the landmarks of the first face to the landmark positions of the second. The mean distance between the landmarks of the edited image and the target landmarks is the evaluation metric. Results are reported in Table 1; visualizations are shown in Figure 7.
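The metric itself is simple; a generic reading of it, assuming `pred` and `target` are `(K, 2)` landmark tensors (the paper's exact normalization may differ):

```python
import torch

def mean_landmark_distance(pred, target):
    # Mean Euclidean distance between corresponding facial landmarks.
    return torch.norm(pred - target, dim=1).mean()
```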

**Pairwise image reconstruction.**
The authors use StyleGAN to generate two images $I_1$ and $I_2$, randomly sample 32 points in the region of significant optical flow between them as the user input $U$, and attempt to reconstruct $I_2$ from $I_1$ and $U$. Quantitative results are shown in Table 2.
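A sketch of how such user input could be sampled, assuming `flow` is a `(2, H, W)` optical-flow tensor from $I_1$ to $I_2$; the threshold and sampling scheme are illustrative assumptions, not the paper's protocol:

```python
import torch

def sample_user_points(flow, n_points=32, thresh=1.0):
    # Pick random handle points where the flow magnitude is significant,
    # and use the flow endpoints as the corresponding target points.
    mag = flow.norm(dim=0)                                    # (H, W) magnitude
    ys, xs = torch.nonzero(mag > thresh, as_tuple=True)
    idx = torch.randperm(xs.shape[0])[:n_points]
    points = torch.stack([xs[idx], ys[idx]], dim=1).float()   # (n, 2) as (x, y)
    targets = points + flow[:, ys[idx], xs[idx]].t()          # flow endpoints
    return points, targets
```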

**Ablation study.**
The authors compare features from different layers for motion supervision and point tracking. As shown in Table 3, the feature map after the 6th block of StyleGAN performs best.

4.3 Discussion

Figure 8 shows the effect of the movable-region mask.
Figure 9 shows manipulation of out-of-distribution (OOD) images.

Limitations:
As shown in Figure 14(a), poses that deviate from the training distribution tend to produce artifacts.
As shown in Figures 14(b) and (c), handle points placed in regions lacking structural information may drift during tracking.

5. Conclusion

The authors propose DragGAN, an interactive point-based image editing method that manipulates images according to user input. Its success rests on two components:
a. a latent-code optimization module that moves the handle points toward the target points;
b. a point tracking module that accurately follows the handle points.
DragGAN outperforms existing GAN-based image manipulation methods and opens a new direction: exploiting generative priors for image editing.
