DragGAN

In the magical world of AIGC, we can change and synthesize the image we want by "dragging" on the image. For example, to make a lion turn its head and open its mouth:

The research behind this effect is the "Drag Your GAN" paper, led by a Chinese author. It was released last month and has been accepted to SIGGRAPH 2023.

More than a month later, the research team released the official code. Within just three days the repository passed 23k stars, a sign of how popular it is.

GitHub address: https://github.com/XingangPan/DragGAN

Coincidentally, a similar piece of research, DragDiffusion, came to people's attention today. The earlier DragGAN introduced point-based interactive image editing and achieved pixel-level editing precision. It also has a shortcoming: DragGAN is built on a generative adversarial network (GAN), so its generality is limited by the capacity of the pre-trained GAN model.

In the new study, researchers from the National University of Singapore and ByteDance extended this editing framework to diffusion models and proposed DragDiffusion. By building on a large-scale pre-trained diffusion model, they greatly improved the applicability of point-based interactive editing in real-world scenarios.

While most current diffusion-based image editing methods operate on text embeddings, DragDiffusion optimizes the diffusion latent representation to achieve precise spatial control.

  • Paper address: https://arxiv.org/pdf/2306.14435.pdf

  • Project address: https://yujun-shi.github.io/projects/dragdiffusion.html

The researchers said that although the diffusion model generates images iteratively, optimizing the diffusion latent representation at one single step is sufficient to produce coherent results, which lets DragDiffusion complete high-quality edits efficiently.

They conducted extensive experiments under various challenging scenarios (e.g., multiple objects, different object categories), verifying the versatility and generality of DragDiffusion. The relevant code will also be released soon.

Let's see how DragDiffusion works.

First of all, we want to raise the head of the kitten in the picture below a little bit more. The user only needs to drag the red point to the blue point:

Next, we want to make the mountain a little higher. No problem, just drag the red key point:

We also want to turn the head of the sculpture. Again, just drag and drop:

Or make the flowers on the shore bloom over a wider area:

Method introduction

DragDiffusion, the method proposed in the paper, optimizes the diffusion latent variables at one specific step to achieve interactive, point-based image editing.

To achieve this goal, the study first fine-tunes a LoRA on the diffusion model so that it reconstructs the user's input image. This helps keep the style of the input and output images consistent.
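As a rough illustration of what a LoRA adds to a frozen weight matrix, here is a minimal sketch in plain Python. It is not the authors' training code: the matrix sizes, the `alpha` scaling value, and the helper names `matmul` and `lora_weight` are assumptions made for this example. It only shows the low-rank update W + (alpha / r) * B A and why it is cheap to train.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the rank (number of rows of A)."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

random.seed(0)
d_out, d_in, rank = 6, 8, 2  # illustrative sizes, not from the paper

# Frozen pre-trained weight (d_out x d_in).
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A is rank x d_in, B is d_out x rank.
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: no change before training

W_adapted = lora_weight(W, A, B, alpha=2.0)

full_params = d_out * d_in           # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters trained with LoRA
```

Because B starts at zero, the adapted weight equals the original one before any training step, and only the small factors A and B would be updated while fitting the input image.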

Next, the study applies DDIM inversion to the input image to obtain the diffusion latent variables at a specific step t. DDIM inversion runs the deterministic DDIM sampling process in reverse, mapping a clean image back to a noisy latent that can later be denoised to recover it.
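The inversion step can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the paper's implementation: `toy_eps` is a hypothetical stand-in for the noise-prediction network, the `abar` schedule values are made up, and the latent is a short list of floats. Because the stand-in predicts a constant noise value, the round trip here is exact; with a real network, DDIM inversion is only approximate.

```python
import math

def ddim_step(x, eps, abar_from, abar_to):
    """Deterministic DDIM update that moves a latent between two noise levels."""
    out = []
    for xi, ei in zip(x, eps):
        # Predicted clean sample implied by the current latent and noise estimate.
        x0_pred = (xi - math.sqrt(1 - abar_from) * ei) / math.sqrt(abar_from)
        out.append(math.sqrt(abar_to) * x0_pred + math.sqrt(1 - abar_to) * ei)
    return out

def toy_eps(x, t):
    """Hypothetical stand-in for the noise-prediction network (constant output)."""
    return [0.5 for _ in x]

# Made-up cumulative alpha schedule: index 0 is the clean image, higher is noisier.
abar = [1.0, 0.95, 0.8, 0.6, 0.4]

def invert(x0, t_target):
    """DDIM inversion: walk from the clean latent up to step t_target."""
    x = list(x0)
    for t in range(t_target):
        x = ddim_step(x, toy_eps(x, t), abar[t], abar[t + 1])
    return x

def denoise(xt, t_start):
    """DDIM sampling: walk from step t_start back down to the clean latent."""
    x = list(xt)
    for t in range(t_start, 0, -1):
        x = ddim_step(x, toy_eps(x, t), abar[t], abar[t - 1])
    return x

x0 = [0.2, -1.0, 0.7]
z_t = invert(x0, 3)      # latent at step t = 3, the one that would be edited
recon = denoise(z_t, 3)  # denoising the unedited latent recovers the input
```

In the method described here, the latent `z_t` would be optimized between the two calls, so the final `denoise` pass yields the edited image instead of the original.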

During the editing process, the method iteratively applies motion supervision and point tracking to optimize the step-t diffusion latent variables obtained above, "dragging" the content at each handle point toward its target location. A regularization term is also applied so that regions of the image outside the editing mask remain unchanged.
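Two pieces of this loop are easy to sketch without a deep learning framework: point tracking as a nearest-neighbor search in feature space, and the mask regularizer as an L1 penalty on latent changes outside the editable region. The code below is a hedged illustration; the function names, the L1 distance, the search radius, and the toy data are all assumptions for this example rather than details confirmed by the article. The motion supervision step itself needs automatic differentiation and is left out.

```python
def l1(u, v):
    """L1 distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def track_point(feat, ref, handle, radius):
    """Find the position near `handle` whose feature is closest to `ref`.

    feat   : 2D grid (rows x cols) of feature vectors
    ref    : feature of the handle point taken from the original image
    handle : current (row, col) of the handle point
    """
    h, w = len(feat), len(feat[0])
    r0, c0 = handle
    best, best_d = handle, l1(feat[r0][c0], ref)
    for r in range(max(0, r0 - radius), min(h, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(w, c0 + radius + 1)):
            d = l1(feat[r][c], ref)
            if d < best_d:
                best, best_d = (r, c), d
    return best

def mask_regularizer(z, z0, mask):
    """L1 penalty on latent changes where mask == 0 (outside the editable region)."""
    return sum((1 - m) * abs(a - b)
               for z_row, z0_row, m_row in zip(z, z0, mask)
               for a, b, m in zip(z_row, z0_row, m_row))

# Toy 5x5 feature map: the handle's content has moved one pixel to the right.
feat = [[[0.0, 0.0] for _ in range(5)] for _ in range(5)]
ref = [1.0, 2.0]
feat[2][3] = [1.0, 2.0]
new_handle = track_point(feat, ref, (2, 2), radius=1)

# Toy 2x2 latent: only the top-left entry is inside the editable mask.
z0 = [[0.0, 0.0], [0.0, 0.0]]
z = [[0.5, 0.0], [0.0, -0.25]]
mask = [[1, 0], [0, 0]]
penalty = mask_regularizer(z, z0, mask)
```

In this toy run the handle is re-located to the pixel whose feature matches the reference, and the change inside the mask is free while the change outside it is penalized.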

Finally, the optimized step-t latent variables are denoised with DDIM to obtain the edited result. An overview of the method is shown below:

Experimental results

Given an input image, DragDiffusion "drags" the content at the key points (red) to the corresponding target points (blue). For example, in picture (1) the dog's head is turned, in picture (7) the tiger's mouth is closed, and so on.

Below are some more sample demonstrations. In Figure (4) the mountain peak becomes higher, in Figure (7) the pen becomes larger, and so on.
