SinGAN paper notes

Background Knowledge

Inference-based vision theory:

  1. The image data alone cannot provide sufficient constraints on the structure of the corresponding scene; image understanding is an under-constrained problem. To interpret the content of an image, additional constraints are required (for example, prior knowledge about where a certain class of objects appears in the picture). Such additional high-level information removes the ambiguity among possible interpretations.

  2. Image Prior:
    The attribute information we already know about images, which can be used to reduce the number of feasible solutions.

  3. Image Patch:
    A block of pixels, comparable to the sliding window in a convolution operation. In many image manipulation tasks it is easier to operate on patches than on the entire image.
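To make the patch idea concrete, here is a minimal sketch of sliding-window patch extraction (the function name `extract_patches` and the parameters are illustrative, not from the paper):

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Slide a window over a 2-D image and collect overlapping patches,
    analogous to how a convolution visits local pixel blocks."""
    h, w = image.shape
    patches = []
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.stack(patches)

img = np.arange(36, dtype=np.float32).reshape(6, 6)
patches = extract_patches(img, patch_size=3, stride=1)
print(patches.shape)  # (16, 3, 3): 4 x 4 window positions on a 6 x 6 image
```

The internal patch statistics SinGAN models are statistics over exactly this kind of collection of overlapping blocks.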

Motivation


  • Modeling the internal distribution of patches within a single natural image has long been recognized as a powerful prior in many computer vision tasks.
  1. In other words, for many computer vision tasks, modeling the internal patch distribution of a single natural image has always been considered a strong prior.
  2. To obtain equally representative patch statistics from external data, an external database of hundreds of images is needed.
  3. The internal statistics of a single image have stronger predictive power than external statistics, and thus potentially provide a stronger prior for that specific image.
    First, patches tend to repeat many times within the same image, far more often than across an external collection of natural images.
    The authors therefore use only the information inside a single image for image restoration and other image manipulation applications, and show that SinGAN produces many high-quality results that preserve the patch statistics of the training image.

Multi-scale architecture

  • Fully convolutional network (FCN): can generate images of arbitrary sizes and aspect ratios.
  • Image retargeting: displaying images without distortion on screens of different sizes.
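The multi-scale architecture trains on a pyramid of downsampled copies of the single training image. A minimal sketch of how such a pyramid's level sizes could be computed (the per-level shrink factor of 0.75 corresponds to the paper's scaling factor of roughly 4/3 between consecutive scales; the 25px coarsest dimension is from the Results section below; the function name is illustrative):

```python
def pyramid_sizes(height, width, r=0.75, min_dim=25):
    """Return (height, width) of each pyramid level, coarsest first,
    shrinking by factor r until the smaller dimension would drop below min_dim."""
    sizes = [(height, width)]
    while min(sizes[-1]) * r >= min_dim:
        h, w = sizes[-1]
        sizes.append((round(h * r), round(w * r)))
    return list(reversed(sizes))

# e.g. a 188 x 250 training image (250px maximum dimension, as in the paper)
print(pyramid_sizes(188, 250))
```

Generation then proceeds coarsest-to-finest, with each scale's output upsampled and fed to the next generator.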

Results

  • The smallest dimension at the coarsest scale is 25px
  • The number of scales N is determined by the scaling factor r, which is chosen as close to 4/3 as possible
  • For all results, we resize the training images to a maximum dimension of 250px
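Given these numbers, the number of scales follows from the geometric relation between the finest and coarsest sizes. A small arithmetic sanity check (treating the 250px and 25px limits as the same dimension tracked across scales, which is a simplification):

```python
import math

max_dim = 250   # finest-scale maximum dimension
min_dim = 25    # coarsest-scale minimum dimension
r = 4 / 3       # scaling factor between consecutive scales

# Number of downscaling steps needed to shrink max_dim down to min_dim:
n_steps = math.log(max_dim / min_dim) / math.log(r)
print(n_steps)  # ~8.0, i.e. roughly 9 scales counting the finest level itself
```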

Effect of scales at test time

At test time, the multi-scale architecture allows us to control the variability of the samples by choosing the scale at which generation starts. (Figure omitted: samples generated starting from different scales.)
From the figure we can see that starting generation from the coarsest scale produces large structural variations, some of them unnatural (e.g., a zebra with five legs), while starting from a finer scale keeps the overall structure intact and only stripe-level texture differences appear.

Effect of scales during training

(Figure omitted: samples from models trained with different numbers of scales.)
It can be seen that with a small number of training scales, the receptive field at the coarsest level is small relative to the image, so the model captures only fine, texture-like information; with a larger number of scales, it can also capture global structure.
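The receptive-field argument can be made concrete. With a fixed stack of stride-1 convolutions per scale (SinGAN uses a few 3x3 conv blocks at every scale), a generator sees only a small patch of its scale's image, so coarser scales cover a proportionally larger part of the scene. A sketch of the standard receptive-field arithmetic (assuming stride-1 convolutions; the layer count of 5 is an assumption for illustration):

```python
def receptive_field(num_layers, kernel_size=3):
    """Effective receptive field of a stack of stride-1 convolutions:
    each additional layer grows the field by (kernel_size - 1) pixels."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1
    return rf

print(receptive_field(5))  # 11: a 5-layer 3x3 stack sees an 11 x 11 patch
```

An 11x11 patch is a tiny fraction of a 250px image but a large fraction of a 25px one, which is why coarse scales model global layout and fine scales model texture.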

Origin blog.csdn.net/qq_33859479/article/details/103150424