Generative adversarial networks (GANs) are an unsupervised learning method that has become very popular over the past two years. They can generate highly realistic photos, images, and even videos, and they will be used in the photo-processing software on our mobile phones.
Table of contents
The basic principle of generative adversarial networks (GANs)
The first stage: fix the "discriminator D" and train the "generator G"
The second stage: fix the "generator G" and train the "discriminator D"
Repeating stages 1 and 2
Advantages and disadvantages of GAN
Top 10 Typical GAN Algorithms
13 Practical Applications of GANs
From manual feature extraction to automatic feature extraction
The most distinctive and powerful thing about deep learning is its ability to learn feature extraction on its own.
A machine's enormous computing power can solve many problems that humans cannot solve by hand. Once feature extraction is automated, both learning ability and adaptability improve.
From manually judging generated results to automatic judgment and optimization
Training sets require large amounts of manually labeled data, which is costly and inefficient. Judging the quality of generated results by hand has the same problems: high cost and low efficiency.
A GAN, however, can carry out this process automatically and keep optimizing it, which makes it a very efficient, low-cost approach. How does a GAN automate this? The principle is explained below.
The basic principle of generative adversarial networks (GANs)
Plain-language version
There is a very good explanation on Zhihu that everyone should be able to understand:
Suppose public order in a city has broken down, and before long the city is full of thieves. Some of these thieves may be master criminals, while others have no skill at all. If the city then sets out to restore order and launches a sudden campaign against crime, the police resume their patrols, and a batch of unskilled thieves is soon caught. The reason only the unskilled thieves are caught is that the police are not very skilled either. After this batch of low-level thieves has been arrested, it is hard to say how much safer the city has become, but the average skill level of the thieves who remain has clearly gone up.
The police then keep training their crime-solving skills and start catching ever more cunning thieves. As they arrest these professional repeat offenders, the police develop skills of their own: they can quickly pick a suspicious person out of a crowd, step in to question them, and finally arrest the suspect. Life gets harder for the thieves too, because the police have improved so much that anyone who still sneaks around as before will soon be caught.
Technical version
A generative adversarial network (GAN) consists of 2 important parts:
- Generator: generates data (in most cases, images) by machine, with the goal of "fooling" the discriminator
- Discriminator: judges whether an image is real or machine-generated, with the goal of picking out the "fake data" made by the generator
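The two parts can be pictured as two plain functions. The following is only a minimal sketch under strong simplifying assumptions: the data are single numbers rather than images, the generator is a linear map of random noise, and the discriminator is a logistic function. The names `generator`, `discriminator`, `theta`, and `phi` and all starting values are hypothetical and chosen for illustration.

```python
import math
import random


def generator(z, theta):
    """Map a noise value z to a fake sample, with parameters theta = (a, b)."""
    a, b = theta
    return a * z + b


def discriminator(x, phi):
    """Return the estimated probability that sample x is real, with phi = (w, c)."""
    w, c = phi
    return 1.0 / (1.0 + math.exp(-(w * x + c)))


random.seed(0)
theta = (1.0, 0.0)   # hypothetical starting parameters for G
phi = (0.5, -1.0)    # hypothetical starting parameters for D

z = random.gauss(0.0, 1.0)          # random noise fed to the generator
fake = generator(z, theta)          # "fake data" produced by G
p_real = discriminator(fake, phi)   # D's belief that this sample is real
print(f"fake sample {fake:.3f}, D says real with probability {p_real:.3f}")
```

In a real GAN both functions would be deep neural networks, but the interface is the same: G turns noise into a sample, and D turns a sample into a probability of being real.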
The process is described in detail below:
The first stage: fix the "discriminator D" and train the "generator G"
We start with a reasonably good "discriminator D" and let "generator G" keep producing "fake data", which is then handed to "discriminator D" to judge.
At first, "generator G" is still weak, so its fakes are easy to spot.
With continued training, however, the skill of "generator G" keeps improving until it finally fools "discriminator D".
At this point, "discriminator D" is essentially guessing blindly: the probability that it correctly identifies fake data is about 50%.
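As a rough illustration of this stage, the sketch below freezes a toy discriminator and updates only the generator by gradient descent. It rests on several assumptions that are not in the text: one-dimensional data, a linear generator `a * z + b`, a logistic discriminator, and the commonly used generator loss `-log D(G(z))`. All names and numbers are hypothetical. Because D never changes here, the score it gives to fakes simply keeps rising.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def mean_fool_score(theta, phi, n=1000, seed=1):
    """Average probability of 'real' that the frozen D assigns to G's samples."""
    rng = random.Random(seed)
    a, b = theta
    w, c = phi
    return sum(sigmoid(w * (a * rng.gauss(0, 1) + b) + c) for _ in range(n)) / n


def train_generator(theta, phi, steps=200, lr=0.05, batch=32, seed=0):
    """Stage 1: update only G's parameters (a, b); D's (w, c) are read, never changed."""
    rng = random.Random(seed)
    a, b = theta
    w, c = phi
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for _ in range(batch):
            z = rng.gauss(0, 1)
            p = sigmoid(w * (a * z + b) + c)  # D's "real" probability for this fake
            dloss_dx = -(1.0 - p) * w         # gradient of -log(p) w.r.t. the fake sample
            grad_a += dloss_dx * z / batch
            grad_b += dloss_dx / batch
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b)


phi = (1.0, -4.0)     # frozen toy discriminator: larger x looks more "real"
theta0 = (1.0, 0.0)   # untrained generator
theta1 = train_generator(theta0, phi)
print("fool score before:", round(mean_fool_score(theta0, phi), 3))
print("fool score after: ", round(mean_fool_score(theta1, phi), 3))
```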
The second stage: fix the "generator G" and train the "discriminator D"
Once the first stage is complete, there is little point in continuing to train "generator G". We now fix "generator G" and start training "discriminator D".
Through continued training, "discriminator D" improves its ability to discriminate until it can accurately identify all the fake pictures.
By this point, "generator G" can no longer fool "discriminator D".
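The mirror image of the previous stage can be sketched in the same toy setting: the generator is frozen and only the discriminator is updated. The assumptions here are illustrative and not from the text: real data are drawn from a normal distribution around `REAL_MEAN`, the frozen generator outputs noise shifted by `fake_shift`, and D is a logistic classifier trained with the usual binary cross-entropy loss.

```python
import math
import random

REAL_MEAN = 4.0  # hypothetical: real samples are drawn around this value


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def accuracy(phi, fake_shift, n=2000, seed=1):
    """Share of samples D labels correctly (a sample counts as real if D(x) > 0.5)."""
    rng = random.Random(seed)
    w, c = phi
    correct = 0
    for _ in range(n):
        real = rng.gauss(REAL_MEAN, 1.0)
        fake = rng.gauss(0.0, 1.0) + fake_shift
        correct += sigmoid(w * real + c) > 0.5
        correct += sigmoid(w * fake + c) <= 0.5
    return correct / (2 * n)


def train_discriminator(phi, fake_shift, steps=300, lr=0.1, batch=32, seed=0):
    """Stage 2: update only D's parameters (w, c); the generator is frozen."""
    rng = random.Random(seed)
    w, c = phi
    for _ in range(steps):
        grad_w = grad_c = 0.0
        for _ in range(batch):
            real = rng.gauss(REAL_MEAN, 1.0)
            fake = rng.gauss(0.0, 1.0) + fake_shift   # output of the frozen G
            p_real = sigmoid(w * real + c)
            p_fake = sigmoid(w * fake + c)
            # Gradient of -log(p_real) - log(1 - p_fake) w.r.t. w and c
            grad_w += ((p_real - 1.0) * real + p_fake * fake) / batch
            grad_c += ((p_real - 1.0) + p_fake) / batch
        w -= lr * grad_w
        c -= lr * grad_c
    return (w, c)


phi0 = (0.0, 0.0)                      # untrained D outputs 0.5 for everything
phi1 = train_discriminator(phi0, fake_shift=0.0)
print("accuracy before:", accuracy(phi0, 0.0))
print("accuracy after: ", accuracy(phi1, 0.0))
```

With overlapping real and fake distributions, as in this toy, D cannot reach perfect accuracy, but it moves well away from the 50% guessing level.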
Repeating stages 1 and 2
By repeating this loop, both "generator G" and "discriminator D" become stronger and stronger.
In the end we obtain a very good "generator G", which we can use to generate the pictures we want.
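The alternation of the two stages can be sketched as a single loop. This is a toy under stated assumptions, not a practical GAN: the data are one-dimensional and centered at 0, the generator only learns a shift `b` (starting at a hypothetical value of 3), the discriminator is logistic, and the generator uses the loss `-log D(G(z))`. Since the toy discriminator starts out untrained, each round updates D first and G second.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_gan(rounds=200, steps_per_stage=5, lr=0.05, batch=64, seed=0):
    """Alternate between training D (G fixed) and training G (D fixed)."""
    rng = random.Random(seed)
    w, c = 0.0, 0.0   # discriminator D(x) = sigmoid(w * x + c)
    b = 3.0           # generator G(z) = z + b, starts far from the real data
    for _ in range(rounds):
        # G fixed, train D on real samples N(0, 1) and fakes N(b, 1)
        for _ in range(steps_per_stage):
            grad_w = grad_c = 0.0
            for _ in range(batch):
                real = rng.gauss(0.0, 1.0)
                fake = rng.gauss(0.0, 1.0) + b
                p_real = sigmoid(w * real + c)
                p_fake = sigmoid(w * fake + c)
                grad_w += ((p_real - 1.0) * real + p_fake * fake) / batch
                grad_c += ((p_real - 1.0) + p_fake) / batch
            w -= lr * grad_w
            c -= lr * grad_c
        # D fixed, train G to raise D's "real" probability on fakes
        for _ in range(steps_per_stage):
            grad_b = 0.0
            for _ in range(batch):
                fake = rng.gauss(0.0, 1.0) + b
                p = sigmoid(w * fake + c)
                grad_b += -(1.0 - p) * w / batch
            b -= lr * grad_b
    return b, (w, c)


b, (w, c) = train_gan()
print("generator shift after training:", round(b, 3))
print("D's output at the generator mean:", round(sigmoid(w * b + c), 3))
```

In this sketch the generator's shift should drift toward the real data's center, and D's output on generated samples should end up near 0.5, the "blind guessing" state described earlier. Real GANs on images are far less well behaved, which is part of why training is considered difficult.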
The practical applications section below shows many impressive examples.
If you are interested in the detailed technical principles of GANs, you can take a look at the following article:
A Beginner's Guide to Generative Adversarial Networks (GAN) – With Code
Advantages and disadvantages of GAN
3 advantages
- Better modeling of data distribution (sharper, clearer images)
- In theory, GANs can train any kind of generator network. Other frameworks require the generator network to have some specific functional form, such as the output layer being Gaussian.
- No need for repeated sampling with Markov chains, no inference during learning, no complicated variational lower bounds, and no need to approximate intractable probabilities.
2 disadvantages
- Hard to train and unstable. The generator and the discriminator need to stay well synchronized, but in actual training it is easy for D to converge while G diverges. Training D and G requires careful design.
- Mode collapse. During training, a GAN may miss some modes of the data: the generator degenerates, keeps producing the same sample points, and cannot continue learning.
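One simple way to spot mode collapse on toy data is to check how many known modes the generated samples actually reach. The sketch below is an illustrative diagnostic only, assuming one-dimensional data with known mode centers; the function `mode_coverage`, its `radius` and `min_share` parameters, and the sample data are all hypothetical.

```python
import random


def mode_coverage(samples, centers, radius=0.5, min_share=0.05):
    """Fraction of known modes that receive at least min_share of the samples."""
    counts = [0] * len(centers)
    for s in samples:
        # Assign the sample to its nearest mode center
        nearest = min(range(len(centers)), key=lambda i: abs(s - centers[i]))
        if abs(s - centers[nearest]) <= radius:
            counts[nearest] += 1
    needed = min_share * len(samples)
    return sum(1 for n in counts if n >= needed) / len(centers)


rng = random.Random(0)
centers = [-2.0, 0.0, 2.0]   # modes of the (toy) real data

# A healthy generator spreads its samples over all modes
healthy = [rng.gauss(rng.choice(centers), 0.1) for _ in range(300)]
# A collapsed generator keeps producing samples around a single mode
collapsed = [rng.gauss(0.0, 0.1) for _ in range(300)]

print("healthy coverage:  ", mode_coverage(healthy, centers))
print("collapsed coverage:", mode_coverage(collapsed, centers))
```

For images there are no labeled mode centers, so this exact check does not carry over, but the idea of measuring sample diversity rather than only sample quality is the same.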
Top 10 Typical GAN Algorithms
There are hundreds of GAN algorithms, and research on GANs is growing exponentially. Currently, hundreds of papers on adversarial networks are published every month.
The figure below shows the number of papers published on GAN each month:
If you are interested in GAN algorithms, you can find almost all of them in the "GANs Zoo". We have selected 10 representative algorithms from the many available, and technical readers can look at their papers and code.
Algorithm | Paper | Code |
---|---|---|
GAN | Paper address | code address |
DCGAN | Paper address | code address |
CGAN | Paper address | code address |
CycleGAN | Paper address | code address |
CoGAN | Paper address | code address |
ProGAN | Paper address | code address |
WGAN | Paper address | code address |
SAGAN | Paper address | code address |
BigGAN | Paper address | code address |
The above is compiled from the article "Generative Adversarial Networks – The Story So Far", which gives a brief description of each algorithm. Take a look if you are interested.
13 Practical Applications of GANs
GANs may not seem as intuitive as "speech recognition" or "text mining", but their applications have already entered our lives. Here are some practical applications of GANs.
Generate an image dataset
Training artificial intelligence requires large datasets. Collecting and labeling all of them by hand is very costly. GANs can automatically generate datasets and provide low-cost training data.
Generate face photos
Generating face photos is an application that everyone is familiar with, but what to do with the generated photos is a question that needs careful thought, because such photos still sit in a legal gray area.
Generate photos and cartoon characters
GAN can generate not only human faces, but also other types of photos, and even comic characters.
Image to Image Conversion
Simply put, this means converting one form of image into another, much like applying a filter. For example:
- Convert sketches to photos
- Convert satellite photos to Google Maps images
- Convert photos into oil paintings
- Turn day into night
Text to Image Conversion
StackGAN, in particular, generates photorealistic images from text descriptions of simple objects such as birds and flowers.
Semantic Image to Photo Conversion
The 2017 paper "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs" demonstrated the use of conditional GANs to generate realistic images from a semantic image or sketch as input.
Automatically generate models
The 2017 paper "Pose Guided Person Image Generation" showed that images of human models in new poses can be generated automatically.
Photos to Emojis
GANs can automatically generate matching emojis from face photos.
Photo editing
GANs can generate specific edits to a photo, such as changing hair color, facial expression, or even gender.
Predict looks at different ages
Given a photo of a face, GAN can help you predict what you will look like at different ages.
Increase photo resolution to make photos clearer
Give a GAN a photo, and it can generate a higher-resolution version, making the photo clearer.
Photo restoration
If part of a photo is damaged (for example, painted over or erased), a GAN can repair that area and restore it to its original state.
Automatically generate 3D models
Given multiple 2D images from different angles, a 3D model can be generated.
Generative Adversarial Networks (GANs)
A GAN is a deep learning model and one of the most promising approaches in recent years to unsupervised learning on complex distributions. The model produces fairly good output through the adversarial game between (at least) two modules in the framework: the generative model and the discriminative model. In the original GAN theory, G and D are not required to be neural networks; they only need to be functions that can fit the corresponding generation and discrimination tasks. In practice, however, deep neural networks are generally used for G and D. A good GAN application requires a good training method; otherwise the output may be unsatisfactory because of the freedom of the neural network model.
Generative adversarial networks (GANs) are a class of artificial intelligence algorithms for unsupervised machine learning, implemented as a system of two neural networks competing against each other in a zero-sum game framework. They were introduced by Ian Goodfellow et al. in 2014. The technique can generate photos that look at least superficially authentic to human observers, with many realistic features (although in tests people can often tell real photos from generated ones).
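The zero-sum game described here is usually written as the minimax objective from the 2014 paper by Goodfellow et al. It is given below as a reference sketch. The symbols are notation introduced for this illustration: p_data is the distribution of real data, p_z the distribution of the input noise, and p_g the distribution of the generator's samples.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]

% For a fixed generator G, the optimal discriminator is
D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

% so when the generator matches the data exactly (p_g = p_{\text{data}}):
D^{*}(x) = \tfrac{1}{2}
```

The last line is the formal counterpart of a discriminator that can only guess: once generated samples are distributed like real ones, the best possible D outputs 1/2 everywhere.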