What is Dropout? Why can Dropout prevent over-fitting?

What is Dropout?

Dropout literally means "to drop out". It is a strategy applied during neural network training to prevent over-fitting.
During training, neurons are randomly dropped from the network (except for the output layer) according to a chosen probability (typically, hidden-layer neurons are kept with probability 0.5 and input-layer neurons with probability 0.8).

A standard neural network is shown below:
[Figure: a standard neural network]
The same network with Dropout applied:
[Figure: the neural network with Dropout]
Dropout behaves differently during training and at inference (test) time: during training some neurons are randomly dropped, while at test time the model uses all of its neurons.
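To make this concrete, here is a minimal NumPy sketch of a dropout layer (my own illustration, not from the original post). It uses the "inverted dropout" convention, where kept activations are scaled by 1/keep_prob during training so that nothing needs to change at test time; the keep probabilities 0.8 (input) and 0.5 (hidden) are just the typical values mentioned above.

```python
import numpy as np

def dropout_forward(x, keep_prob, training):
    """Inverted dropout: randomly drop units at train time, do nothing at test time."""
    if not training:
        return x                                  # test time: all neurons are used
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask                               # dropped neurons output 0

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 10))                      # batch of 4 examples, 10 input features
W = rng.normal(size=(10, 5))                      # weights of one hidden layer

# Training: inputs kept with prob 0.8, hidden activations kept with prob 0.5
h = np.maximum(dropout_forward(x, 0.8, training=True) @ W, 0)
h = dropout_forward(h, 0.5, training=True)

# Test: the same code path, but no neurons are dropped
h_test = np.maximum(dropout_forward(x, 0.8, training=False) @ W, 0)
h_test = dropout_forward(h_test, 0.5, training=False)
```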

Why use Dropout?

Because deep neural networks run into two major problems during training:

  1. Easy to over-fitting
  2. Time-consuming

Dropout was introduced precisely to address these two problems.
Why can it address both of them?
Read on!!!

Why can Dropout prevent over-fitting?

Dropout can be viewed as a practical way of applying Bagging to an ensemble of very many deep neural networks.
What is Bagging?
Bagging (Bootstrap aggregating) is a technique that reduces generalization error by combining several models.
The main idea: train several different models separately, then have all of them vote on the output for each test example (similar to combining several weak classifiers into one strong classifier). This strategy is called model averaging, and techniques that use it are called ensemble methods.
Why does model averaging work?
Different models usually do not make the same errors on the test set, so when several models vote, their differing errors partly cancel each other out, giving better overall results.
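A quick toy simulation (a hypothetical example of mine, not from the post) shows this cancellation: five classifiers that are each wrong 30% of the time, but on different samples, give a much lower error once their votes are combined.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_models, p_wrong = 10_000, 5, 0.3     # each model alone is wrong 30% of the time

y_true = rng.integers(0, 2, size=n_samples)       # binary ground-truth labels
# Each model flips the true label independently with probability p_wrong,
# i.e. the models make *different* errors on different samples.
flips = rng.random((n_models, n_samples)) < p_wrong
preds = np.where(flips, 1 - y_true, y_true)

single_error = (preds[0] != y_true).mean()
majority = (preds.mean(axis=0) > 0.5).astype(int) # majority vote of the 5 models
vote_error = (majority != y_true).mean()

print(f"single-model error : {single_error:.3f}") # ~0.30
print(f"majority-vote error: {vote_error:.3f}")   # ~0.16 -- the errors partly cancel
```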
The difference between Dropout and Bagging:
In Bagging, all models are independent (the parameters of different models do not influence each other).
In Dropout, all models share parameters (each sub-model inherits a different subset of the parent neural network's parameters).
Parameter sharing makes it possible to represent an exponential number of models within a limited amount of memory, as sketched below.
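A small sketch of what parameter sharing means here (again my own toy illustration): every random mask selects a different sub-network, but all sub-networks read the same parent weight matrix, so exponentially many sub-models fit in the memory of a single model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 3))        # the ONE shared weight matrix of the parent network
x = rng.normal(size=6)

def subnetwork_output(x, W, mask):
    """Forward pass of the sub-network selected by `mask`.
    Every sub-network re-uses the same W; only the mask changes."""
    return (x * mask) @ W

# Two sub-models sampled by Dropout: different masks, identical parameters
mask_a = rng.random(6) < 0.5
mask_b = rng.random(6) < 0.5
print(subnetwork_output(x, W, mask_a))
print(subnetwork_output(x, W, mask_b))
# With 6 droppable units there are 2**6 = 64 possible sub-networks,
# yet only the single 6x3 matrix W is ever stored in memory.
```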
But we digress ~~~~ back to why Dropout can prevent over-fitting (that is, why over-fitting is reduced once Dropout is used):

1. The averaging effect

This idea is essentially the same as Bagging.
If we train five different neural networks on the same data set, we usually get five different results; we can then use an "averaging" or "majority-vote" strategy to determine the final result. Because different sub-structures over-fit in different ways, averaging lets their "opposite" over-fittings cancel each other out, so the network as a whole over-fits less.

2. Reducing complex co-adaptations between neurons

With Dropout, any two given neurons do not always appear together in the same sub-network. As a result, weight updates no longer rely on fixed interactions between particular hidden units, which prevents a situation where certain features are useful only in the presence of other specific features. This forces the network to learn more robust features (features that generalize better).

3. Dropout is analogous to the role of sex in biological evolution

To survive, a species tends to adapt ever more closely to its environment, but when the environment suddenly changes it is hard for the species to respond in time. Sexual reproduction produces variants that can adapt to the new environment; it "prevents over-fitting", that is, it keeps the species from going extinct when the environment changes.

So the over-fitting problem is dealt with, but it may seem that the time-consuming training problem is still unsolved. In fact, the first point above already explains it.

The averaging in Dropout is carried out over the many sub-structures generated from the parent network. For a parent neural network with N nodes, adding Dropout can be seen as exponentially amplifying the number of models while the weights stay shared (parameter sharing).
Put the other way round: to train 2^N separate models without Dropout, you would need roughly 2^N times the training time. With Dropout, training the N shared parameters once is enough to obtain (approximately) the effect of 2^N models. Doesn't that save time? >_>
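Here is a toy sketch of that claim (my own, assuming a single linear layer with keep probability 0.5): averaging the outputs of many randomly masked sub-networks is approximated by one forward pass with activations scaled by the keep probability, which is exactly the usual test-time rule. For a linear layer the two agree in expectation; for deep nonlinear networks it is only an approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, keep_prob = 10, 0.5
x = rng.normal(size=N)
W = rng.normal(size=N)                             # shared parameters (one linear output unit)

# Monte-Carlo average over thousands of randomly sampled sub-networks (masks)
samples = [(x * (rng.random(N) < keep_prob)) @ W for _ in range(5000)]
mc_average = np.mean(samples)

# One forward pass of the full network with activations scaled by keep_prob
scaled_pass = (x * keep_prob) @ W

print(f"average of sampled sub-networks: {mc_average:.3f}")
print(f"one scaled forward pass:         {scaled_pass:.3f}")   # close to the average above
# 2**N = 1024 distinct sub-networks exist, but we never train them one by one.
```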

