Analysis of Global Average Pooling as a Replacement for Fully Connected Layers (repost)

Disclaimer: This article is an original post by williamyi. Reproduction or use for other purposes without permission is prohibited! https://blog.csdn.net/williamyi96/article/details/77530995

The NIN (Network in Network) paper contains many ideas that have benefited people, one of which is replacing the fully connected layer with global average pooling (GAP). Since I did not need NIN at the start of my research, I never read the paper and only looked at other people's blogs, which did not explain the reason clearly; they only said that it is used. Eventually I could not stand it any longer, read the paper myself, and found a much better explanation. Now I consider it fully understood.

First, let us look at the shortcomings of fully connected layers:

Probably every neural-network-based machine learning method since AlexNet has added fully connected layers after the convolutional layers to turn the extracted features into a vector. In addition, since neural networks are regarded as black boxes, designing a few extra fully connected layers can sometimes improve the classification performance of a convolutional neural network, so this became the standard practice.

However, we should also note that fully connected layers have a fatal weakness: far too many parameters, especially in the layer connected to the final convolutional layer. On the one hand, this increases the amount of computation during training and testing and reduces speed; on the other hand, the excessive parameters make overfitting easy. Although techniques such as dropout can cope with this, dropout introduces yet another hyperparameter, which is not an elegant practice.
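To make the parameter blow-up concrete, here is a minimal sketch in plain Python. The numbers assume a VGG-16-style head (a 7x7x512 final feature map flattened into a 4096-unit fully connected layer); they are my illustrative example, not figures from the NIN paper:

```python
# Parameter count of the first fully connected layer in a
# VGG-16-style head: the 7x7x512 final feature map is flattened
# and mapped to a 4096-unit hidden layer.
fc_weights = 7 * 7 * 512 * 4096      # weight matrix entries
fc_biases = 4096                     # one bias per output unit
fc_params = fc_weights + fc_biases
print(f"FC layer parameters: {fc_params:,}")   # 102,764,544

# Global average pooling replaces the flatten + FC step and
# introduces no learnable parameters at all.
gap_params = 0
print(f"GAP parameters: {gap_params}")
```

A single fully connected layer here costs over 100 million parameters, which is why it dominates both the memory footprint and the overfitting risk of such networks.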

So is there a way to replace it? Of course there is: GAP (Global Average Pooling).

To be clear, all the fully connected layer does is flatten each feature map after the convolutional layers into a vector and then classify it. The idea of GAP is to merge these two processes into one, completed in a single step, as shown in the figure:

[Figure: flattening feature maps into a fully connected classifier versus applying global average pooling directly]

This makes the idea intuitive. After merging the two processes, we can discover the true meaning of GAP: it imposes structural regularization on the entire network to prevent overfitting. It directly removes the black-box character of the fully connected layers and gives each channel an actual, interpretable meaning.

Practice has shown that the effect is quite substantial, and GAP also allows input images of arbitrary size. Note, however, that using GAP may slow down convergence.
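As a concrete illustration, here is a minimal PyTorch sketch of a GAP classification head (my own toy example, not code from the NIN paper): the last convolution emits one channel per class, GAP reduces each channel to a single score, and because the pooling averages over whatever spatial size arrives, the same network accepts inputs of arbitrary size:

```python
import torch
import torch.nn as nn

class GapHead(nn.Module):
    """Toy convolutional network ending in a GAP classifier."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # The final conv emits one feature map per class, so each
            # channel carries an explicit category meaning.
            nn.Conv2d(64, num_classes, kernel_size=1),
        )

    def forward(self, x):
        x = self.features(x)
        # Global average pooling: average each channel over all
        # spatial positions, yielding one score per class.
        return x.mean(dim=(2, 3))

model = GapHead(num_classes=10)
# Works for any spatial input size, since GAP averages over
# whatever resolution the feature maps happen to have.
print(model(torch.randn(1, 3, 32, 32)).shape)    # torch.Size([1, 10])
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 10])
```

Note how the flatten-then-classify step of a fully connected head has disappeared entirely: the class scores come straight from averaging the feature maps, with no additional weights to learn.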

References:
Network In Network (Lin et al., arXiv:1312.4400)
Global Average Pooling
