VGG network structure

The basic structure of the VGG network

 

As shown in the figure, the depth of the network increases gradually from configuration A to configuration E: A has 11 weight layers (8 convolutional layers and 3 fully connected layers), and E has 19 weight layers (16 convolutional layers and 3 fully connected layers). The width of the convolutional layers, meaning the number of filters, is fairly small. The first convolutional layer has only 64 filters, and the number of filters doubles after each max pooling layer until it reaches 512. The filters used in VGG are 3x3, with a convolution stride of 1 and a spatial padding of 1. There are 5 max pooling layers in total, each with a 2x2 pooling window and a stride of 2, so the pooling is non-overlapping.
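The effect of these settings on the feature maps can be traced with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the 224x224 input size is an assumption that is not stated in the text above. Each stage is represented by one 3x3 convolution, which is enough here because a 3x3 convolution with stride 1 and padding 1 does not change the spatial size.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial size after a max pooling layer without padding."""
    return (size - window) // stride + 1


def trace_vgg(input_size=224, first_filters=64, max_filters=512, num_stages=5):
    """Return (stage, filters, size after conv, size after pooling) per stage.

    input_size=224 is an assumption for illustration; the filter counts and
    the five pooling stages follow the description in the text.
    """
    rows = []
    size, filters = input_size, first_filters
    for stage in range(1, num_stages + 1):
        size_after_conv = conv_out(size)          # unchanged by 3x3 / stride 1 / pad 1
        size_after_pool = pool_out(size_after_conv)  # halved by 2x2 / stride 2
        rows.append((stage, filters, size_after_conv, size_after_pool))
        size = size_after_pool
        filters = min(filters * 2, max_filters)   # double until the cap of 512
    return rows


for stage, filters, conv_size, pooled_size in trace_vgg():
    print(f"stage {stage}: {filters} filters, "
          f"{conv_size}x{conv_size} -> {pooled_size}x{pooled_size} after pooling")
```

Under the assumed 224x224 input, the spatial size is halved five times while the filter count grows from 64 to 512.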

Network structure analysis

Because padding is used, the resolution of the feature map stays the same after each convolution, so more spatial information is preserved. Only max pooling reduces the resolution. Secondly, 3x3 convolutions are used throughout VGGNet, which reduces the number of parameters: a stack of two 3x3 convolutions has the same receptive field as one 5x5 convolution, and a stack of three 3x3 convolutions has the same receptive field as one 7x7 convolution. Counting weights for a single input channel and a single output channel, two 3x3 convolutions have 2 x 9 = 18 parameters while one 5x5 convolution has 25, and three 3x3 convolutions have 27 while one 7x7 convolution has 49, so the saving is significant. In addition, 1x1 convolution is also used (in configuration C). On its own it is a linear transformation of the input channels, but a nonlinear activation function generally follows the convolution, so it effectively adds a nonlinear transformation without changing the receptive field. For the same reason, replacing one large convolution with several smaller ones also provides more nonlinear transformations, since each small convolution is followed by its own activation.
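The receptive field and parameter comparison can be checked with a few lines of arithmetic. This is a minimal sketch with helper names invented for this example. It assumes stride-1 convolutions, ignores biases, and counts weights for one input channel and one output channel, as in the figures quoted above.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stacked stride-1 convolutions.

    Each extra layer widens the receptive field by (kernel - 1).
    """
    return 1 + num_layers * (kernel - 1)


def stacked_weights(num_layers, kernel, channels=1):
    """Weight count of a stack of convolutions, biases ignored.

    Assumes every layer has `channels` input and `channels` output channels.
    """
    return num_layers * kernel * kernel * channels * channels


# Two 3x3 layers versus one 5x5 layer.
print(stacked_receptive_field(2), stacked_weights(2, 3), stacked_weights(1, 5))
# Three 3x3 layers versus one 7x7 layer.
print(stacked_receptive_field(3), stacked_weights(3, 3), stacked_weights(1, 7))
```

With C channels in and out of every layer, both sides of each comparison are multiplied by C squared, so the ratio between the stacked and the single large convolution stays the same.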
