How to calculate the parameter count and amount of computation of a CNN (convolutional neural network) model (concept version)

1. Reference materials

Detailed explanation of neural network parameter amount, calculation amount (FLOPS), and memory access amount (MAC) calculation
5 ways to obtain Torch network model parameter amount calculation amount and other information
Parameter calculation of convolutional neural network
A brief discussion of deep learning: how to calculate the model and the memory usage of intermediate variables

2. Introduction to parameters and calculations

1. Why do we need to count model parameters and calculations?

  • A good network model needs more than accuracy: its parameter count and amount of computation should also be small, which makes it easier to deploy.
  • Counting a model's parameters and computation makes it possible to compare different network models.
  • Models with the same number of parameters can still differ in amount of computation because of different connection patterns and structures.

2. The concept of calculation amount and parameter amount

  • The amount of computation is the number of operations the network model has to perform; the parameter count is the number of parameters the network model itself holds.
  • The amount of computation corresponds to time complexity, and the parameter count corresponds to space complexity.
  • The amount of computation determines how long the network takes to run, and the parameter count determines how much GPU memory it occupies.

3. Calculation method of parameter quantity

Consider a 32x32x3 input and a 5x5x3 convolution filter evaluated at one position. The operation at that position is a dot product, so the output is a single scalar value.

Because convolution slides the filter window over the input, one filter produces a 28x28x1 output.

With 6 filters, the output is 28x28x6.

This is the most basic convolution operation. Which parameters does it use? Add up the parameters of every filter and include one bias per filter: 5x5x3x6 + 6 = 456.
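The count above can be reproduced with a small helper. This is a minimal sketch; the function name `conv2d_params` and its arguments are chosen here for illustration and do not come from any library.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, num_filters, bias=True):
    """Number of learnable parameters in a standard convolution layer."""
    # Each filter holds kernel_h * kernel_w * in_channels weights.
    weights = kernel_h * kernel_w * in_channels * num_filters
    # One bias per filter, if biases are used.
    biases = num_filters if bias else 0
    return weights + biases


print(conv2d_params(5, 5, 3, 6))              # 456
print(conv2d_params(5, 5, 3, 6, bias=False))  # 450
```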

The spatial size of the output after convolution is (N - F) / stride + 1, where N is the size of the input image, F is the size of the filter, and stride is the sliding step. For the example above, (32 - 5) / 1 + 1 = 28.

However, when stride is greater than 1, N - F may not be divisible by stride; for example, N = 7, F = 3, stride = 3 gives (7 - 3) / 3 + 1 ≈ 2.33. In this case, pad the original image with P pixels on each side and compute the output size as (N + 2P - F) / stride + 1.
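The output-size rule can be sketched as follows. The helper `conv_output_size` is a hypothetical name, and raising an error on a non-divisible case is a choice made here to mirror the text; frameworks such as PyTorch round the division down instead.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output size along one spatial dimension: (N + 2P - F) / stride + 1."""
    span = n + 2 * padding - f
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    if span % stride != 0:
        raise ValueError("(N + 2P - F) is not divisible by stride; add padding")
    return span // stride + 1


print(conv_output_size(32, 5))                      # 28
print(conv_output_size(28, 2, stride=2))            # 14 (2x2 window, stride 2)
print(conv_output_size(7, 3, stride=3, padding=1))  # 3
```

Calling `conv_output_size(7, 3, stride=3)` without padding raises `ValueError`, which is the non-divisible case described above.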

Max pooling also changes the spatial size between input and output, but it has no parameters. Its output size is computed with the same formula as for convolution.

4. Mathematical expression of parameter quantities and calculation quantities

1. Convolution layer

1.1 Parameter quantity

Conv2d(Cin, Cout, K): The parameter amount is Cin × Cout × K × K

For a convolutional layer, the parameter count is the total number of parameters across all of its convolution kernels.

Assume each convolution kernel has size $D_K \times D_K \times M$ and there are $N$ kernels in total. The parameter count of a standard convolution is then:

With the bias term: $D_K \times D_K \times M \times N + N$

Without the bias term: $D_K \times D_K \times M \times N$

1.2 Calculation amount

For a convolutional layer, every element of the output feature map is produced by a series of multiplications and additions.

Assume each convolution kernel has size $D_K \times D_K \times M$, there are $N$ kernels in total, and the output feature map has size $D_F \times D_F$.

Multiplications in one convolution: $D_K \times D_K \times M$

Additions in one convolution: $D_K \times D_K \times M - 1$ (adding 27 numbers takes 26 additions)

Number of convolutions performed: $D_F \times D_F \times N$ (the output feature map has size $D_F \times D_F \times N$)

Total multiplications and additions: $(2 \times D_K \times D_K \times M - 1) \times (D_F \times D_F \times N)$

Usually only the multiplications are counted for a standard convolution: $D_K \times D_K \times M \times N \times D_F \times D_F$
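As a worked instance of these formulas, the sketch below applies them to the earlier example of 6 filters of size 5x5x3 producing a 28x28 output. The function name `conv2d_ops` is made up for illustration, and bias additions are ignored, as in the formulas above.

```python
def conv2d_ops(d_k, m, n, d_f):
    """Operation counts for a standard convolution, bias ignored.

    d_k: kernel size, m: input channels, n: number of kernels,
    d_f: output feature map size (height = width).
    """
    mults_per_output = d_k * d_k * m
    adds_per_output = mults_per_output - 1
    outputs = d_f * d_f * n  # one convolution per output element
    return {
        "multiplications": mults_per_output * outputs,
        "additions": adds_per_output * outputs,
        "total": (2 * mults_per_output - 1) * outputs,
    }


ops = conv2d_ops(d_k=5, m=3, n=6, d_f=28)
print(ops["multiplications"])  # 352800
print(ops["additions"])        # 348096
print(ops["total"])            # 700896
```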

1.3 Memory access cost (MAC)

Input: $D_K \times D_K \times M$

Output: $D_F \times D_F \times N$

Weights: $D_K \times D_K \times M \times N$

The MAC is the sum of these three terms: $D_K \times D_K \times M + D_F \times D_F \times N + D_K \times D_K \times M \times N$

2. Fully connected layer

2.1 Parameter quantity

Linear(M->N): The number of parameters is M×N+N

Assume $C_i$ input neurons and $C_o$ output neurons.

With the bias term, the parameter count is: $C_i \times C_o + C_o$

Without the bias term, the parameter count is: $C_i \times C_o$

2.2 Calculation amount

Assume $C_i$ input neurons and $C_o$ output neurons.

Multiplications for one output neuron: $C_i$

Additions for one output neuron: $C_i - 1$

Total multiplications and additions: $(2 \times C_i - 1) \times C_o$

2.3 Memory access cost (MAC)

Input: $C_i$

Output: $C_o$

Weights: $C_i \times C_o$

The MAC is the sum of these three terms: $C_i + C_o + C_i \times C_o$
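The three fully connected quantities (parameters, operations, MAC) can be put side by side in one sketch. The helper name `linear_stats` is hypothetical. Additions per output neuron are taken as C_i - 1 so that the total agrees with (2 × C_i - 1) × C_o, and MAC is counted in elements rather than bytes.

```python
def linear_stats(c_in, c_out, bias=True):
    """Parameter count, operation counts and MAC for a fully connected layer."""
    weights = c_in * c_out
    params = weights + (c_out if bias else 0)
    mults = c_in * c_out          # C_i multiplications per output neuron
    adds = (c_in - 1) * c_out     # C_i - 1 additions per output neuron
    mac = c_in + c_out + weights  # input + output + weights
    return {"params": params, "mults": mults, "adds": adds,
            "total_ops": mults + adds, "mac": mac}


stats = linear_stats(2048, 512)
print(stats["params"])     # 1049088
print(stats["total_ops"])  # 2096640
print(stats["mac"])        # 1051136
```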

3. BN layer

BatchNorm(N): The parameter size is 2N

Parameter count: $2 \times C_i$ (bn.weight + bn.bias)

4. Embedding layer

Embedding(N,W): The number of parameters is N × W
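Both rules are simple enough to check directly. In this sketch the function names are invented for illustration, and the vocabulary size 10000 and embedding width 300 are hypothetical values, not taken from the text.

```python
def batchnorm_params(num_features):
    """Learnable weight (scale) and bias (shift): one of each per channel."""
    return 2 * num_features


def embedding_params(num_embeddings, embedding_dim):
    """One vector of length embedding_dim per table entry."""
    return num_embeddings * embedding_dim


print(batchnorm_params(64))          # 128
print(embedding_params(10000, 300))  # 3000000
```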

5. Code examples

import torch
import torch.nn as nn


class MyModel(nn.Module):
    """Toy model used only to list the shapes of its parameters."""

    def __init__(self):
        super().__init__()

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        # Pool to 4x8 so the flattened feature has 64 * 4 * 8 = 2048 values for fc.
        self.pool = nn.AdaptiveAvgPool2d((4, 8))

        self.fc = nn.Linear(2048, 512)

    def forward(self, input):  # input: (batch, 3, H, W)
        x = self.conv1(input)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.pool(x)
        x = torch.flatten(x, 1)  # (batch, 2048)
        x = self.fc(x)

        return x

##############################

# Prepare the model
model = MyModel()

print('-----------------------------------------------')
print('|   weight name   |        weight shape       |')
print('-----------------------------------------------')

# Pad the name to 15 characters and the shape to 25 so the columns line up.
for key, w_variable in model.named_parameters():
    print('| {:<15} | {:<25} |'.format(key, str(w_variable.shape)))
print('-----------------------------------------------')
 

Output:

-----------------------------------------------
|   weight name   |        weight shape       |
-----------------------------------------------
| conv1.weight    | torch.Size([64, 3, 7, 7]) |
| bn1.weight      | torch.Size([64])          |
| bn1.bias        | torch.Size([64])          |
| fc.weight       | torch.Size([512, 2048])   |
| fc.bias         | torch.Size([512])         |
-----------------------------------------------

Explanation

  • Convolutional layer parameter count (conv1 has no bias): $D_K \times D_K \times M \times N = 7 \times 7 \times 3 \times 64 = 9408$
  • BN layer parameter count: $2 \times C_i$ = bn.weight + bn.bias = 64 + 64 = 128
  • Fully connected layer parameter count: $C_i \times C_o + C_o = 2048 \times 512 + 512 = 1049088$
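The per-tensor counts can also be recovered from the printed shapes alone, since the number of elements in a tensor is the product of its dimensions. The sketch below uses plain Python with the shapes copied from the output above; in PyTorch, `p.numel()` on each parameter returns the same per-tensor number.

```python
from math import prod

# Shapes copied from the printed table above.
shapes = {
    "conv1.weight": (64, 3, 7, 7),
    "bn1.weight": (64,),
    "bn1.bias": (64,),
    "fc.weight": (512, 2048),
    "fc.bias": (512,),
}

counts = {name: prod(shape) for name, shape in shapes.items()}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name:<15}{count:>10}")
print(f"{'total':<15}{total:>10}")  # total is 1058624
```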

5. Parameter quantities of common network models

LeNet-5

LeNet-5, convolutional neural networks
Gradient-Based Learning Applied to Document Recognition

[Figure: LeNet-5 network structure]

The LeNet-5 network parameters are as follows:

| Layer | Input | Filter | Stride | Padding | Output | Formula | Parameters |
|---|---|---|---|---|---|---|---|
| Input | 32x32x1 | | | | 32x32x1 | | 0 |
| Conv1 | 32x32x1 | 5x5x1x6 | 1 | 0 | 28x28x6 | 5x5x1x6+6 | 156 |
| MaxPool1 | 28x28x6 | 2x2 | 2 | 0 | 14x14x6 | | 0 |
| Conv2 | 14x14x6 | 5x5x6x16 | 1 | 0 | 10x10x16 | 5x5x6x16+16 | 2416 |
| MaxPool2 | 10x10x16 | 2x2 | 2 | 0 | 5x5x16 | | 0 |
| FC1 | 5x5x16 | | | | 120 | 5x5x16x120+120 | 48120 |
| FC2 | 120 | | | | 84 | 120x84+84 | 10164 |
| FC3 | 84 | | | | 10 | 84x10+10 | 850 |

Total number of parameters: 61706
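The table can be re-derived by walking through the layers while tracking the feature map size. This is a sketch under the table's own assumptions (square inputs, no padding, a bias on every conv and FC layer); the tuple layout of `layers` is invented here for illustration.

```python
def out_size(n, f, stride, padding=0):
    return (n + 2 * padding - f) // stride + 1


# (name, kind, filter size, stride, output channels or units)
layers = [
    ("Conv1", "conv", 5, 1, 6),
    ("MaxPool1", "pool", 2, 2, None),
    ("Conv2", "conv", 5, 1, 16),
    ("MaxPool2", "pool", 2, 2, None),
    ("FC1", "fc", None, None, 120),
    ("FC2", "fc", None, None, 84),
    ("FC3", "fc", None, None, 10),
]

size, channels = 32, 1  # 32x32x1 input
features = None         # set once the first FC layer has been seen
per_layer = {}

for name, kind, f, stride, out in layers:
    if kind == "conv":
        params = f * f * channels * out + out
        size, channels = out_size(size, f, stride), out
    elif kind == "pool":
        params = 0  # pooling has no parameters
        size = out_size(size, f, stride)
    else:
        # The first FC layer sees the flattened feature map.
        fan_in = features if features is not None else size * size * channels
        params = fan_in * out + out
        features = out
    per_layer[name] = params

total = sum(per_layer.values())
print(per_layer)
print(total)  # 61706
```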

AlexNet

ImageNet Classification with Deep Convolutional Neural Networks

[Figure: AlexNet network structure]

The AlexNet structure diagram looks unusual because the network was split across two GPUs, so it is drawn as two parallel branches with the same structure. The calculation below uses the equivalent merged network.

AlexNet network parameters are as follows:

| Layer | Input | Filter | Stride | Padding | Output | Formula | Parameters |
|---|---|---|---|---|---|---|---|
| Input | 224x224x3 | | | | 224x224x3 | | 0 |
| Conv1 | 224x224x3 | 11x11x3x96 | 4 | 0 | 55x55x96 | 11x11x3x96+96 | 34,944 |
| MaxPool1 | 55x55x96 | 3x3 | 2 | 0 | 27x27x96 | | 0 |
| Norm1 | 27x27x96 | | | | 27x27x96 | | 0 |
| Conv2 | 27x27x96 | 5x5x96x256 | 1 | 2 | 27x27x256 | 5x5x96x256+256 | 614,656 |
| MaxPool2 | 27x27x256 | 3x3 | 2 | 0 | 13x13x256 | | 0 |
| Norm2 | 13x13x256 | | | | 13x13x256 | | 0 |
| Conv3 | 13x13x256 | 3x3x256x384 | 1 | 1 | 13x13x384 | 3x3x256x384+384 | 885,120 |
| Conv4 | 13x13x384 | 3x3x384x384 | 1 | 1 | 13x13x384 | 3x3x384x384+384 | 1,327,488 |
| Conv5 | 13x13x384 | 3x3x384x256 | 1 | 1 | 13x13x256 | 3x3x384x256+256 | 884,992 |
| MaxPool3 | 13x13x256 | 3x3 | 2 | 0 | 6x6x256 | | 0 |
| FC6 | 6x6x256 | | | | 4096 | 6x6x256x4096+4096 | 37,752,832 |
| FC7 | 4096 | | | | 4096 | 4096x4096+4096 | 16,781,312 |
| FC8 | 4096 | | | | 1000 | 4096x1000+1000 | 4,097,000 |

Total number of parameters: 62,378,344
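Summing the convolutional and fully connected rows separately shows where the parameters sit. The sketch below uses the filter and layer sizes from the table; the variable names are chosen for illustration.

```python
# (kernel size, input channels, output channels) for Conv1..Conv5
conv_layers = [(11, 3, 96), (5, 96, 256), (3, 256, 384), (3, 384, 384), (3, 384, 256)]
# (input features, output features) for FC6..FC8
fc_layers = [(6 * 6 * 256, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(k * k * c_in * c_out + c_out for k, c_in, c_out in conv_layers)
fc_total = sum(c_in * c_out + c_out for c_in, c_out in fc_layers)
total = conv_total + fc_total
fc_share = fc_total / total

print(conv_total)  # 3747200
print(fc_total)    # 58631144
print(total)       # 62378344
print(f"FC share of parameters: {fc_share:.1%}")
```

Under these numbers the three fully connected layers hold roughly 94% of all parameters, even though the network has five convolutional layers.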

VGG-16

Very Deep Convolutional Networks for Large-Scale Image Recognition

[Figure: VGG-16 network structure]

The VGG-16 network parameters are as follows:

| Layer | Input | Filter | Stride | Padding | Output | Formula | Parameters |
|---|---|---|---|---|---|---|---|
| Input | 224x224x3 | | | | 224x224x3 | | 0 |
| Conv3-64 | 224x224x3 | 3x3x3x64 | 1 | 1 | 224x224x64 | 3x3x3x64+64 | 1,792 |
| Conv3-64 | 224x224x64 | 3x3x64x64 | 1 | 1 | 224x224x64 | 3x3x64x64+64 | 36,928 |
| MaxPool2 | 224x224x64 | 2x2 | 2 | 0 | 112x112x64 | | 0 |
| Conv3-128 | 112x112x64 | 3x3x64x128 | 1 | 1 | 112x112x128 | 3x3x64x128+128 | 73,856 |
| Conv3-128 | 112x112x128 | 3x3x128x128 | 1 | 1 | 112x112x128 | 3x3x128x128+128 | 147,584 |
| MaxPool2 | 112x112x128 | 2x2 | 2 | 0 | 56x56x128 | | 0 |
| Conv3-256 | 56x56x128 | 3x3x128x256 | 1 | 1 | 56x56x256 | 3x3x128x256+256 | 295,168 |
| Conv3-256 | 56x56x256 | 3x3x256x256 | 1 | 1 | 56x56x256 | 3x3x256x256+256 | 590,080 |
| Conv3-256 | 56x56x256 | 3x3x256x256 | 1 | 1 | 56x56x256 | 3x3x256x256+256 | 590,080 |
| MaxPool2 | 56x56x256 | 2x2 | 2 | 0 | 28x28x256 | | 0 |
| Conv3-512 | 28x28x256 | 3x3x256x512 | 1 | 1 | 28x28x512 | 3x3x256x512+512 | 1,180,160 |
| Conv3-512 | 28x28x512 | 3x3x512x512 | 1 | 1 | 28x28x512 | 3x3x512x512+512 | 2,359,808 |
| Conv3-512 | 28x28x512 | 3x3x512x512 | 1 | 1 | 28x28x512 | 3x3x512x512+512 | 2,359,808 |
| MaxPool2 | 28x28x512 | 2x2 | 2 | 0 | 14x14x512 | | 0 |
| Conv3-512 | 14x14x512 | 3x3x512x512 | 1 | 1 | 14x14x512 | 3x3x512x512+512 | 2,359,808 |
| Conv3-512 | 14x14x512 | 3x3x512x512 | 1 | 1 | 14x14x512 | 3x3x512x512+512 | 2,359,808 |
| Conv3-512 | 14x14x512 | 3x3x512x512 | 1 | 1 | 14x14x512 | 3x3x512x512+512 | 2,359,808 |
| MaxPool2 | 14x14x512 | 2x2 | 2 | 0 | 7x7x512 | | 0 |
| FC1 | 7x7x512 | | | | 4096 | 7x7x512x4096+4096 | 102,764,544 |
| FC2 | 4096 | | | | 4096 | 4096x4096+4096 | 16,781,312 |
| FC3 | 4096 | | | | 1000 | 4096x1000+1000 | 4,097,000 |

Total number of parameters: 138,357,544
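Because every VGG-16 convolution is 3x3 with padding 1 and every pooling step halves the spatial size, the whole table can be generated from a short configuration list. The list format below (channel counts with "M" for max pooling) is an assumption made for this sketch.

```python
# Output channels of each 3x3 convolution; "M" marks a 2x2 max pool with stride 2.
cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

channels, size = 3, 224  # 224x224x3 input
conv_total = 0
for v in cfg:
    if v == "M":
        size //= 2  # pooling: no parameters, halves height and width
    else:
        conv_total += 3 * 3 * channels * v + v  # weights + biases
        channels = v  # padding 1 keeps the spatial size unchanged

fc_total = 0
fan_in = size * size * channels  # 7 * 7 * 512 = 25088
for out in (4096, 4096, 1000):
    fc_total += fan_in * out + out
    fan_in = out

total = conv_total + fc_total
print(conv_total)  # 14714688
print(fc_total)    # 123642856
print(total)       # 138357544
```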

Origin blog.csdn.net/m0_37605642/article/details/134127563