1. Reference materials
Detailed explanation of how to calculate a neural network's parameter count, computation (FLOPs), and memory access cost (MAC)
5 ways to obtain the parameter count, computation, and other information of a Torch network model
Parameter calculation for convolutional neural networks
A brief discussion of deep learning: how to calculate the memory usage of a model and its intermediate variables
2. Introduction to parameters and calculations
1. Why do we need to count model parameters and calculations?
- A good network model needs not only high accuracy but also a modest parameter count and computation cost, which makes it easier to deploy.
- Counting a model's parameters and computation makes it possible to compare different network models.
- Models with the same number of parameters can still differ in computation because of different connection patterns and structures.
2. The concept of calculation amount and parameter amount
- The amount of calculation is the number of operations the network model has to perform; the amount of parameters is how many parameters the network model itself contains.
- The amount of calculation corresponds to time complexity, and the amount of parameters corresponds to space complexity.
- The amount of calculation determines how long the network takes to run, and the amount of parameters determines how much GPU memory the model's weights occupy.
3. Calculation method of parameter quantity
Consider a 32x32x3 input and a 5x5x3 convolution filter applied at one position. The computation at that position is a dot product, so the output is a single scalar value.
Because the convolution is implemented as a sliding window, sliding the filter over the whole input produces an output of 28x28x1.
If there are 6 filters, the output is 28x28x6.
This is the most basic convolution operation, so how many parameters does it use? Add up the parameters of every filter and do not forget the biases: 5x5x3x6 + 6 = 456.
The output size after convolution is (N - F) / stride + 1, where N is the size of the input image, F is the size of the filter, and stride is the sliding step. For the example above, (32 - 5) / 1 + 1 = 28.
When stride is greater than 1, N - F may not be divisible by stride. In that case, padding is added around the original image and the output size is computed with the same formula on the padded size.
Max pooling also changes the output size but has no parameters, so its output size is computed with the same formula as for convolution.
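The arithmetic in this section can be checked with a few lines of Python. This is a minimal sketch that assumes square inputs and filters; the helper names `conv_output_size` and `conv_params` are illustrative and do not come from any library, and the padding term P on each side is the usual extension of the formula.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output width of a convolution or pooling window: (N - F + 2P) / stride + 1."""
    span = n - f + 2 * padding
    if span % stride != 0:
        # The window does not tile the input evenly, so more padding is needed.
        raise ValueError("(N - F + 2P) is not divisible by stride")
    return span // stride + 1


def conv_params(f, in_channels, num_filters, bias=True):
    """Weights of all filters plus one bias per filter."""
    return f * f * in_channels * num_filters + (num_filters if bias else 0)


print(conv_output_size(32, 5))            # 28
print(conv_params(5, 3, 6))               # 456
print(conv_output_size(28, 2, stride=2))  # 14 (2x2 max pooling with stride 2)
```

The pooling call reuses the same size formula and simply contributes no parameters.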
4. Mathematical expression of parameter quantities and calculation quantities
1. Convolution layer
1.1 Parameter quantity
Conv2d(Cin, Cout, K): The parameter amount is Cin × Cout × K × K
For the convolutional layer, the number of parameters is the number of all parameters in the convolution kernel.
Assume that each convolution kernel has size $D_K \times D_K \times M$ and that there are $N$ convolution kernels in total. The parameter quantity of a standard convolution is then:
If the bias term is considered, the parameter quantity is: $D_K \times D_K \times M \times N + N$
If the bias term is not considered, the parameter quantity is: $D_K \times D_K \times M \times N$
1.2 Calculation amount
For the convolutional layer, each value in the output feature map is produced by a series of multiplication and addition operations.
Assume that each convolution kernel has size $D_K \times D_K \times M$, that there are $N$ convolution kernels in total, and that the output feature map has size $D_F \times D_F$.
The number of multiplication operations in one convolution is: $D_K \times D_K \times M$
The number of addition operations in one convolution is: $D_K \times D_K \times M - 1$ (for example, adding 27 numbers takes 26 addition operations)
A total of $D_F \times D_F \times N$ convolution operations are performed (the output feature map has size $D_F \times D_F \times N$)
Total number of multiplication and addition operations: $(2 \times D_K \times D_K \times M - 1) \times (D_F \times D_F \times N)$
Usually, the calculation amount of a standard convolution only counts the multiplication operations: $D_K \times D_K \times M \times N \times D_F \times D_F$.
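As a worked instance of these counts, the sketch below plugs in the earlier 5x5x3 example with 6 filters and a 28x28 output. The function name `conv_op_counts` is illustrative, and the bias addition is ignored, as in the formulas above.

```python
def conv_op_counts(d_k, m, n, d_f):
    """Multiplications and additions of a standard convolution (bias ignored)."""
    mults_per_output = d_k * d_k * m        # one product per kernel weight
    adds_per_output = mults_per_output - 1  # summing k numbers takes k - 1 additions
    outputs = d_f * d_f * n                 # number of values in the output feature map
    return {
        "multiplications": mults_per_output * outputs,
        "additions": adds_per_output * outputs,
        "total": (2 * mults_per_output - 1) * outputs,
    }


counts = conv_op_counts(d_k=5, m=3, n=6, d_f=28)
print(counts["multiplications"])  # 352800
print(counts["additions"])        # 348096
print(counts["total"])            # 700896
```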
1.3 Memory access MAC
Input: $D_K \times D_K \times M$
Output: $D_F \times D_F \times N$
Weights: $D_K \times D_K \times M \times N$
MAC is the sum of the three terms above: $D_K \times D_K \times M + D_F \times D_F \times N + D_K \times D_K \times M \times N$
2. Fully connected layer
2.1 Parameter quantity
Linear(M->N): The number of parameters is M×N+N
Assume there are $C_i$ input neurons and $C_o$ output neurons.
If the bias term is considered, the parameter quantity is: $C_i \times C_o + C_o$
If the bias term is not considered, the parameter quantity is: $C_i \times C_o$
2.2 Calculation amount
Assume there are $C_i$ input neurons and $C_o$ output neurons.
The number of multiplication operations for one output neuron is: $C_i$
The number of addition operations for one output neuron is: $C_i - 1$
The total number of multiplication and addition operations is: $(2 \times C_i - 1) \times C_o$
2.3 Memory access MAC
Input: $C_i$
Output: $C_o$
Weights: $C_i \times C_o$
MAC is the sum of the three terms above: $C_i + C_o + C_i \times C_o$
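The three fully connected formulas can be gathered into one small helper. This is only a sketch: `linear_stats` is a hypothetical name, the 2048-to-512 sizes are chosen to match the Linear layer in the code example later in this article, and the operation count ignores the bias addition, as the formulas above do.

```python
def linear_stats(c_in, c_out, bias=True):
    """Parameter count, operation count, and MAC of a fully connected layer."""
    params = c_in * c_out + (c_out if bias else 0)
    mults = c_in * c_out          # C_i multiplications per output neuron
    adds = (c_in - 1) * c_out     # C_i - 1 additions per output neuron
    mac = c_in + c_out + c_in * c_out
    return {"params": params, "ops": mults + adds, "mac": mac}


stats = linear_stats(2048, 512)
print(stats["params"])  # 1049088
print(stats["ops"])     # 2096640, equal to (2 * 2048 - 1) * 512
print(stats["mac"])     # 1051136
```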
3. BN layer
BatchNorm(N): The parameter size is 2N
Parameter quantity: $2 \times C_i$ (bn.weight + bn.bias)
4. Embedding layer
Embedding(N,W): The number of parameters is N × W
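Both of these counts are one-line formulas. In the sketch below the function names are illustrative and the vocabulary size and embedding width are made-up values chosen only for the example. In PyTorch, the running mean and variance of a batch-norm layer are stored as buffers, so they are not part of the 2N learnable parameters.

```python
def batchnorm_params(num_features):
    """One scale (weight) and one shift (bias) per channel."""
    return 2 * num_features


def embedding_params(num_embeddings, embedding_dim):
    """One vector of length embedding_dim per table entry."""
    return num_embeddings * embedding_dim


print(batchnorm_params(64))          # 128
print(embedding_params(10000, 300))  # 3000000 (hypothetical vocabulary and width)
```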
5. Code examples
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.gap = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(2048, 512)

    def forward(self, input):
        # Not called in this example; only the parameter shapes are inspected below.
        x = self.conv1(input)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.gap(x)
        x = self.fc(x)
        return x


##############################
# Prepare the model
model = MyModel()

line = '-' * 49
print(line)
print('| {:<15} | {:<27} |'.format('weight name', 'weight shape'))
print(line)
# named_parameters() yields a (name, tensor) pair for every learnable parameter
for key, w_variable in model.named_parameters():
    print('| {:<15} | {:<27} |'.format(key, str(w_variable.shape)))
print(line)
Output:
-----------------------------------------------
| weight name | weight shape |
-----------------------------------------------
| conv1.weight | torch.Size([64, 3, 7, 7]) |
| bn1.weight | torch.Size([64]) |
| bn1.bias | torch.Size([64]) |
| fc.weight | torch.Size([512, 2048]) |
| fc.bias | torch.Size([512]) |
-----------------------------------------------
Explanation:
- Convolutional layer parameter quantity (bias=False, so there is no bias term): $D_K \times D_K \times M \times N = 7 \times 7 \times 3 \times 64 = 9408$;
- BN layer parameter quantity: $2 \times C_i$ = bn.weight + bn.bias = $64 + 64 = 128$;
- Fully connected layer parameter quantity: $C_i \times C_o + C_o = 2048 \times 512 + 512 = 1049088$.
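The per-layer counts can be summed directly from the shapes printed above. The sketch below copies those shapes into a plain dictionary so that it runs without PyTorch; the variable names are illustrative.

```python
from math import prod

# Shapes copied from the printed table of named parameters
shapes = {
    "conv1.weight": (64, 3, 7, 7),
    "bn1.weight": (64,),
    "bn1.bias": (64,),
    "fc.weight": (512, 2048),
    "fc.bias": (512,),
}

# The number of elements in a tensor is the product of its dimensions
counts = {name: prod(shape) for name, shape in shapes.items()}
total = sum(counts.values())

for name, count in counts.items():
    print(name, count)
print("total", total)  # total 1058624
```

With the model object itself available, `sum(p.numel() for p in model.parameters())` should give the same total.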
5. Parameter quantities of common network models
LeNet-5
LeNet-5, convolutional neural networks
Gradient-Based Learning Applied to Document Recognition
The LeNet-5 network structure is as follows:
The LeNet-5 network parameters are as follows:
Network layer (operation) | Input | Filter | Stride | Padding | Output | Calculation formula | Parameter quantity |
---|---|---|---|---|---|---|---|
Input | 32x32x1 | | | | 32x32x1 | | 0 |
Conv1 | 32x32x1 | 5x5x1x6 | 1 | 0 | 28x28x6 | 5x5x1x6+6 | 156 |
MaxPool1 | 28x28x6 | 2x2 | 2 | 0 | 14x14x6 | | 0 |
Conv2 | 14x14x6 | 5x5x6x16 | 1 | 0 | 10x10x16 | 5x5x6x16+16 | 2416 |
MaxPool2 | 10x10x16 | 2x2 | 2 | 0 | 5x5x16 | | 0 |
FC1 | 5x5x16 | | | | 120 | 5x5x16x120+120 | 48120 |
FC2 | 120 | | | | 84 | 120x84+84 | 10164 |
FC3 | 84 | | | | 10 | 84x10+10 | 850 |
Total number of parameters: 61706
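The table can be reproduced by walking through the layers while tracking the feature-map size and channel count. This is a minimal sketch based only on the rows above; the layer list and the function name `lenet5_params` are written for this example.

```python
# (name, kind, kernel size, output channels or units), taken from the table above
LAYERS = [
    ("Conv1", "conv", 5, 6),
    ("MaxPool1", "pool", 2, None),
    ("Conv2", "conv", 5, 16),
    ("MaxPool2", "pool", 2, None),
    ("FC1", "fc", None, 120),
    ("FC2", "fc", None, 84),
    ("FC3", "fc", None, 10),
]


def lenet5_params(size=32, channels=1):
    """Return per-layer parameter counts and their total."""
    rows, total, features = [], 0, None
    for name, kind, k, out in LAYERS:
        if kind == "conv":            # stride 1, no padding
            params = k * k * channels * out + out
            size, channels = size - k + 1, out
        elif kind == "pool":          # stride equals the window size, no parameters
            params = 0
            size = (size - k) // k + 1
        else:                         # fully connected
            if features is None:      # flatten the last feature map once
                features = size * size * channels
            params = features * out + out
            features = out
        rows.append((name, params))
        total += params
    return rows, total


rows, total = lenet5_params()
for name, params in rows:
    print(name, params)
print("total", total)  # total 61706
```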
AlexNet
ImageNet Classification with Deep Convolutional Neural Networks
The AlexNet network structure is as follows:
The AlexNet structure diagram looks unusual because the network was split across two GPUs, so it is drawn as two parallel branches with the same structure. The calculation below uses the equivalent merged network.
AlexNet network parameters are as follows:
Network layer (operation) | Input | Filter | Stride | Padding | Output | Calculation formula | Parameter quantity |
---|---|---|---|---|---|---|---|
Input | 224x224x3 | | | | 224x224x3 | | 0 |
Conv1 | 224x224x3 | 11x11x3x96 | 4 | 0 | 55x55x96 | 11x11x3x96+96 | 34,944 |
MaxPool1 | 55x55x96 | 3x3 | 2 | 0 | 27x27x96 | | 0 |
Norm1 | 27x27x96 | | | | 27x27x96 | | 0 |
Conv2 | 27x27x96 | 5x5x96x256 | 1 | 2 | 27x27x256 | 5x5x96x256+256 | 614,656 |
MaxPool2 | 27x27x256 | 3x3 | 2 | 0 | 13x13x256 | | 0 |
Norm2 | 13x13x256 | | | | 13x13x256 | | 0 |
Conv3 | 13x13x256 | 3x3x256x384 | 1 | 1 | 13x13x384 | 3x3x256x384+384 | 885,120 |
Conv4 | 13x13x384 | 3x3x384x384 | 1 | 1 | 13x13x384 | 3x3x384x384+384 | 1,327,488 |
Conv5 | 13x13x384 | 3x3x384x256 | 1 | 1 | 13x13x256 | 3x3x384x256+256 | 884,992 |
MaxPool3 | 13x13x256 | 3x3 | 2 | 0 | 6x6x256 | | 0 |
FC6 | 6x6x256 | | | | 4096 | 6x6x256x4096+4096 | 37,752,832 |
FC7 | 4096 | | | | 4096 | 4096x4096+4096 | 16,781,312 |
FC8 | 4096 | | | | 1000 | 4096x1000+1000 | 4,097,000 |
Total number of parameters: 62,378,344
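Splitting the total into convolutional and fully connected parts shows where the parameters are concentrated. The sketch below uses only the filter and layer sizes from the table; the variable names are illustrative.

```python
# (kernel size, input channels, output channels) for Conv1..Conv5
conv_layers = [(11, 3, 96), (5, 96, 256), (3, 256, 384), (3, 384, 384), (3, 384, 256)]
# (input features, output features) for FC6..FC8
fc_layers = [(6 * 6 * 256, 4096), (4096, 4096), (4096, 1000)]

conv_total = sum(k * k * c_in * c_out + c_out for k, c_in, c_out in conv_layers)
fc_total = sum(c_in * c_out + c_out for c_in, c_out in fc_layers)
total = conv_total + fc_total

print(conv_total)  # 3747200
print(fc_total)    # 58631144
print(total)       # 62378344
print(f"FC share of parameters: {fc_total / total:.1%}")
```

Under these numbers the three fully connected layers hold the large majority of the parameters, even though the network has more convolutional layers than fully connected ones.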
VGG-16
Very Deep Convolutional Networks for Large-Scale Image Recognition
The VGG-16 network structure is as follows:
The VGG-16 network parameters are as follows:
Network layer (operation) | Input | Filter | Stride | Padding | Output | Calculation formula | Parameter quantity |
---|---|---|---|---|---|---|---|
Input | 224x224x3 | | | | 224x224x3 | | 0 |
Conv3-64 | 224x224x3 | 3x3x3x64 | 1 | 1 | 224x224x64 | 3x3x3x64+64 | 1,792 |
Conv3-64 | 224x224x64 | 3x3x64x64 | 1 | 1 | 224x224x64 | 3x3x64x64+64 | 36,928 |
MaxPool2 | 224x224x64 | 2x2 | 2 | 0 | 112x112x64 | | 0 |
Conv3-128 | 112x112x64 | 3x3x64x128 | 1 | 1 | 112x112x128 | 3x3x64x128+128 | 73,856 |
Conv3-128 | 112x112x128 | 3x3x128x128 | 1 | 1 | 112x112x128 | 3x3x128x128+128 | 147,584 |
MaxPool2 | 112x112x128 | 2x2 | 2 | 0 | 56x56x128 | | 0 |
Conv3-256 | 56x56x128 | 3x3x128x256 | 1 | 1 | 56x56x256 | 3x3x128x256+256 | 295,168 |
Conv3-256 | 56x56x256 | 3x3x256x256 | 1 | 1 | 56x56x256 | 3x3x256x256+256 | 590,080 |
Conv3-256 | 56x56x256 | 3x3x256x256 | 1 | 1 | 56x56x256 | 3x3x256x256+256 | 590,080 |
MaxPool2 | 56x56x256 | 2x2 | 2 | 0 | 28x28x256 | | 0 |
Conv3-512 | 28x28x256 | 3x3x256x512 | 1 | 1 | 28x28x512 | 3x3x256x512+512 | 1,180,160 |
Conv3-512 | 28x28x512 | 3x3x512x512 | 1 | 1 | 28x28x512 | 3x3x512x512+512 | 2,359,808 |
Conv3-512 | 28x28x512 | 3x3x512x512 | 1 | 1 | 28x28x512 | 3x3x512x512+512 | 2,359,808 |
MaxPool2 | 28x28x512 | 2x2 | 2 | 0 | 14x14x512 | | 0 |
Conv3-512 | 14x14x512 | 3x3x512x512 | 1 | 1 | 14x14x512 | 3x3x512x512+512 | 2,359,808 |
Conv3-512 | 14x14x512 | 3x3x512x512 | 1 | 1 | 14x14x512 | 3x3x512x512+512 | 2,359,808 |
Conv3-512 | 14x14x512 | 3x3x512x512 | 1 | 1 | 14x14x512 | 3x3x512x512+512 | 2,359,808 |
MaxPool2 | 14x14x512 | 2x2 | 2 | 0 | 7x7x512 | | 0 |
FC1 | 7x7x512 | | | | 4096 | 7x7x512x4096+4096 | 102,764,544 |
FC2 | 4096 | | | | 4096 | 4096x4096+4096 | 16,781,312 |
FC3 | 4096 | | | | 1000 | 4096x1000+1000 | 4,097,000 |
Total number of parameters: 138,357,544
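Because VGG-16 repeats the same 3x3 convolution block, the whole table can be generated from a short configuration list. This is a sketch under the assumptions in the table (3x3 kernels with padding 1, 2x2 max pooling with stride 2); the list `CFG` and the function name `vgg16_params` are written for this example.

```python
# Output channels of each 3x3 convolution; "M" marks a 2x2 max pooling layer
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_params(cfg=CFG, in_channels=3, size=224, fc_sizes=(4096, 4096, 1000)):
    """Total parameter count of the convolutional and fully connected layers."""
    total = 0
    for item in cfg:
        if item == "M":
            size //= 2  # pooling halves the feature map and adds no parameters
        else:
            total += 3 * 3 * in_channels * item + item  # weights plus biases
            in_channels = item
    features = size * size * in_channels  # 7 * 7 * 512 after the last pooling
    for out in fc_sizes:
        total += features * out + out
        features = out
    return total


print(vgg16_params())  # 138357544
```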