1. torchvision.models.vgg11_bn
from torchsummary import summary
import torch
from torchvision import models
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = models.vgg11_bn(num_classes=2).to(device)
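# Print the model structure (torchsummary prints the table itself and returns None).
# device=device.type ("cuda" or "cpu") puts the dummy input on the same device as the model.
summary(model, (3, 128, 128), device=device.type)
summary(model, (3, 224, 224), device=device.type)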
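The spatial sizes in the summaries that follow come from VGG11's five MaxPool2d layers (kernel 2, stride 2), each of which halves the height and width with floor rounding; the 3 × 3 convolutions use padding 1 and keep the size unchanged. A small pure-Python sketch of that arithmetic (the helper name `vgg_feature_size` is hypothetical):

```python
def vgg_feature_size(input_size: int, num_pools: int = 5) -> int:
    """Spatial side length after VGG's max-pooling stages.

    Each MaxPool2d(kernel_size=2, stride=2) maps n -> floor(n / 2);
    the padded 3x3 convolutions in between do not change the size.
    """
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return size


for h in (224, 128):
    print(h, "->", vgg_feature_size(h))
# 224 -> 7
# 128 -> 4
```

These are the 7 × 7 and 4 × 4 maps reported for MaxPool2d-29 in the two summaries.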
- With an input size of (3, 224, 224), the model summary is as follows:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,792
BatchNorm2d-2 [-1, 64, 224, 224] 128
ReLU-3 [-1, 64, 224, 224] 0
MaxPool2d-4 [-1, 64, 112, 112] 0
Conv2d-5 [-1, 128, 112, 112] 73,856
BatchNorm2d-6 [-1, 128, 112, 112] 256
ReLU-7 [-1, 128, 112, 112] 0
MaxPool2d-8 [-1, 128, 56, 56] 0
Conv2d-9 [-1, 256, 56, 56] 295,168
BatchNorm2d-10 [-1, 256, 56, 56] 512
ReLU-11 [-1, 256, 56, 56] 0
Conv2d-12 [-1, 256, 56, 56] 590,080
BatchNorm2d-13 [-1, 256, 56, 56] 512
ReLU-14 [-1, 256, 56, 56] 0
MaxPool2d-15 [-1, 256, 28, 28] 0
Conv2d-16 [-1, 512, 28, 28] 1,180,160
BatchNorm2d-17 [-1, 512, 28, 28] 1,024
ReLU-18 [-1, 512, 28, 28] 0
Conv2d-19 [-1, 512, 28, 28] 2,359,808
BatchNorm2d-20 [-1, 512, 28, 28] 1,024
ReLU-21 [-1, 512, 28, 28] 0
MaxPool2d-22 [-1, 512, 14, 14] 0
Conv2d-23 [-1, 512, 14, 14] 2,359,808
BatchNorm2d-24 [-1, 512, 14, 14] 1,024
ReLU-25 [-1, 512, 14, 14] 0
Conv2d-26 [-1, 512, 14, 14] 2,359,808
BatchNorm2d-27 [-1, 512, 14, 14] 1,024
ReLU-28 [-1, 512, 14, 14] 0
MaxPool2d-29 [-1, 512, 7, 7] 0
AdaptiveAvgPool2d-30 [-1, 512, 7, 7] 0
Linear-31 [-1, 4096] 102,764,544
ReLU-32 [-1, 4096] 0
Dropout-33 [-1, 4096] 0
Linear-34 [-1, 4096] 16,781,312
ReLU-35 [-1, 4096] 0
Dropout-36 [-1, 4096] 0
Linear-37 [-1, 2] 8,194
================================================================
Total params: 128,780,034
Trainable params: 128,780,034
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 182.02
Params size (MB): 491.26
Estimated Total Size (MB): 673.85
----------------------------------------------------------------
- With an input size of (3, 128, 128), the model summary is as follows:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 128, 128] 1,792
BatchNorm2d-2 [-1, 64, 128, 128] 128
ReLU-3 [-1, 64, 128, 128] 0
MaxPool2d-4 [-1, 64, 64, 64] 0
Conv2d-5 [-1, 128, 64, 64] 73,856
BatchNorm2d-6 [-1, 128, 64, 64] 256
ReLU-7 [-1, 128, 64, 64] 0
MaxPool2d-8 [-1, 128, 32, 32] 0
Conv2d-9 [-1, 256, 32, 32] 295,168
BatchNorm2d-10 [-1, 256, 32, 32] 512
ReLU-11 [-1, 256, 32, 32] 0
Conv2d-12 [-1, 256, 32, 32] 590,080
BatchNorm2d-13 [-1, 256, 32, 32] 512
ReLU-14 [-1, 256, 32, 32] 0
MaxPool2d-15 [-1, 256, 16, 16] 0
Conv2d-16 [-1, 512, 16, 16] 1,180,160
BatchNorm2d-17 [-1, 512, 16, 16] 1,024
ReLU-18 [-1, 512, 16, 16] 0
Conv2d-19 [-1, 512, 16, 16] 2,359,808
BatchNorm2d-20 [-1, 512, 16, 16] 1,024
ReLU-21 [-1, 512, 16, 16] 0
MaxPool2d-22 [-1, 512, 8, 8] 0
Conv2d-23 [-1, 512, 8, 8] 2,359,808
BatchNorm2d-24 [-1, 512, 8, 8] 1,024
ReLU-25 [-1, 512, 8, 8] 0
Conv2d-26 [-1, 512, 8, 8] 2,359,808
BatchNorm2d-27 [-1, 512, 8, 8] 1,024
ReLU-28 [-1, 512, 8, 8] 0
MaxPool2d-29 [-1, 512, 4, 4] 0
AdaptiveAvgPool2d-30 [-1, 512, 7, 7] 0
Linear-31 [-1, 4096] 102,764,544
ReLU-32 [-1, 4096] 0
Dropout-33 [-1, 4096] 0
Linear-34 [-1, 4096] 16,781,312
ReLU-35 [-1, 4096] 0
Dropout-36 [-1, 4096] 0
Linear-37 [-1, 2] 8,194
================================================================
Total params: 128,780,034
Trainable params: 128,780,034
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.19
Forward/backward pass size (MB): 59.69
Params size (MB): 491.26
Estimated Total Size (MB): 551.14
----------------------------------------------------------------
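The parameter counts above can be checked by hand: a 3 × 3 Conv2d has in × out × 9 weights plus out biases, BatchNorm2d has one weight and one bias per channel, a Linear layer has in × out weights plus out biases, and "Params size (MB)" assumes 4 bytes (float32) per parameter. A pure-Python sketch; the VGG11 layer list is written out by hand and the helper names are hypothetical:

```python
# VGG11 feature config: numbers are Conv2d output channels, "M" is a MaxPool2d
VGG11_CFG = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]


def linear_params(in_features, out_features):
    return in_features * out_features + out_features  # weights + biases


def vgg11_bn_params(pool=7, num_classes=2):
    """Trainable parameters of vgg11_bn; `pool` is the AdaptiveAvgPool2d output side."""
    total, in_ch = 0, 3
    for v in VGG11_CFG:
        if v == "M":
            continue                    # MaxPool2d has no parameters
        total += in_ch * v * 3 * 3 + v  # Conv2d 3x3: weights + bias
        total += 2 * v                  # BatchNorm2d: weight (gamma) + bias (beta)
        in_ch = v
    total += linear_params(512 * pool * pool, 4096)  # Linear-31
    total += linear_params(4096, 4096)               # Linear-34
    total += linear_params(4096, num_classes)        # Linear-37
    return total


n = vgg11_bn_params()
print(f"{n:,} params, {n * 4 / 1024 ** 2:.2f} MB")  # 4 bytes per float32 parameter
# 128,780,034 params, 491.26 MB
```

Only Linear-31 depends on the pooling size, so `vgg11_bn_params(pool=4)` gives 59,574,018, the total reported for the modified model in section 4.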
2. Observations
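- Both input sizes work for training. 224 × 224 is the input size torchvision used to train VGG on ImageNet, and pretrained weights are provided for it. With 128 × 128 inputs the pretrained weights can still be used, because the input size does not change the shape of any layer's parameters. Note, however, that in the 128 × 128 summary the spatial size grows from 4 × 4 after MaxPool2d-29 to 7 × 7 after AdaptiveAvgPool2d-30. This is because AdaptiveAvgPool2d derives its pooling window size and stride from the input size so that the output always has the requested size (7 × 7 here), which lets the model adapt to different input sizes.
- AdaptiveAvgPool2d has no learnable parameters. During training it produces its output in the forward pass and, in the backward pass, passes gradients back to the earlier layers like any other average pooling. The model therefore still trains normally with gradient descent at either input size; the problem only appears when exporting to ONNX, which may run into limitations or errors.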
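AdaptiveAvgPool2d chooses its windows from the input and output sizes. Assuming the rule used by PyTorch's CPU kernel (output index i averages input positions from floor(i·in/out) up to, but not including, ceil((i+1)·in/out)), the pure-Python sketch below lists those windows along one axis for the two feature-map sizes in the summaries. `adaptive_windows` is a hypothetical helper:

```python
def adaptive_windows(in_size, out_size):
    """Half-open [start, end) input ranges averaged by each output position (1-D)."""
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size                          # floor(i * in / out)
        end = ((i + 1) * in_size + out_size - 1) // out_size       # ceil((i + 1) * in / out)
        windows.append((start, end))
    return windows


print(adaptive_windows(7, 7))  # 224 x 224 input: 7 -> 7
# [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
print(adaptive_windows(4, 7))  # 128 x 128 input: 4 -> 7
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
```

For 7 → 7 every window covers exactly one cell, so the layer acts as an identity. For 4 → 7 the windows differ in size and overlap, so no single fixed kernel size and stride reproduces them.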
3. Conclusions
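- When exporting to ONNX, an error is raised if the input size of AdaptiveAvgPool2d is not an integer multiple of its output size (with a 128 × 128 input, the 4 × 4 feature map cannot be evenly pooled to 7 × 7):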
raise errors.SymbolicValueError(
torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of operator adaptive_avg_pool2d, output size that are not factor of input size. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues [Caused by the value '100 defined in (%100 : Long(2, strides=[1], device=cpu) = onnx::Constant[value= 7 7 [ CPULongType{2} ]]()
)' (type 'Tensor') in the TorchScript graph. The containing node has kind 'onnx::Constant'.]
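- So AdaptiveAvgPool2d has to be modified before the model can be exported correctly.
- Many layers whose output size is computed dynamically from the input are not supported by the ONNX exporter and need to be changed to produce a fixed output.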
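- When torchsummary's `summary` function builds its report, it runs one forward pass on a random tensor with a batch size of 2, regardless of the `batch_size` argument. The library's source attributes this to BatchNorm: in training mode BatchNorm computes statistics over the batch and can fail when it gets only one value per channel, so a single sample is not always safe.
So when you call `summary(model, (3, 128, 128))`, the model actually receives an input of shape (2, 3, 128, 128). In the printed table the batch dimension is shown as -1, the default value of the `batch_size` argument.
If you want the report to reflect a different batch size, pass it explicitly, for example:
summary(model, input_size=(3, 128, 128), batch_size=4)
The `batch_size` argument only changes the batch dimension shown in the Output Shape column and scales the Input size and Forward/backward pass size estimates. The forward pass still uses 2 samples, and the parameter counts do not depend on the batch size.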
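The Input size and Forward/backward pass size lines in the summaries, and the way `batch_size` scales them, can be reproduced by hand. This sketch assumes torchsummary's formulas as read from its source: input size = input elements × |batch_size| × 4 bytes, and forward/backward size = 2 (for gradients) × the output elements of every hooked layer × |batch_size| × 4 bytes, where the default batch_size=-1 counts as one sample. The helper name and the hard-coded VGG11 layer list are illustrative:

```python
# VGG11 feature config: numbers are Conv2d output channels, "M" is a MaxPool2d
VGG11_CFG = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]


def summary_sizes_mb(height, width, pool=7, num_classes=2, batch_size=-1):
    """Re-compute torchsummary-style memory estimates for vgg11_bn (assumed formulas)."""
    b = abs(batch_size)  # the default -1 is treated as a single sample
    c, h, w = 3, height, width
    elems = 0  # output elements summed over every hooked layer, for one sample
    for v in VGG11_CFG:
        if v == "M":
            h, w = h // 2, w // 2
            elems += c * h * w      # MaxPool2d output
        else:
            c = v
            elems += 3 * c * h * w  # Conv2d, BatchNorm2d and ReLU outputs share one shape
    elems += c * pool * pool        # AdaptiveAvgPool2d output
    elems += 6 * 4096 + num_classes  # (Linear, ReLU, Dropout) x 2, then the final Linear
    input_mb = 3 * height * width * b * 4 / 1024 ** 2
    fwd_bwd_mb = 2 * elems * b * 4 / 1024 ** 2  # x2 for gradients, 4 bytes per float32
    return input_mb, fwd_bwd_mb


for size in (224, 128):
    inp, fb = summary_sizes_mb(size, size)
    print(f"{size}: input {inp:.2f} MB, forward/backward {fb:.2f} MB")
# 224: input 0.57 MB, forward/backward 182.02 MB
# 128: input 0.19 MB, forward/backward 59.69 MB
```

Passing `batch_size=4` multiplies both estimates by 4 while the parameter count stays the same. With `pool=4` the same function gives 181.89 MB and 59.56 MB, the values shown for the modified model in section 4.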
4. Modifying the model
from torchsummary import summary
import torch
from torchvision import models
from torch import nn
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = models.vgg11_bn(num_classes=2)
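model.avgpool = nn.AdaptiveAvgPool2d((4, 4))  # 4 x 4 matches the feature map of a 128 x 128 input
model.classifier = nn.Sequential(
    nn.Linear(512 * 4 * 4, 4096),  # input features are now 512 x 4 x 4 instead of 512 x 7 x 7
    nn.ReLU(True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 2),
)
# Move the model to the GPU only after replacing layers; otherwise the new layers
# stay on the CPU and the forward pass fails with a device-mismatch error.
model.to(device)
# Print the model structure (torchsummary prints the table itself and returns None)
summary(model, (3, 128, 128), device=device.type)
summary(model, (3, 224, 224), device=device.type)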
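The export error states the rule directly: each AdaptiveAvgPool2d output size must divide the corresponding input size. When it does, adaptive pooling reduces to an ordinary average pooling with kernel_size = stride = input // output, which ONNX can represent. The pure-Python sketch below applies that rule to the VGG11 feature-map side (input // 32, after five stride-2 max pools) and lists the pooling sizes that are safe for a set of input resolutions. Both helpers are hypothetical and only mirror the error message, not PyTorch's exporter code:

```python
from functools import reduce
from math import gcd


def onnx_pool_kernel(in_size, out_size):
    """Fixed kernel/stride equivalent to adaptive pooling, or None if the sizes don't divide."""
    if in_size % out_size != 0:
        return None
    return in_size // out_size


def safe_pool_sizes(input_sizes, num_pools=5):
    """AdaptiveAvgPool2d output sides that divide the feature map for every input size."""
    feature_sides = [s // 2 ** num_pools for s in input_sizes]
    g = reduce(gcd, feature_sides)
    return [d for d in range(1, g + 1) if g % d == 0]


for image, pool in [(224, 7), (128, 7), (128, 4), (224, 4)]:
    k = onnx_pool_kernel(image // 32, pool)
    status = "unsupported" if k is None else f"AvgPool2d(kernel_size={k}, stride={k})"
    print(f"input {image}, pool {pool}: {status}")

print(safe_pool_sizes([128]))       # [1, 2, 4]
print(safe_pool_sizes([128, 224]))  # [1]
```

By this rule the modified model should export at 128 × 128, but exporting it at 224 × 224 would likely hit the same error, since a 7 × 7 feature map cannot be evenly pooled to 4 × 4. A (1, 1) output (global average pooling) divides every size; as far as I know the exporter maps it to GlobalAveragePool, at the cost of discarding spatial layout and needing a classifier that takes 512 input features.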
- With an input size of (3, 224, 224), the model summary is as follows:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,792
BatchNorm2d-2 [-1, 64, 224, 224] 128
ReLU-3 [-1, 64, 224, 224] 0
MaxPool2d-4 [-1, 64, 112, 112] 0
Conv2d-5 [-1, 128, 112, 112] 73,856
BatchNorm2d-6 [-1, 128, 112, 112] 256
ReLU-7 [-1, 128, 112, 112] 0
MaxPool2d-8 [-1, 128, 56, 56] 0
Conv2d-9 [-1, 256, 56, 56] 295,168
BatchNorm2d-10 [-1, 256, 56, 56] 512
ReLU-11 [-1, 256, 56, 56] 0
Conv2d-12 [-1, 256, 56, 56] 590,080
BatchNorm2d-13 [-1, 256, 56, 56] 512
ReLU-14 [-1, 256, 56, 56] 0
MaxPool2d-15 [-1, 256, 28, 28] 0
Conv2d-16 [-1, 512, 28, 28] 1,180,160
BatchNorm2d-17 [-1, 512, 28, 28] 1,024
ReLU-18 [-1, 512, 28, 28] 0
Conv2d-19 [-1, 512, 28, 28] 2,359,808
BatchNorm2d-20 [-1, 512, 28, 28] 1,024
ReLU-21 [-1, 512, 28, 28] 0
MaxPool2d-22 [-1, 512, 14, 14] 0
Conv2d-23 [-1, 512, 14, 14] 2,359,808
BatchNorm2d-24 [-1, 512, 14, 14] 1,024
ReLU-25 [-1, 512, 14, 14] 0
Conv2d-26 [-1, 512, 14, 14] 2,359,808
BatchNorm2d-27 [-1, 512, 14, 14] 1,024
ReLU-28 [-1, 512, 14, 14] 0
MaxPool2d-29 [-1, 512, 7, 7] 0
AdaptiveAvgPool2d-30 [-1, 512, 4, 4] 0
Linear-31 [-1, 4096] 33,558,528
ReLU-32 [-1, 4096] 0
Dropout-33 [-1, 4096] 0
Linear-34 [-1, 4096] 16,781,312
ReLU-35 [-1, 4096] 0
Dropout-36 [-1, 4096] 0
Linear-37 [-1, 2] 8,194
================================================================
Total params: 59,574,018
Trainable params: 59,574,018
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 181.89
Params size (MB): 227.26
Estimated Total Size (MB): 409.73
----------------------------------------------------------------
- With an input size of (3, 128, 128), the model summary is as follows:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 128, 128] 1,792
BatchNorm2d-2 [-1, 64, 128, 128] 128
ReLU-3 [-1, 64, 128, 128] 0
MaxPool2d-4 [-1, 64, 64, 64] 0
Conv2d-5 [-1, 128, 64, 64] 73,856
BatchNorm2d-6 [-1, 128, 64, 64] 256
ReLU-7 [-1, 128, 64, 64] 0
MaxPool2d-8 [-1, 128, 32, 32] 0
Conv2d-9 [-1, 256, 32, 32] 295,168
BatchNorm2d-10 [-1, 256, 32, 32] 512
ReLU-11 [-1, 256, 32, 32] 0
Conv2d-12 [-1, 256, 32, 32] 590,080
BatchNorm2d-13 [-1, 256, 32, 32] 512
ReLU-14 [-1, 256, 32, 32] 0
MaxPool2d-15 [-1, 256, 16, 16] 0
Conv2d-16 [-1, 512, 16, 16] 1,180,160
BatchNorm2d-17 [-1, 512, 16, 16] 1,024
ReLU-18 [-1, 512, 16, 16] 0
Conv2d-19 [-1, 512, 16, 16] 2,359,808
BatchNorm2d-20 [-1, 512, 16, 16] 1,024
ReLU-21 [-1, 512, 16, 16] 0
MaxPool2d-22 [-1, 512, 8, 8] 0
Conv2d-23 [-1, 512, 8, 8] 2,359,808
BatchNorm2d-24 [-1, 512, 8, 8] 1,024
ReLU-25 [-1, 512, 8, 8] 0
Conv2d-26 [-1, 512, 8, 8] 2,359,808
BatchNorm2d-27 [-1, 512, 8, 8] 1,024
ReLU-28 [-1, 512, 8, 8] 0
MaxPool2d-29 [-1, 512, 4, 4] 0
AdaptiveAvgPool2d-30 [-1, 512, 4, 4] 0
Linear-31 [-1, 4096] 33,558,528
ReLU-32 [-1, 4096] 0
Dropout-33 [-1, 4096] 0
Linear-34 [-1, 4096] 16,781,312
ReLU-35 [-1, 4096] 0
Dropout-36 [-1, 4096] 0
Linear-37 [-1, 2] 8,194
================================================================
Total params: 59,574,018
Trainable params: 59,574,018
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.19
Forward/backward pass size (MB): 59.56
Params size (MB): 227.26
Estimated Total Size (MB): 287.01
----------------------------------------------------------------