[youcans Deep Learning D02] PyTorch Example: Building a LeNet Model for Image Classification


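Earlier chapters introduced data loading and model construction in PyTorch. Building on that, this section works through a simple but complete example: create a LeNet network model, train it on the CIFAR-10 dataset, and use it for image classification.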

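1. Basic steps of deep learning modeling with PyTorch

The basic steps for building, training, and using a neural network model with PyTorch are as follows.

Prepare the dataset: prepare the training and test datasets. PyTorch's Dataset and DataLoader classes can be used to load and preprocess the data.
Design the model using a class: build the model from the classes in the torch.nn module. You can use an existing architecture such as ResNet, VGG, or AlexNet, or define your own.
Construct the loss and optimizer: choose a loss function and an optimizer suited to the classification task.
Train the model: feed the data to the model for training, usually over multiple epochs.
Test the model: evaluate the trained model on the test dataset, computing metrics such as accuracy.
Save and load the model: save the trained model for later use or deployment.
Run inference: use the trained model to predict outputs for new data.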

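2. Loading the CIFAR-10 dataset

Training on a standard public dataset such as MNIST or CIFAR saves a great deal of data-preparation work and usually yields better model performance, because these datasets are class-balanced, informative, consistently organized, and easy to process.

PyTorch provides a number of commonly used image datasets through the torchvision.datasets module.
The torchvision package contains popular datasets, model architectures, and common image transforms; torchvision.datasets provides the dataset classes.

CIFAR is a classic small dataset for image classification and comes in two versions, CIFAR-10 and CIFAR-100, with 10 and 100 classes respectively. CIFAR-10 images are 32×32 color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. CIFAR-10 contains 60,000 images in total, 50,000 for training and 10,000 for testing. Each class has 6,000 images, so the dataset is balanced.

The dataset classes for loading CIFAR are: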

torchvision.datasets.CIFAR10()
torchvision.datasets.CIFAR100()

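The following example loads CIFAR-10 with the DataLoader class. A DataLoader is an iterable that wraps a Dataset object and yields the data one batch at a time, with the batch size set by the batch_size parameter.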

import torch
import torchvision
from torchvision import transforms

# Define the transform: convert a PIL image to a Tensor in [0,1], then normalize to [-1,1]
transform = transforms.Compose([  # Transform Compose of the image
        transforms.Resize([32,32]),  # resize the image to (w,h)=(32,32)
        transforms.ToTensor(),  # convert the image to a Tensor
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Load the CIFAR10 training set with the DataLoader class
batch_size = 64
# Load the CIFAR10 dataset; if it is not found under root, it is downloaded automatically
# CIFAR10 training split, 50000 training images
train_set = torchvision.datasets.CIFAR10(root='../dataset', train=True,
            download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
               shuffle=True, num_workers=2)  # wrap the dataset in a DataLoader

# CIFAR10 test split (10000 images), used here as the validation set
valid_set = torchvision.datasets.CIFAR10(root='../dataset', train=False,
            download=True, transform=transform)
valid_loader = torch.utils.data.DataLoader(valid_set, batch_size=5000,
               shuffle=False, num_workers=2)
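
To make the value ranges in the transform concrete, here is a small plain-Python sketch of the arithmetic. It assumes the usual behavior of the two transforms: ToTensor scales 8-bit pixel values to [0, 1], and Normalize computes (x - mean) / std per channel. The helper name normalize is made up for this illustration and is not a torchvision function.

```python
def normalize(value, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize for one channel: (x - mean) / std."""
    return (value - mean) / std


# ToTensor scales 8-bit pixel values from [0, 255] to [0.0, 1.0]
pixels_8bit = [0, 64, 128, 255]
scaled = [p / 255 for p in pixels_8bit]

# With mean = std = 0.5, the range [0, 1] is mapped onto [-1, 1]
normalized = [normalize(x) for x in scaled]

for p, x, y in zip(pixels_8bit, scaled, normalized):
    print(f"pixel {p:3d} -> ToTensor {x:.4f} -> Normalize {y:+.4f}")
```

With mean and std both set to 0.5, the darkest pixel maps to -1 and the brightest to +1, which is the [-1, 1] range mentioned in the transform comment.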

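3. Defining the LeNet-5 model class

LeNet is a classic convolutional neural network proposed by Yann LeCun (a 2018 Turing Award recipient), originally for handwritten character recognition. By today's standards the network is very simple and its performance is poor, but its principles remain the foundation of later convolutional neural networks.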

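3.1 The LeNet network

The original LeNet used a 5-layer structure and introduced the basic operations of convolutional neural networks. Its structure is:

Input layer: a 28×28 single-channel image.
C1 convolutional layer: 4 kernels of size 5×5, giving 4 feature maps of 24×24.
S1 pooling layer: 2×2 average pooling, giving 4 feature maps of 12×12.
C2 convolutional layer: 12 kernels of size 5×5, giving 12 feature maps of 8×8.
S2 pooling layer: 2×2 average pooling, giving 12 feature maps of 4×4.
FC fully connected layer: a fully connected hidden layer with a sigmoid activation.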

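3.2 The LeNet-5 network

LeNet-5, published in 1998, is an improved version of LeNet. It has 7 layers: 3 convolutional layers, 2 pooling layers, and 2 fully connected layers:

Input layer: a 32×32 single-channel image.
C1 convolutional layer: 6 kernels of size 5×5, giving 6 feature maps of 28×28.
S2 pooling layer: 2×2 subsampling (average pooling in the original paper; modern implementations often use max pooling), giving 6 feature maps of 14×14.
C3 convolutional layer: 16 kernels of size 5×5, giving 16 feature maps of 10×10.
S4 pooling layer: 2×2 subsampling (as in S2), giving 16 feature maps of 5×5.
C5 convolutional layer: 120 kernels of size 5×5, giving 120 feature maps of 1×1.
F6 fully connected layer: a fully connected hidden layer of 84 neurons with a sigmoid-type activation (a scaled tanh in the original paper).
F7 output layer: 10 output units. The original paper uses Euclidean radial basis function units (Gaussian connections); modern implementations typically use a fully connected layer followed by softmax.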


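3.3 Defining the LeNet-5 network model class

PyTorch provides a high-level API in the torch.nn module for building networks from scratch.

To build a neural network model in PyTorch, define a model class that inherits from nn.Module and implements the __init__() and forward() methods. nn.Module is the base class for all neural network modules.

nn.Module implements the __call__() method, which in turn calls forward(), so calling a model object like a function runs its forward pass. __init__() is the class initializer, similar to a constructor in C++.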
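
The relationship between __call__() and forward() can be illustrated without PyTorch. The sketch below is a greatly simplified stand-in, not the real nn.Module (which also handles hooks, parameters, and more); the class names ToyModule and Doubler are invented for this illustration.

```python
class ToyModule:
    """A greatly simplified stand-in for nn.Module (hypothetical, illustration only)."""

    def __call__(self, *args, **kwargs):
        # The real nn.Module.__call__ does more work (for example running hooks);
        # this toy version only dispatches to forward().
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError("subclasses must implement forward()")


class Doubler(ToyModule):
    def __init__(self, scale=2):
        self.scale = scale  # state is set up once, when the object is constructed

    def forward(self, x):
        return self.scale * x


model = Doubler()
result = model(21)  # calling the object invokes __call__, which calls forward()
print(result)
```

This is why the training code writes model(inputs) rather than model.forward(inputs): the call goes through __call__() first.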

The LeNet model class is defined as follows:

import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self):  # constructor
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)  # in_channels, out_channels, kernel_size
        self.pool1 = nn.MaxPool2d(2, 2)  # kernel_size, stride
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 5 * 5, 120)  # in_features, out_features
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):  # forward pass
        x = F.relu(self.conv1(x))  # input(3, 32, 32) output(16, 28, 28)
        x = self.pool1(x)          # output(16, 14, 14)
        x = F.relu(self.conv2(x))  # output(32, 10, 10)
        x = self.pool2(x)          # output(32, 5, 5)
        x = x.view(-1, 32*5*5)     # flatten, output(32*5*5)
        x = F.relu(self.fc1(x))    # output(120)
        x = F.relu(self.fc2(x))    # output(84)
        x = self.fc3(x)            # output(10)
        return x

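3.4 Layer classes used to build the network

The LeNet model class above uses several layer classes from torch.nn.

3.4.1 Convolutional layer

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

Parameters:

in_channels: int, the number of channels of the input feature map. For an RGB color image, in_channels=3.
out_channels: int, the number of channels of the output feature map, equal to the number of convolution kernels.
kernel_size: int or tuple[int,int], the kernel size (height, width); an int means height=width.
stride: int or tuple[int,int], the stride of the convolution, default 1; an int means stride_h=stride_w.
padding: int or tuple[int,int], the amount of padding added to each side of the input; the default 0 means no padding.

Note:

The output feature map size of a convolutional layer is given by:
$\left\lfloor \frac{W - F + 2P}{S} \right\rfloor + 1$
where the input image is W×W, the kernel is F×F, S is the stride, and P is the padding added to each side. The result is rounded down when the division is not exact.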
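
As a quick check of this formula, the sketch below applies it to the layer sequence of the LeNet class defined in section 3.3 (two 5×5 convolutions, each followed by 2×2 pooling with stride 2). It assumes the same formula also holds for the pooling layers, which is the case when dilation is 1. The function name conv_output_size is chosen for this illustration.

```python
def conv_output_size(w, f, s=1, p=0):
    """Output width for input width w, kernel size f, stride s, padding p."""
    return (w - f + 2 * p) // s + 1  # floor division implements the rounding down


# (name, kernel size F, stride S, padding P) for the layers in the LeNet class
layers = [
    ("conv1", 5, 1, 0),
    ("pool1", 2, 2, 0),
    ("conv2", 5, 1, 0),
    ("pool2", 2, 2, 0),
]

sizes = [32]  # CIFAR-10 images are 32x32
for name, f, s, p in layers:
    sizes.append(conv_output_size(sizes[-1], f, s, p))
    print(f"{name}: {sizes[-2]}x{sizes[-2]} -> {sizes[-1]}x{sizes[-1]}")
```

The sizes 28, 14, 10, and 5 match the shape comments in the forward() method, and the final 5×5 map with 32 channels explains the 32 * 5 * 5 input size of fc1.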

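3.4.2 Max pooling layer

torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Parameters:

kernel_size: int or tuple[int,int], the size of the pooling window (height, width); an int means height=width.
stride: int or tuple[int,int], the stride of the pooling window; the default None means the stride equals kernel_size; an int means stride_h=stride_w.
padding: int or tuple[int,int], implicit negative-infinity padding added on both sides, default 0.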
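
What max pooling computes can be shown by hand on a small example. The following is a plain-Python sketch of 2×2 max pooling with stride 2 and no padding on a nested list; it only illustrates the operation and is not how PyTorch implements it. The function name max_pool_2d and the sample values are invented for this illustration.

```python
def max_pool_2d(feature_map, kernel=2, stride=2):
    """Max pooling over a 2-D list of numbers, without padding."""
    rows = (len(feature_map) - kernel) // stride + 1
    cols = (len(feature_map[0]) - kernel) // stride + 1
    pooled = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Collect the values inside the current pooling window
            window = [
                feature_map[i * stride + di][j * stride + dj]
                for di in range(kernel)
                for dj in range(kernel)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 5, 3, 8],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```

Each output value is the maximum of one non-overlapping 2×2 window, so a 4×4 input becomes a 2×2 output, just as pool1 halves 28×28 to 14×14 in the LeNet class.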

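3.4.3 Linear layer

torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)

Parameters:

in_features: int, the dimension of the input feature vector.
out_features: int, the dimension of the output feature vector.
bias: bool, whether the layer has a bias term; the default True adds a bias, False omits it.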
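
The layer arguments determine how many trainable parameters the model has. The sketch below counts them by hand for the LeNet class in section 3.3, assuming the standard layouts: a convolution holds in_channels × out_channels × kernel_height × kernel_width weights plus one bias per output channel, and a linear layer holds in_features × out_features weights plus one bias per output. The helper names are made up for this illustration.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Parameter count of a square-kernel convolution layer."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Parameter count of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


counts = {
    "conv1": conv2d_params(3, 16, 5),
    "conv2": conv2d_params(16, 32, 5),
    "fc1": linear_params(32 * 5 * 5, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n}")
print("total:", total)  # total: 121182
```

Most of the parameters sit in fc1, the first fully connected layer after flattening; the pooling layers have no trainable parameters.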

3.4.4 ReLU activation function

torch.nn.ReLU(inplace=False)

Applies the ReLU activation function:

$\mathrm{ReLU}(x) = (x)^+ = \max(0, x)$

Parameters:

inplace: bool, whether to perform the operation in place; default False.

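4. Complete LeNet training example

4.1 Instantiating the LeNet model

The previous section defined a LeNet network model class. Setting up a LeNet model object for training takes three steps:

Instantiate the LeNet model object
Set the loss function
Set the optimizer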
    # (4) Build the LeNet network model
    model = LeNet()  # instantiate the LeNet network model
    # print(model)  # LeNet(conv1, pool, conv2, fc1, fc2, fc3)
    loss_criterion = nn.CrossEntropyLoss()  # loss function: cross-entropy
    optimizer = optim.Adam(model.parameters(), lr=0.001)  # optimizer: Adam
    # The optimizer is created with model.parameters(), which collects the parameters of all submodules

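The torch.nn module provides built-in loss functions as classes; the cross-entropy loss is nn.CrossEntropyLoss (torch.nn.functional offers the functional form, F.cross_entropy).

torch.optim.Adam selects the Adam optimizer. The model parameters, model.parameters(), must be passed to the optimizer so that it knows which parameters to update.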
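
To show what the cross-entropy loss measures, here is a plain-Python sketch for a single sample. It assumes the standard definition, the negative log of the softmax probability assigned to the true class, computed from raw scores (logits). nn.CrossEntropyLoss applies this to raw model outputs and by default averages over the batch. The function name cross_entropy and the sample scores are chosen for this illustration.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    shift = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[target]


# Equal scores for all 10 classes: the loss equals ln(10), about 2.3026
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A confident prediction gives a small loss if correct and a large loss if wrong
confident_loss = cross_entropy([5.0, 0.0, 0.0], target=0)
wrong_loss = cross_entropy([5.0, 0.0, 0.0], target=1)

print(f"uniform: {uniform_loss:.4f}")
print(f"confident and correct: {confident_loss:.4f}")
print(f"confident and wrong: {wrong_loss:.4f}")
```

Because the softmax is folded into the loss, the LeNet forward() method returns the raw fc3 outputs without applying softmax itself.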

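4.2 Model training

The basic steps of model training are:

Run a forward pass to compute the model output;
Compute the value of the loss function;
Compute the gradients of the weights and biases;
Update the model parameters according to the gradients;
Reset the gradients to 0 (ready for the next iteration).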
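
The five steps above can be written out by hand for a tiny model. The sketch below fits a one-variable linear model with plain gradient descent in pure Python; the toy data, the learning rate, and the use of mean squared error are assumptions for this illustration, not part of the LeNet example. The comments name the PyTorch call that plays the same role in the training code.

```python
# Toy data (an assumption for illustration): points on the line y = 3x + 1
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [3 * x + 1 for x in xs]

weight, bias = 0.0, 0.0    # model parameters
grad_w, grad_b = 0.0, 0.0  # gradient buffers; gradients accumulate into these
learning_rate = 0.1
losses = []

for step in range(200):
    # 1. forward pass, like outputs = model(inputs)
    outputs = [weight * x + bias for x in xs]
    # 2. loss (mean squared error here), like loss = loss_criterion(outputs, labels)
    errors = [o - y for o, y in zip(outputs, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    # 3. gradients of the loss w.r.t. weight and bias, like loss.backward()
    grad_w += sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b += sum(2 * e for e in errors) / len(xs)
    # 4. parameter update, like optimizer.step() (plain gradient descent here)
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b
    # 5. reset the gradients, like optimizer.zero_grad()
    grad_w, grad_b = 0.0, 0.0
    losses.append(loss)

print(f"weight = {weight:.4f}, bias = {bias:.4f}, final loss = {losses[-1]:.2e}")
```

The gradients are added to the buffers rather than overwritten, which mirrors how PyTorch accumulates gradients and is the reason they must be reset once per iteration. Resetting at the end of an iteration or at the start of the next one is equivalent.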

4.3 Training example

The complete example below loads the CIFAR10 dataset with PyTorch, builds a LeNet-5-style network model, and trains it.

# LeNet_CIFAR_train_1.py
# Image classification model based on the LeNet network, trained on the CIFAR10 dataset
# https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#
# Created: [email protected], 2023/04/18

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

class LeNet(nn.Module):  # inherits from the nn.Module base class
    def __init__(self):  # constructor
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)  # in_channels, out_channels, kernel_size
        self.pool1 = nn.MaxPool2d(2, 2)  # kernel_size, stride
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 5 * 5, 120)  # in_features, out_features
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):  # forward pass
        x = F.relu(self.conv1(x))  # input(3, 32, 32) output(16, 28, 28)
        x = self.pool1(x)          # output(16, 14, 14)
        x = F.relu(self.conv2(x))  # output(32, 10, 10)
        x = self.pool2(x)          # output(32, 5, 5)
        x = x.view(-1, 32*5*5)     # flatten, output(32*5*5)
        x = F.relu(self.fc1(x))    # output(120)
        x = F.relu(self.fc2(x))    # output(84)
        x = self.fc3(x)            # output(10)
        return x

if __name__ == '__main__':
    # (1) Convert a PIL image to a Tensor in [0,1], then normalize to [-1,1]
    transform = transforms.Compose([  # Transform Compose of the image
        transforms.Resize([32,32]),  # resize the image to (w,h)=(32,32)
        transforms.ToTensor(),  # convert the image to a Tensor
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # (2) Download and load the CIFAR10 dataset
    batch_size = 64
    # Load the CIFAR10 dataset; if it is not found under root, it is downloaded automatically
    # CIFAR10 training split, 50000 training images
    train_set = torchvision.datasets.CIFAR10(root='../dataset', train=True,
                                            download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                              shuffle=True, num_workers=2)  # wrap the dataset in a DataLoader
    # CIFAR10 test split (10000 images), used here as the validation set
    valid_set = torchvision.datasets.CIFAR10(root='../dataset', train=False,
                                           download=True, transform=transform)
    valid_loader = torch.utils.data.DataLoader(valid_set, batch_size=5000,
                                             shuffle=False, num_workers=2)
    # Create an iterator and use next() to fetch one batch (the first 5000 images)
    valid_data_iter = iter(valid_loader)  # iterator over valid_loader
    valid_image, valid_label = next(valid_data_iter)  # valid_image: [batch, 3, 32, 32] valid_label: [batch]

    # (3) Class names (10 classes)
    classes = ('plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck')

    # (4) Build the LeNet network model
    model = LeNet()  # instantiate the LeNet network model
    # print(model)  # LeNet(conv1, pool, conv2, fc1, fc2, fc3)
    loss_criterion = nn.CrossEntropyLoss()  # loss function: cross-entropy
    optimizer = optim.Adam(model.parameters(), lr=0.001)  # optimizer: Adam
    # The optimizer is created with model.parameters(), which collects the parameters of all submodules
    # (5) Train the LeNet network with train_loader
    for epoch in range(5):  # training epochs
        running_loss = 0.0  # reset the accumulated loss at the start of each epoch
        for step, data in enumerate(train_loader, start=0):  # iterate over batches
            inputs, labels = data  # inputs: [batch, 3, 32, 32] labels: [batch]

            optimizer.zero_grad()  # clear the gradients left from the previous step
            # forward + backward + optimize
            outputs = model(inputs)  # forward pass, [batch, 10]
            loss = loss_criterion(outputs, labels)  # compute the loss
            loss.backward()  # backpropagation
            optimizer.step()  # update the parameters

            # print statistics
            running_loss += loss.item()
            if step % 100 == 99:  # print training info every 100 steps
                with torch.no_grad():  # validation: no gradients are computed
                    outputs = model(valid_image)  # run inference on the validation batch [batch, 10]
                    pred_label = torch.max(outputs, dim=1)[1]  # predicted class indices [batch]
                    accuracy = torch.eq(pred_label, valid_label).sum().item() / valid_label.size(0)  # accuracy
                    print(
                        "epoch {}, step {}: loss = {:.4f}, accuracy {:.4f}".format(epoch, step, running_loss/100, accuracy))
                    running_loss = 0.0

    print('Finished Training')

    # (6) Save the LeNet network model
    model_path = "../models/CIFAR10_LeNet_1.pth"  # the ../models directory must already exist
    torch.save(model.state_dict(), model_path)

The program output is as follows:

Files already downloaded and verified
Files already downloaded and verified
epoch 0, step 99: loss = 1.9706, accuracy 0.3682
epoch 0, step 199: loss = 1.6741, accuracy 0.4254
epoch 0, step 299: loss = 1.5687, accuracy 0.4756
epoch 0, step 399: loss = 1.4513, accuracy 0.4878
epoch 0, step 499: loss = 1.4060, accuracy 0.5300
…
epoch 4, step 99: loss = 0.9303, accuracy 0.6438
epoch 4, step 199: loss = 0.9322, accuracy 0.6526
epoch 4, step 299: loss = 0.9517, accuracy 0.6536
epoch 4, step 399: loss = 0.9251, accuracy 0.6626
epoch 4, step 499: loss = 0.9482, accuracy 0.6588
Finished Training

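After 5 epochs of training, the model reaches an accuracy of 65.88% on the validation batch. Note that this batch is the first 5,000 images of the 10,000-image CIFAR-10 test split, because valid_loader uses batch_size=5000 and only its first batch is evaluated. This accuracy is low, but it is not the focus of this article; other models and methods can be used later to improve classification accuracy.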
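
The accuracy figure in the log is the fraction of validation images whose highest-scoring class equals the true label. The plain-Python sketch below reproduces that calculation on a handful of made-up score vectors; the function name accuracy and the sample values are invented for this illustration.

```python
def accuracy(outputs, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    # Index of the largest score in each row, the role played by torch.max(outputs, dim=1)[1]
    predictions = [max(range(len(row)), key=row.__getitem__) for row in outputs]
    correct = sum(int(p == t) for p, t in zip(predictions, labels))
    return correct / len(labels)


# Made-up scores for 4 samples and 3 classes
outputs = [
    [2.0, 0.1, -1.0],  # predicted class 0
    [0.3, 1.5, 0.2],   # predicted class 1
    [0.0, 0.4, 0.9],   # predicted class 2
    [1.2, 0.8, 1.0],   # predicted class 0
]
labels = [0, 1, 1, 0]

acc = accuracy(outputs, labels)
print(acc)  # 0.75
```

Three of the four predictions match their labels, giving 0.75. In the training script the same ratio is computed with torch.eq(pred_label, valid_label).sum().item() / valid_label.size(0).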

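5. Complete LeNet prediction example

With the trained LeNet model, a new image is fed in for inference, and the model output determines which class the image belongs to.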

# LeNet_CIFAR_pred_1.py
# Image classification based on the LeNet network, using the previously trained model
# https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#
# Created: [email protected], 2023/04/18

import torch
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import numpy as np


class LeNet(nn.Module):
    def __init__(self):    # initializer
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)  # in_channels, out_channels, kernel_size
        self.pool1 = nn.MaxPool2d(2, 2)  # kernel_size, stride
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 5 * 5, 120)  # in_features, out_features
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):  # forward pass
        x = F.relu(self.conv1(x))  # input(3, 32, 32) output(16, 28, 28)
        x = self.pool1(x)          # output(16, 14, 14)
        x = F.relu(self.conv2(x))  # output(32, 10, 10)
        x = self.pool2(x)          # output(32, 5, 5)
        x = x.view(-1, 32*5*5)     # flatten, output(32*5*5)
        x = F.relu(self.fc1(x))    # output(120)
        x = F.relu(self.fc2(x))    # output(84)
        x = self.fc3(x)            # output(10)
        return x

if __name__ == '__main__':
    # (1) Convert a PIL image to a Tensor in [0,1], then normalize to [-1,1]
    transform = transforms.Compose([  # Transform Compose of the image
        transforms.Resize([32,32]),  # resize the image to (w,h)=(32,32)
        transforms.ToTensor(),  # convert the image to a Tensor
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # (2) Class names (10 classes)
    classes = ('plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck')

    # (3) Load the trained LeNet model
    model = LeNet()  # instantiate the LeNet network model
    model_path = "../models/CIFAR10_LeNet_1.pth"  # path to the saved model file
    model.load_state_dict(torch.load(model_path))
    model.eval()  # switch to evaluation mode for inference

    # (4) Load the input image
    # Method 1: read the image with PIL; it must be converted to OpenCV format before using OpenCV
    from PIL import Image
    img = Image.open("../images/img_plane_01.jpg").convert("RGB")  # read the file and force 3 channels
    img_t = transform(img)  # apply the preprocessing transform, torch.Size([3, 32, 32])
    batch_t = torch.unsqueeze(img_t, dim=0)  # add a batch dimension [1,3,32,32]
    print(img_t.shape, batch_t.shape)

    # (5) Model inference
    with torch.no_grad():
        outputs = model(batch_t)
        predict = torch.argmax(outputs, dim=1).item()  # index of the predicted class
    label = classes[predict]  # name of the predicted class
    print(label)

    # (6) Display the image
    import cv2
    imgCV = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)  # convert PIL (RGB) to OpenCV (BGR)
    cv2.putText(imgCV, label, (5, 50),  cv2.FONT_HERSHEY_COMPLEX, 2, (100, 20, 255), 2)
    cv2.imshow('image', imgCV)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

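Running the program prints the tensor shapes and the predicted class name, then displays the input image with the predicted label drawn on it.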

References:

Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 1998
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#

[End of section]

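Copyright notice:
You are welcome to follow the "youcans Deep Learning" series. When reposting, please cite the original link:
[youcans Deep Learning D02] PyTorch Example: Building a LeNet Model for Image Classification (https://youcans.blog.csdn.net/article/details/130245409)
Copyright 2023 youcans, XUPT
Created: 2023-04-18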