[Runnable] VGG network reproduction: a must-see for getting started with binary image classification

Foreword

Hi, fellow deep learning players.

I am a third-year undergraduate. Driven by curiosity, I started hands-on deep learning in August last year. At first it was genuinely frustrating: I couldn't get through the papers, my experiments wouldn't run, I didn't understand the underlying principles, and I kept wishing for a suitable blog to guide me.

This blog is not only a record of a milestone in my research journey; I also hope it helps readers get through the door of deep learning faster. Let's work hard together in 2023!

1. Image augmentation and preprocessing

The transforms.Compose() function combines several transforms into one pipeline; each transform performs its own specific operation.

transform = transforms.Compose([
    transforms.Resize(100),
    transforms.RandomVerticalFlip(),
    transforms.RandomCrop(50),
    transforms.RandomResizedCrop(150),
    transforms.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

Common preprocessing operations used with transforms.Compose:

  • transforms.Resize() resizes the given image.

  • transforms.RandomVerticalFlip() vertically flips the incoming image with probability p=0.5, which helps the model generalize better later on.

  • transforms.RandomCrop(50) randomly crops a 50×50 region from the image; the motivation is the same as the previous point.

  • transforms.RandomResizedCrop(150) randomly crops a region and resizes it to 150×150. The idea behind this operation: even if we only see part of an object, we still want to recognize it as that type of object.

  • transforms.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5) randomly changes image properties: brightness, contrast, saturation and hue.

  • transforms.ToTensor() converts a numpy.ndarray or PIL image with shape (H, W, C) into a tensor with shape (C, H, W) and scales each value into [0, 1]; the scaling is simple, just a division by 255.

  • transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]) takes the input that ToTensor has already scaled to [0, 1] and applies the formula (x - mean) / std channel by channel, distributing each element into (-1, 1).

You will often see code like this:
torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Where does this set of values come from?
It is the per-channel mean and standard deviation computed over the ImageNet training set.
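If you want the corresponding statistics for your own dataset, they can be computed the same way. A minimal sketch (the path, image size and batch size here are placeholders, not from the original post):

# Sketch: compute the per-channel mean/std of a dataset, the same way the
# ImageNet values above were obtained.
import torch
from torchvision import datasets, transforms

stat_transform = transforms.Compose([transforms.Resize((224, 224)),
                                     transforms.ToTensor()])
dataset = datasets.ImageFolder('path/to/train', transform=stat_transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=64)

n_pixels = 0
channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
for images, _ in loader:  # images: (B, 3, H, W), values already in [0, 1]
    channel_sum += images.sum(dim=(0, 2, 3))
    channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
    n_pixels += images.shape[0] * images.shape[2] * images.shape[3]

mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
print(mean, std)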

2. Define the data reading interface

An important interface for reading data in PyTorch is torch.utils.data.DataLoader, which is mainly used to consume a dataset (including custom dataset interfaces) and serve it in batches. The following code sets the locations of my train set and test set:

dataset_train = datasets.ImageFolder('path/to/train', transform)
print(dataset_train.imgs)          # list of (image path, class index) tuples
print(dataset_train.class_to_idx)  # mapping from class name to class index

dataset_test = datasets.ImageFolder('path/to/test', transform)
print(dataset_test.class_to_idx)

train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=BATCH_SIZE,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset_test, batch_size=BATCH_SIZE,
                                          shuffle=True)

torch.utils.data.DataLoader parameters:

  • dataset (Dataset) – the dataset from which to load the data.
  • batch_size (int, optional) – how many samples to load per batch (default: 1).
  • shuffle (bool, optional) – set to True to reshuffle the data every epoch (default: False).
  • sampler (Sampler, optional) – defines the strategy for drawing samples from the dataset, i.e. how indices are generated; they can be sequential or shuffled.
  • num_workers (int, optional) – how many subprocesses to use for data loading. 0 means the data is loaded in the main process (default: 0).
  • collate_fn (callable, optional) – merges a list of samples into a batch of data and labels.
  • pin_memory (bool, optional) – if True, the returned tensors are allocated in page-locked (pinned) host memory, which makes copying them to GPU memory faster.
  • drop_last (bool, optional) – if the dataset size is not divisible by the batch size, set to True to drop the last incomplete batch; if False, the last batch will simply be smaller (default: False).
  • timeout – the timeout for collecting a batch from workers; if a batch is not read within this time, an error is raised.
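As a quick check that the loaders work (a sketch using the train_loader defined above), pull one batch and inspect it:

# Sketch: fetch a single batch and look at its shape.
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([BATCH_SIZE, 3, 150, 150]) with the transform above
print(labels[:8])    # class indices assigned by ImageFolder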

3. The ImageFolder data loader

torchvision.datasets.ImageFolder is a generic data loader that by default expects images to be arranged like this:

root/dog/xxx.png
root/dog/xxy.png
root/dog/[...]/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/[...]/asd932_.png
dataset=torchvision.datasets.ImageFolder(
                       root, transform=None, 
                       target_transform=None, 
                       loader=<function default_loader>, 
                       is_valid_file=None)
dataset_train = datasets.ImageFolder('path/to/train', transform)

torchvision.datasets.ImageFolder parameters:

  • root: The root directory of image storage, that is, the upper level directory of the directory where the folders of each category are located.
  • transform: The operation (function) to preprocess the image, the original image is used as input, and a transformed image is returned.
  • target_transform: the operation to preprocess the image label; the input is the target and the output is its transform. If this parameter is not passed, the targets are returned unchanged as class indices 0, 1, 2, ...
  • loader: Indicates the loading method of the dataset, usually the default loading method is sufficient.
  • is_valid_file: A function to get the path of an image file and check if the file is a valid file (used to check for corrupted files)
  • The returned dataset has the following two attributes:
    self.class_to_idx: the index corresponding to the category, which corresponds to the target returned without any conversion
    self.imgs: the list of (img-path, class) tuples saved
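For example, with the cat/dog layout above (a sketch; 'root' stands for the actual root directory):

# Sketch: ImageFolder assigns class indices in alphabetical order of folder names.
from torchvision import datasets

dataset = datasets.ImageFolder('root')  # the directory containing cat/ and dog/
print(dataset.class_to_idx)  # {'cat': 0, 'dog': 1}
print(dataset.imgs[0])       # e.g. ('root/cat/123.png', 0)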

4. Define the network backbone

This part gave me the biggest headache at the beginning. I first learned a few Python libraries such as pandas and matplotlib, then jumped straight into reading papers, but I had never learned object-oriented Python, so I couldn't make sense of other people's source code. Looking back, the real problem was that I hadn't laid a solid foundation, which left my thinking running ahead of my practice. Laying a solid foundation is the first step!

[Figure: VGG16 network architecture]
Take a look at the model diagram first: what a classic shape.

'''
Our network will recognize images. We will use a process built into PyTorch called convolution. Convolution adds each element of an image to its local neighbors, weighted by a kernel (a small matrix), which helps us extract certain features from the input image (such as edge detection, sharpness, blurriness, etc.).

There are two requirements for a Net class that defines your model. The first is to write an __init__ function that references nn.Module. This function is where you define the fully connected layers of your neural network.

Having defined the neural network, we now have to define how our data will pass through it.

Specify how data will pass through your model:
when you build a model with PyTorch, you only need to define the forward function, which feeds the data into the computation graph (i.e. our neural network). This represents our feed-forward algorithm.

You can use any tensor operation in the forward function.
'''
class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super(VGG16, self).__init__()
        # Conv block 1
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Conv block 2
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Conv block 3
        self.conv3 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Conv block 4
        self.conv4 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Conv block 5
        self.conv5 = nn.Sequential(
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Adaptive pooling keeps the classifier's 512*7*7 input size valid even
        # when the input is not 224x224 (e.g. the 150x150 crops produced by the
        # transform in section 1); torchvision's VGG uses the same trick.
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))

        # Fully connected layers
        self.fc = nn.Sequential(
            nn.Linear(in_features=512*7*7, out_features=4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(in_features=4096, out_features=4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(in_features=4096, out_features=num_classes)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 512*7*7) before the fc layers
        x = self.fc(x)
        # The training and test code below use F.binary_cross_entropy and a 0.5
        # threshold, both of which expect probabilities, so we finish with a
        # sigmoid (instantiate with num_classes=1 for the binary task).
        return torch.sigmoid(x)
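As a quick sanity check (a sketch, not from the original post), instantiate the model for the binary task and push a dummy batch through it:

# Sketch: verify the forward pass with a dummy 150x150 input, matching the
# transform pipeline from section 1; num_classes=1 for binary classification.
model = VGG16(num_classes=1)
dummy = torch.randn(2, 3, 150, 150)  # (batch, channels, height, width)
out = model(dummy)
print(out.shape)  # torch.Size([2, 1]); values are probabilities in (0, 1)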


Let me explain the functions used above one by one. nn.Sequential: a sequential container. Modules are added to it in the order they are passed to the constructor; alternatively, an OrderedDict of modules can be passed in. Sequential's forward() method accepts any input and forwards it to the first module it contains; it then chains the output of each module into the input of the next, finally returning the output of the last module.
The value Sequential provides over manually calling a sequence of modules is that it lets the whole container be treated as a single module, so that transformations applied to the Sequential apply to each module it stores (each of which is a registered submodule of the Sequential).
What is the difference between Sequential and torch.nn.ModuleList? ModuleList is exactly what it sounds like: a list for storing modules. The layers in a Sequential, on the other hand, are connected in cascade, as the sketch below shows.
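# Sketch: Sequential defines the forward pass for you; ModuleList only stores
# modules, so you must wire them up yourself in forward().
import torch
import torch.nn as nn

seq = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
y = seq(torch.randn(1, 8))  # modules applied in cascade, in order

class ListNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)])

    def forward(self, x):
        for layer in self.layers:  # the loop is explicit: ModuleList has no forward()
            x = layer(x)
        return x

print(y.shape, ListNet()(torch.randn(1, 8)).shape)  # both torch.Size([1, 2])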
nn.Conv2d:
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

  • in_channels and out_channels are the numbers of input and output channels, which should be self-explanatory.
  • kernel_size is the size of the convolution kernel.
  • stride controls the stride of the cross-correlation, a single number or a tuple.
  • padding controls the amount of padding applied to the input. It can be a string {'valid', 'same'} or an int/tuple of integers giving the amount of implicit padding applied on both sides.
  • dilation controls the spacing between kernel points; also known as the à trous algorithm. It is hard to describe in words but easy to grasp from a visualization.
  • groups controls the connections between inputs and outputs. Both in_channels and out_channels must be divisible by groups.
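With kernel_size=3, stride=1, padding=1 (the combination used throughout VGG), a convolution preserves the spatial size, while each MaxPool2d(kernel_size=2, stride=2) halves it. A quick sketch:

# Sketch: 3x3 / stride 1 / padding 1 keeps H and W; the 2x2 pool halves them.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 150, 150)
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(conv(x).shape)        # torch.Size([1, 64, 150, 150])
print(pool(conv(x)).shape)  # torch.Size([1, 64, 75, 75])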

nn.ReLU is a nonlinear activation function. An activation function is the function relating the output of one layer's neurons to the input of the next layer's: each neuron computes a weighted sum of its inputs, and an activation function is applied to the result to produce the value passed on to the next layer. The purpose of introducing an activation function is to increase the nonlinear fitting capacity of the neural network.
nn.MaxPool2d: takes the maximum over each neighborhood of feature points, which reduces the estimation bias caused by parameter errors in the convolution layers and retains more texture information.
nn.Linear: used to build the fully connected layers of the network. Note that in 2D image tasks the input and output of a fully connected layer are generally two-dimensional tensors of shape [batch_size, size], unlike convolution layers, whose inputs and outputs are four-dimensional tensors.
nn.Dropout: used to prevent overfitting. It is only active during training and is disabled at test time (model.eval() turns it off); it is generally placed after the fully connected layers. nn.Dropout(p=0.3) means each neuron has probability 0.3 of being zeroed out, as the sketch below shows.
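# Sketch: dropout zeroes activations in train mode and is an identity in eval mode.
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.3)
x = torch.ones(8)

drop.train()
print(drop(x))  # roughly 30% of entries zeroed; survivors scaled by 1/(1 - 0.3)
drop.eval()
print(drop(x))  # all ones: dropout is disabled at test time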
Network structure: following the VGGNet paper, we reproduce VGG16.

[Figure: VGG16 network structure, from the VGGNet paper]

5. Learning rate and optimizer

optimizer = optim.Adam(model.parameters(), lr=modellr)  # define the optimizer
def adjust_learning_rate(optimizer, epoch):
    """Sets the learning rate to the initial LR decayed by 10 every 5 epochs"""
    modellrnew = modellr * (0.1 ** (epoch // 5))  # compute the decayed learning rate
    print("lr:", modellrnew)
    for param_group in optimizer.param_groups:
        param_group['lr'] = modellrnew  # write the new learning rate into the optimizer
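For reference, the same schedule can be expressed with PyTorch's built-in scheduler (a sketch; optimizer is the one defined above):

# Sketch: StepLR implements the same decay-by-10-every-5-epochs schedule.
from torch.optim.lr_scheduler import StepLR

scheduler = StepLR(optimizer, step_size=5, gamma=0.1)
# then call scheduler.step() once at the end of each epoch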

params : the learnable parameters of the model that need to be updated
lr : the learning rate
Adam : adapts the learning rate to each individual parameter, updating frequently changing parameters with smaller steps and sparse parameters with larger steps.
Features:
1. Combines Adagrad's strength at handling sparse gradients with RMSprop's strength at handling non-stationary objectives;
2. Small memory requirements;
3. Computes individual adaptive learning rates for different parameters;
4. Also suited to most non-convex optimization problems, large datasets and high-dimensional spaces.
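For reference, the update rule from the Adam paper, written as a minimal sketch in plain Python (g is the gradient of the loss with respect to the parameter; the hyperparameter defaults are the usual ones):

# Sketch: one Adam step for a single scalar parameter theta.
import math

def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g      # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g  # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)         # bias correction for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v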

6. Training process

def train(model, device, train_loader, optimizer, epoch):

    model.train()  # put the model in training mode

    for batch_idx, (data, target) in enumerate(train_loader):  # loop over batches, updating parameters
        data, target = data.to(device), target.to(device).float().unsqueeze(1)
        optimizer.zero_grad()
        output = model(data)
        # print(output)

        loss = F.binary_cross_entropy(output, target)
        loss.backward()
        optimizer.step()

        if (batch_idx + 1) % 10 == 0:  # print training progress every 10 batches
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                       100. * (batch_idx + 1) / len(train_loader), loss.item()))

loss.backward() :
PyTorch's backpropagation (tensor.backward()) is implemented through the autograd package, which automatically computes gradients based on the mathematical operations performed on tensors.
If backward() is never called, the gradients stay None, so loss.backward() must come before optimizer.step().

optimizer.step() :
The optimizer's job is to update the network parameters using the computed gradients. For it to work, it needs two things: it must know the parameter space of the current network model (the parameters passed to its constructor), and it must know the gradient information from backpropagation (i.e. what backward() computed).
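A tiny sketch illustrating both points: .grad stays None until backward() runs, and step() consumes it to change the parameter.

# Sketch: gradients appear only after backward(); step() then applies them.
import torch

w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss = (w * 2).sum()
print(w.grad)   # None: backward() has not run yet
loss.backward()
print(w.grad)   # tensor([2.])
opt.step()
print(w)        # tensor([0.8000], requires_grad=True): w -= 0.1 * 2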

7. Test process

def val(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device).float().unsqueeze(1)
            output = model(data)
            # print(output)
            test_loss += F.binary_cross_entropy(output, target, reduction='mean').item()
            pred = torch.tensor([[1] if num[0] >= 0.5 else [0] for num in output]).to(device)
            correct += pred.eq(target.long()).sum().item()

        test_loss /= len(test_loader)  # average the per-batch mean losses
        print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            test_loss, correct, len(test_loader.dataset),
            100. * correct / len(test_loader.dataset)))

binary_cross_entropy :
This loss function is a classic; I used it in my very first project experiment.

L = -(1/n) * Σᵢ [ xᵢ·log(yᵢ) + (1 − xᵢ)·log(1 − yᵢ) ]

In the formula above, xᵢ is the true label of the i-th sample, yᵢ is the probability predicted by the model, and n is the total number of samples. Cross-entropy can be decomposed into two parts: the entropy of the true distribution, which measures its inherent uncertainty, and the KL divergence between the true and predicted distributions, which measures how far apart they are. The smaller the cross-entropy, the closer the predicted distribution is to the true one, which means the better the classification result.
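A quick sketch verifying the formula against PyTorch's implementation:

# Sketch: the manual binary cross-entropy matches F.binary_cross_entropy.
import torch
import torch.nn.functional as F

y = torch.tensor([0.8, 0.2, 0.6])  # predicted probabilities
x = torch.tensor([1.0, 0.0, 1.0])  # true labels
manual = -(x * y.log() + (1 - x) * (1 - y).log()).mean()
print(manual.item(), F.binary_cross_entropy(y, x).item())  # identical values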

Epilogue

Reorganizing this post made me realize how rough my writing was back then. Even now it is not entirely clear; I understand many things only at the surface level, like adding ingredients to a stew without knowing what each one is for, only that others add them. Maybe as an undergraduate I am still too shallow, and some time in industry will filter out the impurities. This article is a record of my past six months. We made it through the pandemic; 2023 will be better!

Origin: blog.csdn.net/weixin_53415043/article/details/130043902