How to implement a YOLO (v3) object detector from scratch in PyTorch: Part 2

https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-2/

Part 2: Creating the layers of the network architecture

This is Part 2 of the tutorial on implementing a YOLO v3 detector from scratch. In the last part, I explained how YOLO works, and in this part, we are going to implement the layers used by YOLO in PyTorch. In other words, this is the part where we create the building blocks of our model.

The code for this tutorial is designed to run on Python 3.5 and PyTorch 0.4. It can be found in its entirety at this GitHub repo:
https://github.com/ayooshkathuria/YOLO_v3_tutorial_from_scratch

This tutorial is broken into 5 parts:

Part 1 : Understanding How YOLO works
Part 2 (This one): Creating the layers of the network architecture
Part 3 : Implementing the forward pass of the network
Part 4 : Objectness Confidence Thresholding and Non-maximum Suppression
Part 5 : Designing the input and the output pipelines

Prerequisites

Part 1 of the tutorial/knowledge of how YOLO works.
Basic working knowledge of PyTorch, including how to create custom architectures with the nn.Module, nn.Sequential and torch.nn.Parameter classes.
I assume you have had some experience with PyTorch before. If you're just starting out, I'd recommend that you play around with the framework a bit before returning to this post.

Getting Started

First create a directory where the code for the detector will live.
Then, create a file darknet.py. Darknet is the name of the underlying architecture of YOLO. This file will contain the code that creates the YOLO network. We will supplement it with a file called util.py, which will contain the code for various helper functions. Save both of these files in your detector folder. You can use git to keep track of the changes.

Configuration File

The official code (authored in C) uses a configuration file to build the network. The cfg file describes the layout of the network, block by block. If you're coming from a Caffe background, it's equivalent to the .prototxt file used to describe the network.
We will use the official cfg file, released by the author, to build our network. Download it from the link below and place it in a folder called cfg inside your detector directory:
https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg
If you're on Linux, cd into your detector directory and type:

mkdir cfg
cd cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg

If you open the configuration file, you will see something like this.

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

We see 4 blocks above. Three of them describe convolutional layers, and the last one describes a shortcut layer. A shortcut layer is a skip connection, like the one used in ResNet. There are 5 types of layers that are used in YOLO:

Convolutional

[convolutional]
batch_normalize=1  
filters=64  
size=3  
stride=1  
pad=1  
activation=leaky

Shortcut

[shortcut]
from=-3
activation=linear

A shortcut layer is a skip connection, akin to the one used in ResNet. The from parameter is -3, which means the output of the shortcut layer is obtained by adding the feature maps of the previous layer to those of the 3rd layer backwards from the shortcut layer.
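To make the indexing concrete, here is a minimal sketch of what a shortcut with from=-3 computes. Plain Python lists stand in for feature maps, and the helper shortcut_output and the toy outputs dictionary are hypothetical names used only for illustration; they are not part of the tutorial's code.

```python
def shortcut_output(outputs, index, from_):
    """Element-wise sum of the previous layer's output and the output
    of the layer `from_` steps back from the shortcut layer."""
    prev = outputs[index - 1]       # output of the layer just before the shortcut
    skip = outputs[index + from_]   # from_ is negative, e.g. -3
    return [a + b for a, b in zip(prev, skip)]

# Toy "feature maps" of layers 0, 1 and 2; the shortcut sits at index 3.
outputs = {0: [1.0, 2.0], 1: [0.5, 0.5], 2: [3.0, 4.0]}
result = shortcut_output(outputs, index=3, from_=-3)
print(result)  # [4.0, 6.0]
```

Since the two maps are added element by element, they must have the same shape, which is also why a shortcut does not change the depth of the feature map.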

Upsample

[upsample]
stride=2

Upsamples the feature map in the previous layer by a factor of stride using bilinear upsampling.

Route

[route]
layers = -4

[route]
layers = -1, 61

The route layer deserves a bit of explanation. It has an attribute layers which can have either one or two values.
When the layers attribute has only one value, it outputs the feature maps of the layer indexed by the value. In our example, it is -4, so the layer will output the feature map from the 4th layer backwards from the route layer.

When layers has two values, it returns the concatenated feature maps of the layers indexed by its values. In our example it is -1, 61, and the layer will output the feature maps from the previous layer (-1) and the 61st layer, concatenated along the depth dimension.
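The two indexing styles can be illustrated with a small sketch. It assumes that negative values are relative to the route layer and positive values are absolute layer indices, as described above. The helper resolve_route and the route positions 83 and 86 are hypothetical, chosen only to have concrete numbers.

```python
def resolve_route(layers, index):
    """Turn the `layers` values of a route block located at position
    `index` into absolute layer indices."""
    resolved = []
    for value in layers:
        # negative values count backwards from the route layer,
        # positive values already are absolute indices
        resolved.append(index + value if value < 0 else value)
    return resolved

print(resolve_route([-4], index=83))      # [79]
print(resolve_route([-1, 61], index=86))  # [85, 61]
```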

YOLO

[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=80
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1

The YOLO layer corresponds to the Detection layer described in Part 1. The anchors attribute describes 9 anchors, but only the anchors indexed by the values of the mask attribute are used. Here, the value of mask is 0,1,2, which means the first, second and third anchors are used. This makes sense since each cell of the detection layer predicts 3 boxes. In total, we have detection layers at 3 scales, which accounts for the 9 anchors.
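As a rough sketch of how mask picks anchors out of the full list, the snippet below parses the two attribute strings and keeps only the masked pairs. The helper name select_anchors is hypothetical, and the second mask value "6,7,8" is just another example input.

```python
def select_anchors(anchor_str, mask_str):
    """Parse the anchors attribute into (width, height) pairs and keep
    only the pairs whose positions are listed in the mask attribute."""
    values = [int(v) for v in anchor_str.split(",")]
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    mask = [int(m) for m in mask_str.split(",")]
    return [pairs[i] for i in mask]

anchors = "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326"
print(select_anchors(anchors, "0,1,2"))  # [(10, 13), (16, 30), (33, 23)]
print(select_anchors(anchors, "6,7,8"))  # [(116, 90), (156, 198), (373, 326)]
```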

Net

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width= 320
height = 320
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

There's another type of block called net in the cfg, but I wouldn't call it a layer as it only describes information about the network input and training parameters. It isn't used in the forward pass of YOLO. However, it does provide us with information like the network input size, which we use to adjust anchors in the forward pass.
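Here is a hedged sketch of how the input size from the net block could be used. It assumes the three detection scales of YOLO v3 have strides 32, 16 and 8 (as discussed in Part 1) and that anchors are rescaled by dividing them by the stride. The helpers grid_sizes and scale_anchors are hypothetical, and the actual adjustment done in the forward pass may differ.

```python
def grid_sizes(net_info, strides=(32, 16, 8)):
    """Grid dimension at each detection scale for a square input."""
    input_dim = int(net_info["height"])
    return [input_dim // s for s in strides]

def scale_anchors(anchors, stride):
    """Express anchors (given in input-image pixels) in grid-cell units."""
    return [(w / stride, h / stride) for (w, h) in anchors]

# The net block as a parsed dictionary (values are still strings).
net_info = {"type": "net", "width": "320", "height": "320", "channels": "3"}
print(grid_sizes(net_info))               # [10, 20, 40]
print(scale_anchors([(116, 90)], 32))     # [(3.625, 2.8125)]
```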

Parsing the configuration file

Before we begin, add the necessary imports at the top of the darknet.py file.

from __future__ import division

import torch 
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy as np

We define a function called parse_cfg, which takes the path of the configuration file as the input.

def parse_cfg(cfgfile):
    """
    Takes a configuration file
    
    Returns a list of blocks. Each block describes a block in the neural
    network to be built. A block is represented as a dictionary in the list
    
    """

The idea here is to parse the cfg, and store every block as a dict. The attributes of the blocks and their values are stored as key-value pairs in the dictionary. As we parse through the cfg, we keep appending these dicts, denoted by the variable block in our code, to a list blocks. Our function will return this list.

We begin by saving the content of the cfg file in a list of strings. The following code performs some preprocessing on this list.

with open(cfgfile, 'r') as file:                       # the file is closed automatically
    lines = file.read().split('\n')                    # store the lines in a list
lines = [x.strip() for x in lines]                     # get rid of fringe whitespaces
lines = [x for x in lines if len(x) > 0]               # get rid of the empty lines
lines = [x for x in lines if x[0] != '#']              # get rid of comments

Then, we loop over the resultant list to get blocks.

block = {}
blocks = []

for line in lines:
    if line[0] == "[":           # This marks the start of a new block
        if len(block) != 0:      # If block is not empty, it holds the values of the previous block.
            blocks.append(block) # add it to the blocks list
            block = {}           # re-init the block
        block["type"] = line[1:-1].rstrip()     
    else:
        key,value = line.split("=", 1)   # split only on the first "="
        block[key.rstrip()] = value.lstrip()
blocks.append(block)

return blocks
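Putting the fragments above together, the following is a self-contained sketch of the parser that you can run without downloading anything. The sample cfg text is a tiny hypothetical excerpt written to a temporary file, and the function body is a lightly condensed variant of the steps described above, not a verbatim copy of the repository code.

```python
import os
import tempfile

def parse_cfg(cfgfile):
    """Parse a Darknet-style cfg file into a list of dicts, one per block."""
    with open(cfgfile, "r") as f:
        lines = [line.strip() for line in f.read().split("\n")]
    # drop blank lines and comments
    lines = [line for line in lines if line and not line.startswith("#")]

    blocks, block = [], {}
    for line in lines:
        if line.startswith("["):           # start of a new block
            if block:                       # store the block collected so far
                blocks.append(block)
                block = {}
            block["type"] = line[1:-1].strip()
        else:
            key, value = line.split("=", 1)
            block[key.strip()] = value.strip()
    if block:
        blocks.append(block)
    return blocks

# A tiny hypothetical cfg, written to a temporary file for the demo.
sample_cfg = """
[net]
# Testing
width= 320
height = 320

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear
"""

with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as tmp:
    tmp.write(sample_cfg)
    cfg_path = tmp.name

blocks = parse_cfg(cfg_path)
os.remove(cfg_path)

for b in blocks:
    print(b)
```

Note that every value is still a string at this point; the casts to int happen later, when the modules are created.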

Creating the building blocks

Now we are going to use the list returned by the above parse_cfg to construct PyTorch modules for the blocks present in the config file.

We have 5 types of layers in the list (mentioned above). PyTorch provides pre-built layers for the types convolutional and upsample. We will have to write our own modules for the rest of the layers by extending the nn.Module class.

The create_modules function takes the list blocks returned by the parse_cfg function.

def create_modules(blocks):
    net_info = blocks[0]     #Captures the information about the input and pre-processing    
    module_list = nn.ModuleList()
    prev_filters = 3
    output_filters = []

Before we iterate over the list of blocks, we define a variable net_info to store information about the network.

nn.ModuleList
Our function will return an nn.ModuleList. This class is almost like a normal list containing nn.Module objects. However, when we add an nn.ModuleList as a member of an nn.Module object (i.e. when we add modules to our network), all the parameters of the nn.Module objects (modules) inside the nn.ModuleList are also registered as parameters of the containing nn.Module object (i.e. our network).
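The difference from a plain Python list is easy to check. In this small sketch, the two toy classes WithPlainList and WithModuleList are hypothetical and exist only to compare how many parameters PyTorch can see in each case.

```python
import torch.nn as nn

class WithPlainList(nn.Module):
    def __init__(self):
        super(WithPlainList, self).__init__()
        # a plain Python list: PyTorch does not register these layers
        self.layers = [nn.Linear(2, 2), nn.Linear(2, 2)]

class WithModuleList(nn.Module):
    def __init__(self):
        super(WithModuleList, self).__init__()
        # nn.ModuleList registers every layer as a submodule
        self.layers = nn.ModuleList([nn.Linear(2, 2), nn.Linear(2, 2)])

# Each nn.Linear owns a weight and a bias tensor.
plain_count = len(list(WithPlainList().parameters()))
registered_count = len(list(WithModuleList().parameters()))
print(plain_count, registered_count)  # 0 4
```

Parameters that are not registered would be invisible to an optimizer and would not follow the network when it is moved to a GPU, which is why the tutorial returns an nn.ModuleList.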

When we define a new convolutional layer, we must define the dimensions of its kernel. While the height and width of the kernel are provided by the cfg file, the depth of the kernel is precisely the number of filters (or depth of the feature map) present in the previous layer. This means we need to keep track of the number of filters in the layer to which the convolutional layer is being applied. We use the variable prev_filters to do this. We initialise it to 3, as the image has 3 channels corresponding to RGB.

The route layer brings (possibly concatenated) feature maps from previous layers. If there's a convolutional layer right after a route layer, then its kernel is applied to the feature maps the route layer brings forward. Therefore, we need to keep track of the number of filters not only in the previous layer, but in each one of the preceding layers. As we iterate, we append the number of output filters of each block to the list output_filters.
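The bookkeeping can be traced on a toy list of blocks. This is only a sketch of the depth tracking, assuming that shortcut, upsample and yolo blocks leave the depth unchanged. The helper track_filters and the toy_blocks list are hypothetical and much shorter than the real cfg.

```python
def track_filters(blocks, input_channels=3):
    """Return the output depth of every block, mimicking the roles of
    prev_filters and output_filters."""
    prev_filters = input_channels
    output_filters = []
    for index, block in enumerate(blocks):
        if block["type"] == "convolutional":
            filters = int(block["filters"])
        elif block["type"] == "route":
            layers = [int(v) for v in block["layers"].split(",")]
            # negative values are relative to this block, positive ones absolute
            absolute = [index + v if v < 0 else v for v in layers]
            filters = sum(output_filters[i] for i in absolute)
        else:
            # shortcut, upsample and yolo blocks keep the depth unchanged
            filters = prev_filters
        output_filters.append(filters)
        prev_filters = filters
    return output_filters

toy_blocks = [
    {"type": "convolutional", "filters": "32"},   # index 0
    {"type": "convolutional", "filters": "64"},   # index 1
    {"type": "convolutional", "filters": "128"},  # index 2
    {"type": "upsample", "stride": "2"},          # index 3
    {"type": "route", "layers": "-1, 1"},         # index 4: blocks 3 and 1
    {"type": "convolutional", "filters": "256"},  # index 5: kernel depth is 192
]
depths = track_filters(toy_blocks)
print(depths)  # [32, 64, 128, 128, 192, 256]
```

The convolution at index 5 therefore needs kernels of depth 192, a number that is only available because the depths of all earlier blocks were recorded.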

Now, the idea is to iterate over the list of blocks, and create a PyTorch module for each block as we go.

    for index, x in enumerate(blocks[1:]):
        module = nn.Sequential()

        #check the type of block
        #create a new module for the block
        #append to module_list

The nn.Sequential class is used to sequentially execute a number of nn.Module objects. If you look at the cfg, you will realize a block may contain more than one layer. For example, a block of type convolutional has a batch norm layer as well as a leaky ReLU activation layer in addition to a convolutional layer. We string together these layers using nn.Sequential and its add_module function. For example, this is how we create the convolutional and the upsample layers.

        if (x["type"] == "convolutional"):
            #Get the info about the layer
            activation = x["activation"]
            try:
                batch_normalize = int(x["batch_normalize"])
                bias = False
            except KeyError:   # the block has no batch_normalize entry
                batch_normalize = 0
                bias = True

            filters= int(x["filters"])
            padding = int(x["pad"])
            kernel_size = int(x["size"])
            stride = int(x["stride"])

            if padding:
                pad = (kernel_size - 1) // 2
            else:
                pad = 0

            #Add the convolutional layer
            conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias)
            module.add_module("conv_{0}".format(index), conv)

            #Add the Batch Norm Layer
            if batch_normalize:
                bn = nn.BatchNorm2d(filters)
                module.add_module("batch_norm_{0}".format(index), bn)

            #Check the activation. 
            #It is either Linear or a Leaky ReLU for YOLO
            if activation == "leaky":
                activn = nn.LeakyReLU(0.1, inplace = True)
                module.add_module("leaky_{0}".format(index), activn)

        #If it's an upsampling layer
        #We use bilinear upsampling (nn.Upsample with mode = "bilinear")
        elif (x["type"] == "upsample"):
            stride = int(x["stride"])
            upsample = nn.Upsample(scale_factor = stride, mode = "bilinear")
            module.add_module("upsample_{}".format(index), upsample)
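To see what the padding rule in the convolutional branch does to the spatial size, here is a small sketch based on the standard convolution output-size formula. The helper conv_output_size is hypothetical, and 320 is used as the input size only because it matches the net block shown earlier.

```python
def conv_output_size(input_size, kernel_size, stride, pad_flag):
    """Spatial size after a convolution, using the same padding rule as
    the convolutional branch: pad = (kernel_size - 1) // 2 when pad is set."""
    pad = (kernel_size - 1) // 2 if pad_flag else 0
    return (input_size + 2 * pad - kernel_size) // stride + 1

print(conv_output_size(320, kernel_size=3, stride=1, pad_flag=1))  # 320
print(conv_output_size(320, kernel_size=3, stride=2, pad_flag=1))  # 160
print(conv_output_size(320, kernel_size=1, stride=1, pad_flag=1))  # 320
```

With this rule, a stride of 1 preserves the size of the feature map and a stride of 2 halves it, for both the 3 x 3 and the 1 x 1 kernels used in the cfg.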

Route Layer / Shortcut Layers

Next, we write the code for creating the Route and the Shortcut Layers.

        #If it is a route layer
        elif (x["type"] == "route"):
            x["layers"] = x["layers"].split(',')
            #Start  of a route
            start = int(x["layers"][0])
            #end, if there exists one.
            try:
                end = int(x["layers"][1])
            except IndexError:   # only one value was given
                end = 0
            #Positive annotation
            if start > 0: 
                start = start - index
            if end > 0:
                end = end - index
            route = EmptyLayer()
            module.add_module("route_{0}".format(index), route)
            if end < 0:
                filters = output_filters[index + start] + output_filters[index + end]
            else:
                filters= output_filters[index + start]

        #shortcut corresponds to skip connection
        elif x["type"] == "shortcut":
            shortcut = EmptyLayer()
            module.add_module("shortcut_{}".format(index), shortcut)

The code for creating the Route Layer deserves a fair bit of explanation. At first, we extract the value of the layers attribute, split it into a list, and cast its entries into integers (start and, if present, end).

Then we have a new layer called EmptyLayer which, as the name suggests, is just an empty layer.

route = EmptyLayer()

It is defined as:

class EmptyLayer(nn.Module):
    def __init__(self):
        super(EmptyLayer, self).__init__()

Wait, an empty layer?

Now, an empty layer might seem weird given it does nothing. The Route Layer, just like any other layer, performs an operation (bringing forward a previous layer / concatenation). In PyTorch, when we define a new layer, we subclass nn.Module and write the operation the layer performs in the forward function of the nn.Module object.

To design a layer for the Route block, we would have to build an nn.Module object that is initialized with the values of the attribute layers as its member(s). Then, we could write the code to concatenate/bring forward the feature maps in its forward function. Finally, we would execute this layer in the forward function of our network.

But given that the code for concatenation is fairly short and simple (calling torch.cat on feature maps), designing a layer as above would lead to unnecessary abstraction that just increases boilerplate code. Instead, what we can do is put a dummy layer in place of a proposed route layer, and then perform the concatenation directly in the forward function of the nn.Module object representing darknet. (If the last line doesn't make a lot of sense to you, I suggest you read up on how the nn.Module class is used in PyTorch. See the links under Further Reading at the bottom.)
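To show how little code the concatenation needs, here is a minimal sketch of the operation the network's forward function could run in place of the dummy layer. The outputs dictionary, the layer indices and the tensor shapes are hypothetical stand-ins; the real forward pass is the subject of the next part.

```python
import torch

# Hypothetical cache of feature maps from earlier layers, keyed by layer
# index. Shapes are (batch, depth, height, width).
outputs = {
    1: torch.zeros(1, 64, 20, 20),
    2: torch.zeros(1, 128, 20, 20),
}

# A route block at index 3 with layers = -1, 1 refers to layers 2 and 1.
# Concatenation happens along dimension 1, the depth dimension.
routed = torch.cat((outputs[2], outputs[1]), 1)
print(routed.shape)
```

The resulting depth is 128 + 64 = 192, while the batch size and the spatial dimensions stay the same.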

The convolutional layer right after a route layer applies its kernel to the (possibly concatenated) feature maps from previous layers. The following code updates the filters variable to hold the number of filters output by a route layer.

if end < 0:
    #If we are concatenating maps
    filters = output_filters[index + start] + output_filters[index + end]
else:
    filters= output_filters[index + start]

The shortcut layer also makes use of an empty layer, for it also performs a very simple operation (addition). There is no need to update the filters variable, as it merely adds the feature maps of a previous layer to those of the layer just behind it.

YOLO Layer
Finally, we write the code for creating the YOLO layer.

        #Yolo is the detection layer
        elif x["type"] == "yolo":
            mask = x["mask"].split(",")
            mask = [int(m) for m in mask]   # m, so the block variable x is not reused

            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)]
            anchors = [anchors[i] for i in mask]

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)

We define a new layer DetectionLayer that holds the anchors used to detect bounding boxes.

The detection layer is defined as:

class DetectionLayer(nn.Module):
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors

At the end of the loop, we do some bookkeeping.

        module_list.append(module)
        prev_filters = filters
        output_filters.append(filters)

That concludes the body of the loop. At the end of the function create_modules, we return a tuple containing net_info and module_list.

return (net_info, module_list)

Testing the code

You can test your code by typing the following lines at the end of darknet.py and running the file.

blocks = parse_cfg("cfg/yolov3.cfg")
print(create_modules(blocks))

You will see a long list (containing exactly 106 items), the elements of which will look like this:

.
.

  (9): Sequential(
     (conv_9): Conv2d (128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
     (batch_norm_9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
     (leaky_9): LeakyReLU(0.1, inplace)
   )
   (10): Sequential(
     (conv_10): Conv2d (64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
     (batch_norm_10): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
     (leaky_10): LeakyReLU(0.1, inplace)
   )
   (11): Sequential(
     (shortcut_11): EmptyLayer(
     )
   )
.
.
.

That's it for this part. In the next part, we will assemble the building blocks that we've created to produce output from an image.
https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-3/

Further Reading
PyTorch tutorial
https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
nn.Module, nn.Parameter classes
https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#define-the-network
nn.ModuleList and nn.Sequential
https://discuss.pytorch.org/t/when-should-i-use-nn-modulelist-and-when-should-i-use-nn-sequential/5463

Ayoosh Kathuria is currently an intern at the Defense Research and Development Organization, India, where he is working on improving object detection in grainy videos. When he's not working, he's either sleeping or playing Pink Floyd on his guitar. You can connect with him on LinkedIn or look at more of what he does at GitHub.

References
https://blog.paperspace.com/
https://blog.paperspace.com/tag/series-yolo/
https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/
https://www.jiqizhixin.com/
https://www.linkedin.com/in/ayoosh-kathuria-44a319132/
https://github.com/ayooshkathuria
