YOLOv6 Pro | Building the YOLOv6-L Network Structure Step by Step with yaml Files (1): The Backbone. Easy to follow, practical material for complete beginners (also applies to improving the official YOLOv5)

YOLOv6 Pro introduction post: YOLOv6 Pro | Making it easier to build networks and swap modules in YOLOv6, to support network-structure improvements in research, covering Backbone, Neck, and DecoupleHead (following YOLOv5's way of building networks)

· YOLOv6 Pro is based on the overall architecture of the official YOLOv6, but builds the network the way YOLOv5 does, covering the backbone, neck, and effidehead structures.
· You can modify or add modules freely in the yaml file, and every modified file runs on its own, which is meant to support research.
· More network-structure improvements based on modules from yolov5 and yoloair will be added later.
· Pretrained weights have been converted from the official weights, so they are guaranteed to match.

· P6 models (non-official) will be released soon.

Project link: GitHub - yang-0201/YOLOv6_pro: Make it easier for yolov6 to change the network structure

If you find it interesting, please give it a Star and a Fork, and report any issues promptly. The project is in its early stages, so feature suggestions will be considered and implemented, and PRs are welcome. The project will keep being updated, so stay tuned!

Now to the main topic. Today's post is a step-by-step tutorial on building the YOLOv6 network structure in yaml format, packed with practical detail. I hope it helps everyone get more and more fluent at building and modifying networks!

(The building style is identical to the official YOLOv5, so this applies just as well if you want to modify YOLOv5.)

Today we take the YOLOv6-L model as the example. The S, T, and N variants of YOLOv6 differ structurally from M and L, mainly in the blocks used to build the network: the small models use RepBlock, while the large models use CSPStackRep.

First we need to get familiar with YOLOv6's network structure; once that is understood, building it from scratch becomes much easier. Let's start by looking at how the official YOLOv6 code builds the backbone. Without further ado, let's dig in!

In yolov6/models/efficientrep.py:

class CSPBepBackbone(nn.Module):
    """
    CSPBepBackbone module.
    """
    def __init__(
        self,
        in_channels=3,
        channels_list=None,
        num_repeats=None,
        block=RepVGGBlock,
        csp_e=float(1)/2,
    ):
        super().__init__()

        assert channels_list is not None
        assert num_repeats is not None

        self.stem = block(
            in_channels=in_channels,
            out_channels=channels_list[0],
            kernel_size=3,
            stride=2
        )

        self.ERBlock_2 = nn.Sequential(
            block(
                in_channels=channels_list[0],
                out_channels=channels_list[1],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[1],
                out_channels=channels_list[1],
                n=num_repeats[1],
                e=csp_e,
                block=block,
            )
        )

        self.ERBlock_3 = nn.Sequential(
            block(
                in_channels=channels_list[1],
                out_channels=channels_list[2],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[2],
                out_channels=channels_list[2],
                n=num_repeats[2],
                e=csp_e,
                block=block,
            )
        )

        self.ERBlock_4 = nn.Sequential(
            block(
                in_channels=channels_list[2],
                out_channels=channels_list[3],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[3],
                out_channels=channels_list[3],
                n=num_repeats[3],
                e=csp_e,
                block=block,
            )
        )

        channel_merge_layer = SimSPPF
        if block == ConvWrapper:
            channel_merge_layer = SPPF

        self.ERBlock_5 = nn.Sequential(
            block(
                in_channels=channels_list[3],
                out_channels=channels_list[4],
                kernel_size=3,
                stride=2,
            ),
            BepC3(
                in_channels=channels_list[4],
                out_channels=channels_list[4],
                n=num_repeats[4],
                e=csp_e,
                block=block,
            ),
            channel_merge_layer(
                in_channels=channels_list[4],
                out_channels=channels_list[4],
                kernel_size=5
            )
        )

    def forward(self, x):

        outputs = []
        x = self.stem(x)       # P1/2
        x = self.ERBlock_2(x)  # P2/4
        x = self.ERBlock_3(x)  # P3/8
        outputs.append(x)
        x = self.ERBlock_4(x)  # P4/16
        outputs.append(x)
        x = self.ERBlock_5(x)  # P5/32
        outputs.append(x)

        return tuple(outputs)  # (P3, P4, P5) feature maps for the neck

The parameters passed in (from the YOLOv6-L config) are:

backbone=dict(
        type='CSPBepBackbone',
        num_repeats=[1, 6, 12, 18, 6],
        out_channels=[64, 128, 256, 512, 1024],
        csp_e=float(1)/2,
        ),
training_mode = "conv_silu"
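
Jumping ahead a little (we will confirm below that training_mode = "conv_silu" maps block to ConvWrapper), here is a minimal sketch that instantiates the backbone with the YOLOv6-L parameters and checks the three output feature maps. It assumes the import paths match the files quoted in this post:

import torch
from yolov6.models.efficientrep import CSPBepBackbone
from yolov6.layers.common import ConvWrapper

backbone = CSPBepBackbone(
    in_channels=3,
    channels_list=[64, 128, 256, 512, 1024],
    num_repeats=[1, 6, 12, 18, 6],
    block=ConvWrapper,  # training_mode = "conv_silu"
    csp_e=float(1) / 2,
)

x = torch.randn(1, 3, 640, 640)
p3, p4, p5 = backbone(x)
print(p3.shape, p4.shape, p5.shape)
# torch.Size([1, 256, 80, 80]) torch.Size([1, 512, 40, 40]) torch.Size([1, 1024, 20, 20])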

The backbone consists of stem, ERBlock_2, ERBlock_3, ERBlock_4, and ERBlock_5 (which includes SPPF), and each ERBlock contains one block plus one BepC3 module. From the backbone-building code we can find the following (or, in the training code, set the conf file to configs/office/yolov6l.py and you can watch it happen in the debugger):

block = get_block(config.training_mode)
def get_block(mode):
    if mode == 'repvgg':
        return RepVGGBlock
    elif mode == 'hyper_search':
        return LinearAddBlock
    elif mode == 'repopt':
        return RealVGGBlock
    elif mode == 'conv_relu':
        return SimConvWrapper
    elif mode == 'conv_silu':
        return ConvWrapper
    else:
        raise NotImplementedError("Undefined Repblock choice for mode {}".format(mode))

So block resolves to the ConvWrapper module. Next let's open yolov6/layers/common.py and see what this curious ConvWrapper actually is:

class ConvWrapper(nn.Module):
    '''Wrapper for normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, groups=1, bias=True):
        super().__init__()
        self.block = Conv(in_channels, out_channels, kernel_size, stride, groups, bias)

    def forward(self, x):
        return self.block(x)

As the official comment says, it is just a plain convolution with a SiLU activation, essentially the same as the Conv module in YOLOv5. It is YOLOv6's basic convolution block.
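
If you want to see that equivalence at a glance, ConvWrapper(c1, c2, k, s) is functionally just the following (a sketch, not code from the repo):

import torch.nn as nn

def conv_bn_silu(c1, c2, k=3, s=1):
    # conv (with bias, matching the ConvWrapper defaults) + BN + SiLU,
    # the same recipe as YOLOv5's Conv module
    return nn.Sequential(
        nn.Conv2d(c1, c2, k, s, padding=k // 2, bias=True),
        nn.BatchNorm2d(c2),
        nn.SiLU(),
    )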

So, in short, the whole network is composed of:

stem: ConvWrapper basic convolution block

ERBlock_2: ConvWrapper (downsample) + BepC3

ERBlock_3: ConvWrapper (downsample) + BepC3

ERBlock_4: ConvWrapper (downsample) + BepC3

ERBlock_5: ConvWrapper (downsample) + BepC3 + SPPF

Now let's look at the channel counts and num_block values, which can be read off one by one from the parameters passed in.

c1 is in_channels, c2 is out_channels, k is the kernel size, s is the stride.

stem: ConvWrapper basic convolution block: c1: 3, c2: 64, k = 3, s = 2

ERBlock_2: ConvWrapper (downsample): c1: 64, c2: 128, k = 3, s = 2

+ BepC3: c1: 128, c2: 128, num_block = 6, csp_e = 0.5, block = ConvWrapper

ERBlock_3: ConvWrapper (downsample): c1: 128, c2: 256, k = 3, s = 2

+ BepC3: c1: 256, c2: 256, num_block = 12, csp_e = 0.5, block = ConvWrapper

ERBlock_4: ConvWrapper (downsample): c1: 256, c2: 512, k = 3, s = 2

+ BepC3: c1: 512, c2: 512, num_block = 18, csp_e = 0.5, block = ConvWrapper

ERBlock_5: ConvWrapper (downsample): c1: 512, c2: 1024, k = 3, s = 2

+ BepC3: c1: 1024, c2: 1024, num_block = 6, csp_e = 0.5, block = ConvWrapper

+ SPPF: c1: 1024, c2: 1024, k = 5

Next, let's see what the BepC3 structure actually is:

class BepC3(nn.Module):
    '''Beer-mug RepC3 Block'''
    def __init__(self, in_channels, out_channels, n=1,block=RepVGGBlock, e=0.5, concat=True):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(out_channels * e)  # hidden channels
        self.cv1 = Conv_C3(in_channels, c_, 1, 1)
        self.cv2 = Conv_C3(in_channels, c_, 1, 1)
        self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1)
        if block == ConvWrapper:
            self.cv1 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv2 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1, act=nn.SiLU())

        self.m = RepBlock(in_channels=c_, out_channels=c_, n=n, block=BottleRep, basic_block=block)
        self.concat = concat
        if not concat:
            self.cv3 = Conv_C3(c_, out_channels, 1, 1)

    def forward(self, x):
        if self.concat is True:
            return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
        else:
            return self.cv3(self.m(self.cv1(x)))

(a) A RepBlock is made up of N RepVGG blocks with ReLU activations.

(b) During inference, each RepVGG block is reparameterized into a RepConv (a single 3x3 convolution).

(c) The CSPStackRep block, which is exactly the BepC3 module, consists of three 1x1 convolutions plus N/2 doubled RepBlock units, with residual connections and a concat operation added on top.
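
You can verify this wiring directly. In the sketch below (assuming BepC3 and ConvWrapper are importable from common.py), BepC3(128, 128, n=6, block=ConvWrapper) gives hidden channels c_ = 128 * 0.5 = 64, and its RepBlock stacks n // 2 = 3 BottleRep units, each holding two ConvWrapper convolutions, which is exactly the "N/2 doubled" structure from (c):

import torch
from yolov6.layers.common import BepC3, ConvWrapper

m = BepC3(128, 128, n=6, block=ConvWrapper)  # e defaults to 0.5, so c_ = 64
y = m(torch.randn(1, 128, 80, 80))
print(y.shape)  # torch.Size([1, 128, 80, 80]); BepC3 keeps the spatial size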

OK, now that we know all the modules and parameter details used to build the backbone, we can start assembling the network in a yaml file!

For anyone unfamiliar with the format, here is what the parameters [from, number, module, args] mean:

· The first parameter says where this module's input comes from: -1 means the previous layer, and it can also be a layer index such as 2 or 3.

· The second parameter is how many times the module is stacked, equivalent to the module's num_block parameter; if stacking is not needed it defaults to 1.

· The third parameter is the module name.

· The fourth parameter is the list of arguments passed to the module.

depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

depth_multiple and width_multiple are the depth and width scaling factors.

width_multiple scales the channel counts, while depth_multiple scales each module's num_block count.
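
As a concrete example, this is roughly how YOLOv5-style parsing applies the two factors (a sketch; the gw and gd values here are hypothetical, since YOLOv6-L keeps both at 1.0):

import math

def make_divisible(x, divisor=8):
    # round the scaled channel count up to a multiple of divisor
    return math.ceil(x / divisor) * divisor

gw, gd = 0.5, 0.33  # hypothetical width/depth multiples for a smaller variant
c2 = make_divisible(128 * gw, 8)  # 128 output channels -> 64
n = max(round(6 * gd), 1)         # 6 repeats -> 2
print(c2, n)                      # 64 2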


The first layer, the stem, is a ConvWrapper basic convolution block,

so the first row is [-1, 1, ConvWrapper, [64, 3, 2]]

In the args [64, 3, 2], the input channels are omitted because the parsing code defaults them to the previous layer's output channels; 64 is the output channels, 3 the kernel size, and 2 the stride.

The second and third rows are

[-1, 1, ConvWrapper, [128, 3, 2]],

[-1, 1, BepC3, [128, 6, "ConvWrapper"]],

where in the BepC3 row 128 is the output channels, 6 is num_block, and "ConvWrapper" specifies the block type.

Putting it all together:

depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

backbone:
  # [from, number, module, args]
  [[-1, 1, ConvWrapper, [64, 3, 2]],  # 0-P1/2
   [-1, 1, ConvWrapper, [128, 3, 2]],  # 1-P2/4
   [-1, 1, BepC3, [128, 6, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [256, 3, 2]],  # 3-P3/8
   [-1, 1, BepC3, [256, 12, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [512, 3, 2]],  # 5-P4/16
   [-1, 1, BepC3, [512, 18, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, BepC3, [1024, 6, "ConvWrapper"]],
   [-1, 1, SPPF, [1024, 5]]]  # 9
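
Comments such as # 5-P4/16 follow the YOLOv5 convention: the number before the dash is the layer index (which the from field can reference), and Pk/s marks the feature-pyramid level and the cumulative downsampling stride at that layer.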

Step one: register the module definitions in yolov6/layers/common.py (or in YOLOv5's common.py), adding every module involved. If you are working with YOLOv5, it is best to put them in a new .py file.

import torch
import torch.nn as nn
from torch.nn.parameter import Parameter
import numpy as np
class RepVGGBlock(nn.Module):
    '''RepVGGBlock is a basic rep-style block, including training and deploy status
    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py
    '''
    def __init__(self, in_channels, out_channels, kernel_size=3,
                 stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False, use_se=False):
        super(RepVGGBlock, self).__init__()
        """ Initialization of the class.
        Args:
            in_channels (int): Number of channels in the input image
            out_channels (int): Number of channels produced by the convolution
            kernel_size (int or tuple): Size of the convolving kernel
            stride (int or tuple, optional): Stride of the convolution. Default: 1
            padding (int or tuple, optional): Zero-padding added to both sides of
                the input. Default: 1
            dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
            groups (int, optional): Number of blocked connections from input
                channels to output channels. Default: 1
            padding_mode (string, optional): Default: 'zeros'
            deploy: Whether to be deploy status or training status. Default: False
            use_se: Whether to use se. Default: False
        """
        self.deploy = deploy
        self.groups = groups
        self.in_channels = in_channels
        self.out_channels = out_channels

        assert kernel_size == 3
        assert padding == 1

        padding_11 = padding - kernel_size // 2

        self.nonlinearity = nn.ReLU()

        if use_se:
            raise NotImplementedError("se block not supported yet")
        else:
            self.se = nn.Identity()

        if deploy:
            self.rbr_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride,
                                         padding=padding, dilation=dilation, groups=groups, bias=True, padding_mode=padding_mode)

        else:
            self.rbr_identity = nn.BatchNorm2d(num_features=in_channels) if out_channels == in_channels and stride == 1 else None
            self.rbr_dense = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups)
            self.rbr_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride, padding=padding_11, groups=groups)

    def forward(self, inputs):
        '''Forward process'''
        if hasattr(self, 'rbr_reparam'):
            return self.nonlinearity(self.se(self.rbr_reparam(inputs)))

        if self.rbr_identity is None:
            id_out = 0
        else:
            id_out = self.rbr_identity(inputs)

        return self.nonlinearity(self.se(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out))

    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
        kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid

    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
        if kernel1x1 is None:
            return 0
        else:
            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])

    def _fuse_bn_tensor(self, branch):
        if branch is None:
            return 0, 0
        if isinstance(branch, nn.Sequential):
            kernel = branch.conv.weight
            running_mean = branch.bn.running_mean
            running_var = branch.bn.running_var
            gamma = branch.bn.weight
            beta = branch.bn.bias
            eps = branch.bn.eps
        else:
            assert isinstance(branch, nn.BatchNorm2d)
            if not hasattr(self, 'id_tensor'):
                input_dim = self.in_channels // self.groups
                kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32)
                for i in range(self.in_channels):
                    kernel_value[i, i % input_dim, 1, 1] = 1
                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
            kernel = self.id_tensor
            running_mean = branch.running_mean
            running_var = branch.running_var
            gamma = branch.weight
            beta = branch.bias
            eps = branch.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta - running_mean * gamma / std

    def switch_to_deploy(self):
        if hasattr(self, 'rbr_reparam'):
            return
        kernel, bias = self.get_equivalent_kernel_bias()
        self.rbr_reparam = nn.Conv2d(in_channels=self.rbr_dense.conv.in_channels, out_channels=self.rbr_dense.conv.out_channels,
                                     kernel_size=self.rbr_dense.conv.kernel_size, stride=self.rbr_dense.conv.stride,
                                     padding=self.rbr_dense.conv.padding, dilation=self.rbr_dense.conv.dilation, groups=self.rbr_dense.conv.groups, bias=True)
        self.rbr_reparam.weight.data = kernel
        self.rbr_reparam.bias.data = bias
        for para in self.parameters():
            para.detach_()
        self.__delattr__('rbr_dense')
        self.__delattr__('rbr_1x1')
        if hasattr(self, 'rbr_identity'):
            self.__delattr__('rbr_identity')
        if hasattr(self, 'id_tensor'):
            self.__delattr__('id_tensor')
        self.deploy = True
class BepC3(nn.Module):
    '''Beer-mug RepC3 Block'''
    def __init__(self, in_channels, out_channels, n=1,block=RepVGGBlock, e=0.5, concat=True):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(out_channels * e)  # hidden channels
        self.cv1 = Conv_C3(in_channels, c_, 1, 1)
        self.cv2 = Conv_C3(in_channels, c_, 1, 1)
        self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1)
        if block == ConvWrapper:
            self.cv1 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv2 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1, act=nn.SiLU())

        self.m = RepBlock(in_channels=c_, out_channels=c_, n=n, block=BottleRep, basic_block=block)
        self.concat = concat
        if not concat:
            self.cv3 = Conv_C3(c_, out_channels, 1, 1)

    def forward(self, x):
        if self.concat is True:
            return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
        else:
            return self.cv3(self.m(self.cv1(x)))
class Conv_C3(nn.Module):
    '''Standard convolution in BepC3-Block'''
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.ReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
    def forward_fuse(self, x):
        return self.act(self.conv(x))
class ConvWrapper(nn.Module):
    '''Wrapper for normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, groups=1, bias=True):
        super().__init__()
        self.block = Conv(in_channels, out_channels, kernel_size, stride, groups, bias)

    def forward(self, x):
        return self.block(x)
class Conv(nn.Module):  # if you are on YOLOv5 this needs a small tweak (YOLOv5 already defines its own Conv)
    '''Normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size, stride, groups=1, bias=False):
        super().__init__()
        padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))
class RepBlock(nn.Module):
    '''
        RepBlock is a stage block with rep-style basic block
    '''
    def __init__(self, in_channels, out_channels, n=1, block=RepVGGBlock, basic_block=RepVGGBlock):
        super().__init__()

        self.conv1 = block(in_channels, out_channels)
        self.block = nn.Sequential(*(block(out_channels, out_channels) for _ in range(n - 1))) if n > 1 else None
        if block == BottleRep:
            self.conv1 = BottleRep(in_channels, out_channels, basic_block=basic_block, weight=True)
            n = n // 2
            self.block = nn.Sequential(*(BottleRep(out_channels, out_channels, basic_block=basic_block, weight=True) for _ in range(n - 1))) if n > 1 else None

    def forward(self, x):
        x = self.conv1(x)
        if self.block is not None:
            x = self.block(x)
        return x
class BottleRep(nn.Module):

    def __init__(self, in_channels, out_channels, basic_block=RepVGGBlock, weight=False):
        super().__init__()
        self.conv1 = basic_block(in_channels, out_channels)
        self.conv2 = basic_block(out_channels, out_channels)
        if in_channels != out_channels:
            self.shortcut = False
        else:
            self.shortcut = True
        if weight:
            self.alpha = Parameter(torch.ones(1))
        else:
            self.alpha = 1.0

    def forward(self, x):
        outputs = self.conv1(x)
        outputs = self.conv2(outputs)
        return outputs + self.alpha * x if self.shortcut else outputs
def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
    '''Basic cell for rep-style block, including conv and bn'''
    result = nn.Sequential()
    result.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                                  kernel_size=kernel_size, stride=stride, padding=padding, groups=groups, bias=False))
    result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
    return result
def autopad(k, p=None):  # kernel, padding
    # Pad to 'same'
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p
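
As a quick self-check of the RepVGGBlock defined above, you can confirm that reparameterization preserves the output (a sketch, meant to run in the same file where these modules live):

import torch

block = RepVGGBlock(64, 64)
block.eval()  # BN must use its running stats for the fusion to be exact
x = torch.randn(1, 64, 32, 32)
y_train = block(x)        # three-branch training-time forward
block.switch_to_deploy()
y_deploy = block(x)       # single fused 3x3 convolution
print(torch.allclose(y_train, y_deploy, atol=1e-5))  # True, up to float error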

Then register them in yolov6/models/yolo.py (or in YOLOv5's yolo.py), inside parse_model:

        elif m in [ConvWrapper]:
            c1 = ch[f]
            c2 = args[0]
            args = [c1, c2, *args[1:]]
        elif m in [BepC3, RepBlock]:
            c1, c2 = ch[f], args[0]
            c2 = make_divisible(c2 * gw, 8)
            args = [c1, c2, *args[1:]]
            if m in [RepBlock]:
                args.insert(2, n)  # number of repeats comes from the yaml "number" column
                n = 1
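
To see how one yaml row flows through this code, take [-1, 1, BepC3, [128, 6, "ConvWrapper"]] and assume a YOLOv5-style parse_model, which eval's the string "ConvWrapper" into the class registered in common.py:

# f, n, m, args = -1, 1, BepC3, [128, 6, ConvWrapper]
# c1 = ch[f]                        # output channels of the previous layer, 128 here
# c2 = make_divisible(128 * gw, 8)  # stays 128 with width_multiple = 1.0
# args = [c1, c2, 6, ConvWrapper]   # the layer built is BepC3(128, 128, 6, ConvWrapper)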

And with that the backbone is complete. It feels like we are just warming up; the next post will build YOLOv6's Rep-PAN!

Reposted from blog.csdn.net/qq_43000647/article/details/128258692