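YOLOv6 Pro introduction link: YOLOv6 Pro | Makes building YOLOv6 networks and swapping modules easier, to support network-structure improvements in research, covering the Backbone, Neck, and DecoupleHead (following the way YOLOv5 builds networks)
· YOLOv6 Pro keeps the overall architecture of the official YOLOv6 and uses YOLOv5's network-building approach to construct a YOLOv6 network, including the backbone, neck, and effidehead structures.
· Modules can be freely modified or added in the yaml file, and every modified file can be run on its own. The goal is to support research.
· More network-structure improvements will be added later, based on modules from yolov5 and yoloair.
· The pretrained weights have been converted from the official weights so that they match.
· A P6 model (unofficial) has been pre-released.
Project link: GitHub - yang-0201/YOLOv6_pro: Make it easier for yolov6 to change the network structure
If you are interested, please Star and Fork the project and report any problems as they come up. The project is at an early stage, so feature suggestions will be considered and developed, and PRs are welcome. The project will keep being updated, so stay tuned!
On to the main topic: today's post is a step-by-step tutorial on building the YOLOv6 network structure with a yaml file. It is packed with practical detail, and I hope it helps everyone get more and more fluent at building and modifying networks!
(The approach is the same as in official yolov5, so it also applies if you want to modify v5.)
Today's example is the YOLOv6-L model. The s, t, and n models of yolov6 are structured differently from M and L, mainly in the building block used: the small models use RepBlocks, while the large models use CSPStackRep.
First we need to get familiar with the YOLOv6 network structure. Once that is understood, building it from scratch is much easier. Let's start with the official YOLOv6 code that builds the backbone. Enough talk, let's dig in!
In yolov6/models/efficientrep.py: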
class CSPBepBackbone(nn.Module):
    """
    CSPBepBackbone module.
    """
    def __init__(
        self,
        in_channels=3,
        channels_list=None,
        num_repeats=None,
        block=RepVGGBlock,
        csp_e=float(1)/2,
    ):
        super().__init__()
        assert channels_list is not None
        assert num_repeats is not None
        self.stem = block(
            in_channels=in_channels,
            out_channels=channels_list[0],
            kernel_size=3,
            stride=2
        )
        self.ERBlock_2 = nn.Sequential(
            block(
                in_channels=channels_list[0],
                out_channels=channels_list[1],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[1],
                out_channels=channels_list[1],
                n=num_repeats[1],
                e=csp_e,
                block=block,
            )
        )
        self.ERBlock_3 = nn.Sequential(
            block(
                in_channels=channels_list[1],
                out_channels=channels_list[2],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[2],
                out_channels=channels_list[2],
                n=num_repeats[2],
                e=csp_e,
                block=block,
            )
        )
        self.ERBlock_4 = nn.Sequential(
            block(
                in_channels=channels_list[2],
                out_channels=channels_list[3],
                kernel_size=3,
                stride=2
            ),
            BepC3(
                in_channels=channels_list[3],
                out_channels=channels_list[3],
                n=num_repeats[3],
                e=csp_e,
                block=block,
            )
        )
        channel_merge_layer = SimSPPF
        if block == ConvWrapper:
            channel_merge_layer = SPPF
        self.ERBlock_5 = nn.Sequential(
            block(
                in_channels=channels_list[3],
                out_channels=channels_list[4],
                kernel_size=3,
                stride=2,
            ),
            BepC3(
                in_channels=channels_list[4],
                out_channels=channels_list[4],
                n=num_repeats[4],
                e=csp_e,
                block=block,
            ),
            channel_merge_layer(
                in_channels=channels_list[4],
                out_channels=channels_list[4],
                kernel_size=5
            )
        )
    def forward(self, x):
        outputs = []
        x = self.stem(x)
        x = self.ERBlock_2(x)
        x = self.ERBlock_3(x)
        outputs.append(x)
        x = self.ERBlock_4(x)
        outputs.append(x)
        x = self.ERBlock_5(x)
        outputs.append(x)
        return tuple(outputs)
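The arguments passed in are:
backbone=dict(
    type='CSPBepBackbone',
    num_repeats=[1, 6, 12, 18, 6],
    out_channels=[64, 128, 256, 512, 1024],
    csp_e=float(1)/2,
),
training_mode = "conv_silu"
The backbone consists of stem, ERBlock_2, ERBlock_3, ERBlock_4, and ERBlock_5 (which also contains SPPF). Each ERBlock contains a block followed by a BepC3 module. The code that builds the backbone shows how block is chosen (alternatively, set the conf file to configs/office/yolov6l.py in the training code and you can see this while debugging):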
block = get_block(config.training_mode)
def get_block(mode):
    if mode == 'repvgg':
        return RepVGGBlock
    elif mode == 'hyper_search':
        return LinearAddBlock
    elif mode == 'repopt':
        return RealVGGBlock
    elif mode == 'conv_relu':
        return SimConvWrapper
    elif mode == 'conv_silu':
        return ConvWrapper
    else:
        raise NotImplementedError("Undefied Repblock choice for mode {}".format(mode))
So block is the ConvWrapper module. Next, let's look in yolov6/layers/common.py to see what this odd-looking ConvWrapper actually is:
class ConvWrapper(nn.Module):
    '''Wrapper for normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, groups=1, bias=True):
        super().__init__()
        self.block = Conv(in_channels, out_channels, kernel_size, stride, groups, bias)
    def forward(self, x):
        return self.block(x)
As the official docstring says, this is just an ordinary convolution with a SiLU activation. It is effectively the same as the Conv module in yolov5 and serves as YOLOv6's basic convolution module.
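Since the stem and every downsampling step use this module with kernel 3 and stride 2, it helps to see why each one halves the feature map. The sketch below is only an illustration: `conv_out_size` is a hypothetical helper, the 640-pixel input is an assumed example size, and the padding rule `kernel // 2` is taken from the `Conv` class listed later in this post.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 1) -> int:
    """Spatial output size of a convolution padded with kernel // 2."""
    padding = kernel // 2
    return (size + 2 * padding - kernel) // stride + 1


# A 3x3, stride-2 convolution halves an (assumed) 640-pixel side.
print(conv_out_size(640, kernel=3, stride=2))  # 320
# With stride 1 the spatial size is unchanged.
print(conv_out_size(640, kernel=3, stride=1))  # 640
```

Applying the stride-2 case five times in a row (stem plus four ERBlocks) takes 640 down to 20, which is where the overall stride of 32 at the end of the backbone comes from.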
In short, the overall backbone is made up of:
stem:ConvWrapper basic convolution module
ERBlock_2:ConvWrapper (downsampling) + BepC3
ERBlock_3:ConvWrapper (downsampling) + BepC3
ERBlock_4:ConvWrapper (downsampling) + BepC3
ERBlock_5:ConvWrapper (downsampling) + BepC3 + SPPF
Next, look at how the channel counts and num_block values are distributed. They follow one by one from the arguments passed in.
c1 is in_channel, c2 is out_channel, k is kernel, s is stride
stem:ConvWrapper basic convolution module: c1: 3, c2: 64, k= 3, s= 2
ERBlock_2:ConvWrapper (downsampling): c1: 64, c2: 128, k= 3, s= 2
+BepC3: c1: 128, c2: 128, num_block = 6, csp_e = 0.5, block = ConvWrapper
ERBlock_3:ConvWrapper (downsampling): c1: 128, c2: 256, k= 3, s= 2
+BepC3: c1: 256, c2: 256, num_block = 12, csp_e = 0.5, block = ConvWrapper
ERBlock_4:ConvWrapper (downsampling): c1: 256, c2: 512, k= 3, s= 2
+BepC3: c1: 512, c2: 512, num_block = 18, csp_e = 0.5, block = ConvWrapper
ERBlock_5:ConvWrapper (downsampling): c1: 512, c2: 1024, k= 3, s= 2
+BepC3: c1: 1024, c2: 1024, num_block = 6, csp_e = 0.5, block = ConvWrapper
+SPPF: c1: 1024, c2: 1024, k= 5
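The table above can be derived mechanically from the two config lists. The following sketch does that in plain Python. `backbone_stages` is a hypothetical helper written for this post, not part of YOLOv6; it only mirrors the indexing used in `CSPBepBackbone.__init__`.

```python
channels_list = [64, 128, 256, 512, 1024]   # out_channels from the config
num_repeats = [1, 6, 12, 18, 6]             # num_repeats from the config


def backbone_stages(in_channels, channels, repeats):
    """Return (stage, module, c1, c2, extra) rows mirroring CSPBepBackbone.__init__."""
    stages = [("stem", "ConvWrapper", in_channels, channels[0], "k=3, s=2")]
    for i in range(1, len(channels)):
        name = f"ERBlock_{i + 1}"
        # Downsampling convolution: previous width -> this stage's width.
        stages.append((name, "ConvWrapper", channels[i - 1], channels[i], "k=3, s=2"))
        # BepC3 keeps the width and uses num_repeats[i] as n.
        stages.append((name, "BepC3", channels[i], channels[i], f"n={repeats[i]}"))
    # The last stage ends with SPPF at the same width.
    stages.append((f"ERBlock_{len(channels)}", "SPPF", channels[-1], channels[-1], "k=5"))
    return stages


for row in backbone_stages(3, channels_list, num_repeats):
    print(row)
```

Note that `num_repeats[0]` is never read by the backbone, because the stem is a single convolution.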
Now we can look at what BepC3 actually is:
class BepC3(nn.Module):
    '''Beer-mug RepC3 Block'''
    def __init__(self, in_channels, out_channels, n=1, block=RepVGGBlock, e=0.5, concat=True):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(out_channels * e)  # hidden channels
        self.cv1 = Conv_C3(in_channels, c_, 1, 1)
        self.cv2 = Conv_C3(in_channels, c_, 1, 1)
        self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1)
        if block == ConvWrapper:
            self.cv1 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv2 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1, act=nn.SiLU())
        self.m = RepBlock(in_channels=c_, out_channels=c_, n=n, block=BottleRep, basic_block=block)
        self.concat = concat
        if not concat:
            self.cv3 = Conv_C3(c_, out_channels, 1, 1)
    def forward(self, x):
        if self.concat is True:
            return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
        else:
            return self.cv3(self.m(self.cv1(x)))
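(a) RepBlock consists of N stacked RepVGG blocks, each followed by a ReLU activation.
(b) At inference time, each RepVGG block is converted to a RepConv.
(c) The CSPStackRep module, i.e. the BepC3 module, consists of three 1x1 convolutions and N/2 sub-blocks that each stack two RepVGG-style blocks (BottleRep in the code), plus a residual connection and a concat operation.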
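To make the "N/2 sub-blocks" in (c) concrete, the sketch below computes the hidden width and the sub-block counts that follow from the BepC3 code above and from the RepBlock and BottleRep definitions listed later in this post. `bepc3_layout` is a hypothetical helper for illustration only; it builds no layers.

```python
def bepc3_layout(out_channels: int, n: int, e: float = 0.5):
    """Return (hidden_channels, bottle_reps, basic_blocks) implied by BepC3."""
    hidden = int(out_channels * e)   # c_ in BepC3
    # RepBlock always creates one BottleRep (conv1); when n // 2 > 1 it adds
    # n // 2 - 1 more, so the total is n // 2 with a floor of 1.
    bottle_reps = max(n // 2, 1)
    basic_blocks = 2 * bottle_reps   # each BottleRep holds conv1 and conv2
    return hidden, bottle_reps, basic_blocks


# The four BepC3 stages of the L backbone: (out_channels, n).
for c2, n in [(128, 6), (256, 12), (512, 18), (1024, 6)]:
    print(c2, n, bepc3_layout(c2, n))
```

For example, the ERBlock_4 stage with n = 18 ends up with 9 BottleRep sub-blocks, i.e. 18 basic blocks, running at 256 hidden channels.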
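OK, now that we know all the modules and parameters needed to build the backbone, we can start building the network with a yaml file!
For those unfamiliar with the format, here is what the parameters [from, number, module, args] mean:
· The first parameter says where this module's input comes from. -1 means the previous layer; it can also be an explicit layer index such as 2 or 3.
· The second parameter is how many times the module is stacked, equivalent to the module's num_block parameter. If it is not needed, leave it at the default of 1.
· The third parameter is the module name.
· The fourth parameter is the list of arguments passed to the module.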
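The four fields can be read as a small record. The sketch below is a hypothetical illustration of that reading (the names `LayerSpec` and `resolve_source` are made up for this post and are not part of YOLOv5 or YOLOv6); it shows how a relative `from` value such as -1 maps to an absolute layer index.

```python
from typing import Any, List, NamedTuple, Union


class LayerSpec(NamedTuple):
    """One yaml row: [from, number, module, args]."""
    src: Union[int, List[int]]  # -1 = previous layer, or explicit layer index
    number: int                 # how many times the module is stacked
    module: str                 # module name, resolved to a class later
    args: List[Any]             # arguments passed to the module


def resolve_source(src: int, layer_index: int) -> int:
    """Turn a relative 'from' value into an absolute layer index."""
    return layer_index + src if src < 0 else src


row = [-1, 1, "ConvWrapper", [64, 3, 2]]
spec = LayerSpec(*row)
print(spec.module, spec.args)        # ConvWrapper [64, 3, 2]
print(resolve_source(spec.src, 3))   # 2: layer 3 reads from layer 2
print(resolve_source(2, 7))          # 2: explicit indices are used as-is
```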
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
depth_multiple and width_multiple are the depth coefficient and the width coefficient.
width_multiple scales the channel counts, and depth_multiple scales each module's num_block count.
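As a rough illustration of what these two coefficients do, here is a sketch of YOLOv5-style scaling: repeat counts are rounded with a floor of 1, and channel counts are rounded up to a multiple of 8. This is an assumption about how the parser behaves, based on YOLOv5's approach; the project's own code may differ in detail, and the multipliers other than 1.0 below are arbitrary examples rather than values from any YOLOv6 config.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_depth(n: int, depth_multiple: float) -> int:
    """Scale a repeat count; a count of 1 is left alone."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scale_width(c2: int, width_multiple: float) -> int:
    """Scale an output-channel count and keep it a multiple of 8."""
    return make_divisible(c2 * width_multiple, 8)


# With both multipliers at 1.0 (the L model in this post) nothing changes.
print(scale_depth(6, 1.0), scale_width(128, 1.0))   # 6 128
# Illustrative smaller multipliers.
print(scale_depth(6, 0.33), scale_width(128, 0.5))  # 2 64
```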
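The first layer, the stem, is the ConvWrapper basic convolution module,
so the first row is [-1, 1, ConvWrapper, [64, 3, 2]]
The args are [64, 3, 2]. The parsing code takes the input channels from the previous layer's output channels by default, so they do not need to be specified: 64 is the output channels, 3 is the kernel, and 2 is the stride.
The second and third rows are
[-1, 1, ConvWrapper, [128, 3, 2]],
[-1, 1, BepC3, [128, 6, "ConvWrapper"]],
For BepC3, 128 is the output channels, 6 is num_block, and "ConvWrapper" selects the block.
The final result is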
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
backbone:
  # [from, number, module, args]
  [[-1, 1, ConvWrapper, [64, 3, 2]],  # 0-P1/2
   [-1, 1, ConvWrapper, [128, 3, 2]],  # 1-P2/4
   [-1, 1, BepC3, [128, 6, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [256, 3, 2]],  # 3-P3/8
   [-1, 1, BepC3, [256, 12, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [512, 3, 2]],  # 5-P4/16
   [-1, 1, BepC3, [512, 18, "ConvWrapper"]],
   [-1, 1, ConvWrapper, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, BepC3, [1024, 6, "ConvWrapper"]],
   [-1, 1, SPPF, [1024, 5]]]  # 9
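A quick way to sanity-check a backbone definition like this is to walk the rows and track channels and cumulative stride. The sketch below does that in plain Python with the rows copied from the yaml. `trace` is a hypothetical helper that assumes a purely sequential backbone (every `from` is -1) and that only ConvWrapper rows change the stride.

```python
backbone = [
    [-1, 1, "ConvWrapper", [64, 3, 2]],
    [-1, 1, "ConvWrapper", [128, 3, 2]],
    [-1, 1, "BepC3", [128, 6, "ConvWrapper"]],
    [-1, 1, "ConvWrapper", [256, 3, 2]],
    [-1, 1, "BepC3", [256, 12, "ConvWrapper"]],
    [-1, 1, "ConvWrapper", [512, 3, 2]],
    [-1, 1, "BepC3", [512, 18, "ConvWrapper"]],
    [-1, 1, "ConvWrapper", [1024, 3, 2]],
    [-1, 1, "BepC3", [1024, 6, "ConvWrapper"]],
    [-1, 1, "SPPF", [1024, 5]],
]


def trace(rows, in_channels=3):
    """Return (index, module, c1, c2, cumulative_stride) for each row."""
    out, c1, stride = [], in_channels, 1
    for i, (_src, _number, module, args) in enumerate(rows):
        c2 = args[0]               # first arg is always the output channels
        if module == "ConvWrapper":
            stride *= args[2]      # args = [c2, kernel, stride]
        out.append((i, module, c1, c2, stride))
        c1 = c2                    # next layer's input is this layer's output
    return out


for row in trace(backbone):
    print(row)
```

Layers 4, 6, and 9 come out at strides 8, 16, and 32 with 256, 512, and 1024 channels. These are the same three feature maps that the official `forward` collects after ERBlock_3, ERBlock_4, and ERBlock_5.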
Then, as the first step, register the modules in yolov6/layers/common.py or in yolov5's common.py by adding every module involved. For yolov5, I recommend putting them in a new .py file.
import torch
import torch.nn as nn
from torch.nn.parameter import Parameter
import numpy as np
class RepVGGBlock(nn.Module):
    '''RepVGGBlock is a basic rep-style block, including training and deploy status
    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py
    '''
    def __init__(self, in_channels, out_channels, kernel_size=3,
                 stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False, use_se=False):
        super(RepVGGBlock, self).__init__()
        """ Initialization of the class.
        Args:
            in_channels (int): Number of channels in the input image
            out_channels (int): Number of channels produced by the convolution
            kernel_size (int or tuple): Size of the convolving kernel
            stride (int or tuple, optional): Stride of the convolution. Default: 1
            padding (int or tuple, optional): Zero-padding added to both sides of
                the input. Default: 1
            dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
            groups (int, optional): Number of blocked connections from input
                channels to output channels. Default: 1
            padding_mode (string, optional): Default: 'zeros'
            deploy: Whether to be deploy status or training status. Default: False
            use_se: Whether to use se. Default: False
        """
        self.deploy = deploy
        self.groups = groups
        self.in_channels = in_channels
        self.out_channels = out_channels
        assert kernel_size == 3
        assert padding == 1
        padding_11 = padding - kernel_size // 2
        self.nonlinearity = nn.ReLU()
        if use_se:
            raise NotImplementedError("se block not supported yet")
        else:
            self.se = nn.Identity()
        if deploy:
            self.rbr_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride,
                                         padding=padding, dilation=dilation, groups=groups, bias=True, padding_mode=padding_mode)
        else:
            self.rbr_identity = nn.BatchNorm2d(num_features=in_channels) if out_channels == in_channels and stride == 1 else None
            self.rbr_dense = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding, groups=groups)
            self.rbr_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride, padding=padding_11, groups=groups)
    def forward(self, inputs):
        '''Forward process'''
        if hasattr(self, 'rbr_reparam'):
            return self.nonlinearity(self.se(self.rbr_reparam(inputs)))
        if self.rbr_identity is None:
            id_out = 0
        else:
            id_out = self.rbr_identity(inputs)
        return self.nonlinearity(self.se(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out))
    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
        kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid
    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
        if kernel1x1 is None:
            return 0
        else:
            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])
    def _fuse_bn_tensor(self, branch):
        if branch is None:
            return 0, 0
        if isinstance(branch, nn.Sequential):
            kernel = branch.conv.weight
            running_mean = branch.bn.running_mean
            running_var = branch.bn.running_var
            gamma = branch.bn.weight
            beta = branch.bn.bias
            eps = branch.bn.eps
        else:
            assert isinstance(branch, nn.BatchNorm2d)
            if not hasattr(self, 'id_tensor'):
                input_dim = self.in_channels // self.groups
                kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32)
                for i in range(self.in_channels):
                    kernel_value[i, i % input_dim, 1, 1] = 1
                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
            kernel = self.id_tensor
            running_mean = branch.running_mean
            running_var = branch.running_var
            gamma = branch.weight
            beta = branch.bias
            eps = branch.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta - running_mean * gamma / std
    def switch_to_deploy(self):
        if hasattr(self, 'rbr_reparam'):
            return
        kernel, bias = self.get_equivalent_kernel_bias()
        self.rbr_reparam = nn.Conv2d(in_channels=self.rbr_dense.conv.in_channels, out_channels=self.rbr_dense.conv.out_channels,
                                     kernel_size=self.rbr_dense.conv.kernel_size, stride=self.rbr_dense.conv.stride,
                                     padding=self.rbr_dense.conv.padding, dilation=self.rbr_dense.conv.dilation, groups=self.rbr_dense.conv.groups, bias=True)
        self.rbr_reparam.weight.data = kernel
        self.rbr_reparam.bias.data = bias
        for para in self.parameters():
            para.detach_()
        self.__delattr__('rbr_dense')
        self.__delattr__('rbr_1x1')
        if hasattr(self, 'rbr_identity'):
            self.__delattr__('rbr_identity')
        if hasattr(self, 'id_tensor'):
            self.__delattr__('id_tensor')
        self.deploy = True
class BepC3(nn.Module):
    '''Beer-mug RepC3 Block'''
    def __init__(self, in_channels, out_channels, n=1, block=RepVGGBlock, e=0.5, concat=True):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(out_channels * e)  # hidden channels
        self.cv1 = Conv_C3(in_channels, c_, 1, 1)
        self.cv2 = Conv_C3(in_channels, c_, 1, 1)
        self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1)
        if block == ConvWrapper:
            self.cv1 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv2 = Conv_C3(in_channels, c_, 1, 1, act=nn.SiLU())
            self.cv3 = Conv_C3(2 * c_, out_channels, 1, 1, act=nn.SiLU())
        self.m = RepBlock(in_channels=c_, out_channels=c_, n=n, block=BottleRep, basic_block=block)
        self.concat = concat
        if not concat:
            self.cv3 = Conv_C3(c_, out_channels, 1, 1)
    def forward(self, x):
        if self.concat is True:
            return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
        else:
            return self.cv3(self.m(self.cv1(x)))
class Conv_C3(nn.Module):
    '''Standard convolution in BepC3-Block'''
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.ReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
    def forward_fuse(self, x):
        return self.act(self.conv(x))
class ConvWrapper(nn.Module):
    '''Wrapper for normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, groups=1, bias=True):
        super().__init__()
        self.block = Conv(in_channels, out_channels, kernel_size, stride, groups, bias)
    def forward(self, x):
        return self.block(x)
class Conv(nn.Module):  # needs some adapting if you are adding this to yolov5
    '''Normal Conv with SiLU activation'''
    def __init__(self, in_channels, out_channels, kernel_size, stride, groups=1, bias=False):
        super().__init__()
        padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
    def forward_fuse(self, x):
        return self.act(self.conv(x))
class RepBlock(nn.Module):
    '''
    RepBlock is a stage block with rep-style basic block
    '''
    def __init__(self, in_channels, out_channels, n=1, block=RepVGGBlock, basic_block=RepVGGBlock):
        super().__init__()
        self.conv1 = block(in_channels, out_channels)
        self.block = nn.Sequential(*(block(out_channels, out_channels) for _ in range(n - 1))) if n > 1 else None
        if block == BottleRep:
            self.conv1 = BottleRep(in_channels, out_channels, basic_block=basic_block, weight=True)
            n = n // 2
            self.block = nn.Sequential(*(BottleRep(out_channels, out_channels, basic_block=basic_block, weight=True) for _ in range(n - 1))) if n > 1 else None
    def forward(self, x):
        x = self.conv1(x)
        if self.block is not None:
            x = self.block(x)
        return x
class BottleRep(nn.Module):
    def __init__(self, in_channels, out_channels, basic_block=RepVGGBlock, weight=False):
        super().__init__()
        self.conv1 = basic_block(in_channels, out_channels)
        self.conv2 = basic_block(out_channels, out_channels)
        if in_channels != out_channels:
            self.shortcut = False
        else:
            self.shortcut = True
        if weight:
            self.alpha = Parameter(torch.ones(1))
        else:
            self.alpha = 1.0
    def forward(self, x):
        outputs = self.conv1(x)
        outputs = self.conv2(outputs)
        return outputs + self.alpha * x if self.shortcut else outputs
def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
    '''Basic cell for rep-style block, including conv and bn'''
    result = nn.Sequential()
    result.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                        kernel_size=kernel_size, stride=stride, padding=padding, groups=groups, bias=False))
    result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
    return result
def autopad(k, p=None):  # kernel, padding
    # Pad to 'same'
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p
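The `_fuse_bn_tensor` method in the listing above folds a BatchNorm into the convolution before it, which is what lets a RepVGG block collapse into a single convolution at deploy time. The sketch below checks the same formula on plain scalars, standing in for a single output channel with a 1x1 kernel. The numbers are arbitrary illustrative values and `fuse_conv_bn` is a hypothetical helper, not part of the repository.

```python
import math


def batch_norm(y, gamma, beta, running_mean, running_var, eps):
    """Inference-time BatchNorm applied to a scalar activation."""
    return gamma * (y - running_mean) / math.sqrt(running_var + eps) + beta


def fuse_conv_bn(weight, gamma, beta, running_mean, running_var, eps):
    """Fold BatchNorm into a bias-free conv: returns (fused_weight, fused_bias)."""
    std = math.sqrt(running_var + eps)
    return weight * gamma / std, beta - running_mean * gamma / std


# Arbitrary example values for one channel.
w, gamma, beta, mean, var, eps = 0.5, 2.0, 0.1, 0.3, 4.0, 1e-5
fused_w, fused_b = fuse_conv_bn(w, gamma, beta, mean, var, eps)

x = 1.7
unfused = batch_norm(w * x, gamma, beta, mean, var, eps)  # conv, then BN
fused = fused_w * x + fused_b                             # single conv with bias
print(abs(unfused - fused) < 1e-9)  # True: both paths agree
```

In the real block the same per-channel scale `gamma / std` is reshaped to `(-1, 1, 1, 1)` so that it multiplies every weight of the corresponding output channel.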
Then add the following to yolov6/models/yolo.py, or to yolov5's yolo.py:
elif m in [ConvWrapper]:
    c1 = ch[f]
    c2 = args[0]
    args = [c1, c2, *args[1:]]
elif m in [BepC3]:
    c1, c2 = ch[f], args[0]
    c2 = make_divisible(c2 * gw, 8)
    args = [c1, c2, *args[1:]]
    if m in [RepBlock]:
        args.insert(2, n)  # number of repeats
        n = 1
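To see what these branches do to a yaml row, the sketch below replays the argument rewriting in plain Python, with module names as strings instead of classes. It is a simplified stand-in for the real parser: `build_args` is a hypothetical helper, `ch` is a plain list of output channels seeded with the 3 input channels, and `make_divisible` is assumed to round up to a multiple of 8 as in YOLOv5.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor (assumed YOLOv5 behaviour)."""
    return math.ceil(x / divisor) * divisor


def build_args(module: str, args: list, ch: list, f: int = -1, gw: float = 1.0):
    """Prepend the input channels to args, as the branches above do."""
    c1, c2 = ch[f], args[0]
    if module == "BepC3":
        # Only the BepC3 branch applies the width multiple in the snippet above.
        c2 = make_divisible(c2 * gw, 8)
    return [c1, c2, *args[1:]], c2


rows = [
    ("ConvWrapper", [64, 3, 2]),
    ("ConvWrapper", [128, 3, 2]),
    ("BepC3", [128, 6, "ConvWrapper"]),
]
ch = [3]  # channels of the input image
built = []
for module, args in rows:
    new_args, c2 = build_args(module, args, ch)
    built.append(new_args)
    ch.append(c2)  # the next layer reads its input width from here
    print(module, new_args)
```

The printed lists are exactly the positional arguments each module's constructor receives, which is why the yaml rows never spell out the input channels.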
That completes the backbone. If you are left wanting more, the next post will build YOLOv6's Rep-PAN!