Introduction
Since its release on May 18, 2020, YOLOv5 has gone through several versions. The latest is v7, which adds segmentation capabilities. Many blog posts already explain the principles of YOLOv5 and how to train it on labeled data, such as "Detailed explanation of the YOLOv5 network" and "In-depth explanation of the core basics of YOLOv5 in the YOLO series".
Simple to install and easy to use, it has become the de facto benchmark for detection methods.
# Just clone the repository and install the requirements
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Using it takes only a few lines of code:
import torch
# Load the model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s') # or yolov5n - yolov5x6, custom
# Image path
img = 'https://ultralytics.com/images/zidane.jpg' # or file, Path, PIL, OpenCV, numpy, list
# Run detection inference
results = model(img)
# Print (or visualize) the detection results
results.print() # or .show(), .save(), .crop(), .pandas(), etc.
Don't want to write even a single line of code? If you just want to see the results directly from a camera, running detect.py from the repository does the job:
python detect.py --weights yolov5s.pt --source 0
The parameters are as follows: --weights specifies the pretrained weights to use, and --source specifies the input to run detection on (an image, a list of image paths, a camera, or even a network stream):
python detect.py --weights yolov5s.pt --source 0                               # webcam
                                               img.jpg                         # image
                                               vid.mp4                         # video
                                               screen                          # screenshot
                                               path/                           # directory
                                               list.txt                        # list of images
                                               list.streams                    # list of streams
                                               'path/*.jpg'                    # glob
                                               'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                                               'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
![](https://img-blog.csdnimg.cn/361140cd3ec74c64a8ed14cc5c64c24b.png)
![](https://img-blog.csdnimg.cn/89673fd9064448cfbef5ce1784976065.png)
Network structure
YOLOv5 has the same overall network architecture for its different sizes (n, s, m, l, x), but uses different depths and widths in each sub-module, controlled respectively by the depth_multiple and width_multiple parameters in the yaml file. Note also that in addition to the official n, s, m, l, x versions there are n6, s6, m6, l6, x6 versions. The latter are intended for larger-resolution images, for example 1280x1280, and of course differ somewhat in structure: they downsample by a factor of 64 and use 4 prediction feature layers, while the former downsample only by a factor of 32 and use 3 prediction feature layers.
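To make the role of the two multipliers concrete, here is a minimal sketch of how a block's repeat count and channel count could be scaled for each model size. The multiplier values are the ones I understand the model yaml files to contain, and `scale_block` is a hypothetical helper written for illustration, not a function from the repository.

```python
import math

# (depth_multiple, width_multiple) per model size; assumed values for illustration
MULTIPLES = {
    "n": (0.33, 0.25),
    "s": (0.33, 0.50),
    "m": (0.67, 0.75),
    "l": (1.00, 1.00),
    "x": (1.33, 1.25),
}


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_block(repeats, channels, size):
    """Scale a block's repeat count (depth) and output channels (width)."""
    depth_multiple, width_multiple = MULTIPLES[size]
    # blocks that repeat only once are left untouched
    n = max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats
    c = make_divisible(channels * width_multiple, 8)
    return n, c


for size in "nsmlx":
    print(size, scale_block(repeats=9, channels=512, size=size))
```

With these assumed values, a block defined with 9 repeats and 512 channels shrinks to 3 repeats and 256 channels in the s model, and grows to 12 repeats and 640 channels in the x model.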
Compared with earlier versions, YOLOv5 has one small change from v6.0 onward: the first layer of the network (originally a Focus module) is replaced with a 6x6 convolutional layer. The two are equivalent in theory, but on some existing GPU devices (and with the corresponding optimization algorithms) a 6x6 convolutional layer is more efficient than the Focus module. For details, see issue #4825. The figure below shows the original Focus module (similar to Patch Merging in Swin Transformer): every 2x2 block of adjacent pixels forms a patch, the pixels at the same position (same color) in each patch are gathered together to obtain 4 feature maps, and a 3x3 convolutional layer follows. This is equivalent to directly using a single 6x6 convolutional layer.
![](https://img-blog.csdnimg.cn/img_convert/91b29c67ca7b44ff38820f1a16dc5fc1.png)
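The slicing step of the Focus module can be illustrated on a tiny grid. This is a minimal sketch using nested lists on a single channel; `focus_slice` is a hypothetical name, and the real module operates on tensors and concatenates the four results along the channel axis.

```python
def focus_slice(image):
    """Split an HxW grid (H and W even) into 4 half-resolution grids.

    Each output collects the pixels that sit at the same position inside
    every 2x2 patch: top-left, bottom-left, top-right, bottom-right.
    """
    top_left = [row[0::2] for row in image[0::2]]
    bottom_left = [row[0::2] for row in image[1::2]]
    top_right = [row[1::2] for row in image[0::2]]
    bottom_right = [row[1::2] for row in image[1::2]]
    return [top_left, bottom_left, top_right, bottom_right]


image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
for feature_map in focus_slice(image):
    print(feature_map)
```

A 4x4 input yields four 2x2 maps, so spatial resolution is halved while the number of channels is multiplied by 4.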
In the Neck, SPP is replaced with SPPF (designed by Glenn Jocher). The two have the same function, but the latter is more efficient. SPP passes the input through several MaxPool layers of different sizes in parallel and then fuses the results, which helps with the multi-scale target problem to some extent. SPPF instead passes the input serially through several 5x5 MaxPool layers. Note that two 5x5 MaxPool layers in series give the same result as one 9x9 MaxPool layer, and three 5x5 MaxPool layers in series give the same result as one 13x13 MaxPool layer.
![](https://img-blog.csdnimg.cn/img_convert/72ac0c3519e87c45122729cf5827aa14.png)
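The equivalence between stacked 5x5 pooling and a single larger pooling can be checked numerically. The sketch below uses a plain-Python stride-1 max pooling where out-of-bounds positions are simply ignored (which behaves like padding with negative infinity); `max_pool` is an illustrative helper, not the layer used in the network.

```python
import random


def max_pool(grid, k):
    """Stride-1 k x k max pooling with 'same' output size."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


random.seed(0)
grid = [[random.random() for _ in range(20)] for _ in range(20)]

once = max_pool(grid, 5)
twice = max_pool(once, 5)
thrice = max_pool(twice, 5)

print(twice == max_pool(grid, 9))    # two 5x5 pools vs one 9x9 pool
print(thrice == max_pool(grid, 13))  # three 5x5 pools vs one 13x13 pool
```

Both comparisons print True. Because each stage reuses the previous result, the serial form gets the 9x9 and 13x13 outputs almost for free, which is why it is cheaper than running three large pools in parallel.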
Data augmentation
Mosaic: combines four images into one.
![](https://img-blog.csdnimg.cn/img_convert/a43656977523a982c12f3ce6260e5c1d.png)
Copy paste: randomly pastes some targets into the image. This requires the data to include segments, that is, the instance segmentation information of each target.
![](https://img-blog.csdnimg.cn/img_convert/0fe0dbae5bb048a51b238ad39e2b386b.png)
Random affine (Rotation, Scale, Translation and Shear): applies a random affine transformation, but judging from the hyperparameters in the configuration file, only Scale and Translation are actually used.
![](https://img-blog.csdnimg.cn/img_convert/806e60dd14c2e1e4a3ecf917f36a8e32.png)
MixUp: blends two images together with a certain transparency. It is not clear whether it helps, since there is no paper and no ablation experiment. In the code, only the larger models use MixUp, and only with 10% probability.
![](https://img-blog.csdnimg.cn/img_convert/de0f59c7b7870915fed57f0b2237b0d5.png)
Albumentations: mainly applies filtering, histogram equalization, image-quality changes and so on. In the code it is only enabled when the albumentations package is installed, but that package is commented out in the project's requirements.txt file, so it is not enabled by default.
Augment HSV (Hue, Saturation, Value): randomly adjusts hue, saturation and value (brightness).
![](https://img-blog.csdnimg.cn/img_convert/3f29b2dd22d8fbab29d6860d759b295d.png)
Random horizontal flip: flips the image horizontally at random.
![](https://img-blog.csdnimg.cn/img_convert/db63456a1658deddde9282503f6e647e.png)
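Of the augmentations above, MixUp is the easiest to sketch in a few lines. The version below works on small grayscale images stored as nested lists and draws the mixing ratio from a Beta(32, 32) distribution, which is my understanding of what the repository does; treat the function name, the parameter and the label layout as assumptions.

```python
import random


def mixup(image1, labels1, image2, labels2, alpha=32.0):
    """Blend two same-sized images and keep the labels of both."""
    # mixing ratio, concentrated around 0.5 for large alpha
    r = random.betavariate(alpha, alpha)
    blended = [
        [int(p1 * r + p2 * (1 - r)) for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(image1, image2)
    ]
    # both sets of boxes remain valid targets in the blended image
    return blended, labels1 + labels2


random.seed(0)
img_a = [[100, 100], [100, 100]]
img_b = [[200, 200], [200, 200]]
mixed, labels = mixup(
    img_a, [[0, 0.5, 0.5, 0.2, 0.2]],
    img_b, [[1, 0.3, 0.3, 0.1, 0.1]],
)
print(mixed, labels)
```

Note that the labels are concatenated rather than weighted: every object from either source image is still treated as a full target.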
Many training strategies are used in the YOLOv5 source code:
- Multi-scale training (0.5~1.5x): if the input image size is set to 640 × 640, the size used during training is chosen randomly between 0.5 × 640 and 1.5 × 640. Note that the chosen value is always an integer multiple of 32 (because the network downsamples by at most 32 times).
- AutoAnchor (for training custom data): when training on your own dataset, the anchor templates can be regenerated by clustering the targets in that dataset.
- Warmup and Cosine LR scheduler: warm up at the start of training (Warmup), then decay the learning rate with a Cosine schedule.
- EMA (Exponential Moving Average): can be understood as adding momentum to the trained parameters so that the update process is smoother.
- Mixed precision: mixed-precision training reduces GPU memory usage and speeds up training, provided the GPU hardware supports it.
- Evolve hyper-parameters: hyperparameter optimization. If you have no experience tuning models, leave it alone and keep the defaults.
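Two of the strategies above reduce to short formulas. The following is a minimal sketch, assuming a stride of 32 and a final learning-rate factor of 0.01 (both are illustrative defaults, not values confirmed by this article); the function names are hypothetical.

```python
import math
import random


def multi_scale_size(imgsz=640, stride=32):
    """Pick a training size in [0.5, 1.5] x imgsz, snapped to a multiple of stride."""
    low, high = int(imgsz * 0.5), int(imgsz * 1.5)
    return random.randrange(low, high + stride) // stride * stride


def cosine_lr_factor(epoch, epochs, final_factor=0.01):
    """Cosine decay of the learning-rate multiplier from 1.0 down to final_factor."""
    return ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (final_factor - 1) + 1


random.seed(0)
print([multi_scale_size() for _ in range(5)])
print([round(cosine_lr_factor(e, 100), 4) for e in (0, 25, 50, 75, 100)])
```

The multiplier is applied to the base learning rate after the warmup phase, so the rate starts at its full value and ends at `final_factor` times that value.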
The loss of YOLOv5 mainly consists of three parts:
- Classes loss: the classification loss, which uses BCE loss. Note that it is computed only for positive samples.
- Objectness loss: the obj loss, which also uses BCE loss. Note that obj here refers to the CIoU between the bounding box predicted by the network and the GT box. It is computed over all samples.
- Location loss: the localization loss, which uses CIoU loss. Note that it is computed only for positive samples.
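Since CIoU appears in both the objectness target and the location loss, a small reference computation may help. This is a sketch of the standard CIoU definition for two boxes in (cx, cy, w, h) form, written in plain Python; it is not the repository's vectorized implementation.

```python
import math


def ciou(box1, box2, eps=1e-7):
    """Complete IoU between two boxes given as (cx, cy, w, h)."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    # corner coordinates
    b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1 / 2, x1 + w1 / 2, y1 - h1 / 2, y1 + h1 / 2
    b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2 / 2, x2 + w2 / 2, y2 - h2 / 2, y2 + h2 / 2
    # plain IoU
    inter_w = max(0.0, min(b1_x2, b2_x2) - max(b1_x1, b2_x1))
    inter_h = max(0.0, min(b1_y2, b2_y2) - max(b1_y1, b2_y1))
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # squared diagonal of the smallest enclosing box
    cw = max(b1_x2, b2_x2) - min(b1_x1, b2_x1)
    ch = max(b1_y2, b2_y2) - min(b1_y1, b2_y1)
    c2 = cw ** 2 + ch ** 2 + eps
    # squared distance between the box centers
    rho2 = (x2 - x1) ** 2 + (y2 - y1) ** 2
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - (rho2 / c2 + v * alpha)


pred = (50.0, 50.0, 40.0, 20.0)
target = (55.0, 52.0, 36.0, 24.0)
print("CIoU:", ciou(pred, target))
print("location loss:", 1 - ciou(pred, target))
```

The location loss is then 1 - CIoU for each positive sample, while the CIoU value itself serves as the target of the objectness branch.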
Deployment
Versions before YOLOv5 v6.0 (exclusive) use the Focus layer, which complicates deployment and requires a number of extra manual steps. For details, see "Detailed record: ncnn implementation of u-version YOLOv5 object detection" and "YOLOv5 to ncnn mobile deployment". The specific modification steps are as follows:
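# 1. Export to ONNX
python models/export.py --weights yolov5s.pt --img 320 --batch 1
# 2. Simplify the model
python -m onnxsim yolov5s.onnx yolov5s-sim.onnx
# 3. Convert the model to ncnn
./onnx2ncnn yolov5s-sim.onnx yolov5s.param yolov5s.bin
# 4. Edit the yolov5s.param file
# Delete lines 4 to 13 (the Slice and Concat layers) and change the layer count on line 2 from 173 to 164
# (10 layers are deleted and 1 custom layer is added, so 173 - (10 - 1) = 164).
# Add the custom layer:
YoloV5Focus focus 1 1 images 159
# where 159 is the output of the Concat layer that was just deleted
# 5. Support dynamic input sizes
# Change the 960, 240, 60 in the Reshape layers (or whatever numbers follow 0=) to -1
# 6. Optimize with ncnnoptimize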
./ncnnoptimize yolov5s.param yolov5s.bin yolov5s-opt.param yolov5s-opt.bin 1
From v6.0 on, with the 6x6 convolution in place of Focus, deployment is much more convenient: you can use OpenCV's dnn module directly. For details, see "Detecting objects with YOLOv5, OpenCV, Python and C++" and the code in yolov5-opencv-cpp-python.
Note, however, that this only works with OpenCV 4.5.5 and above. It mainly involves 6 steps:
# 1. Load the model
import cv2
import numpy as np
net = cv2.dnn.readNet('yolov5s.onnx')
# 2. Load and preprocess the image
def format_yolov5(source):
    # put the image in a square big enough to hold it
    row, col, _ = source.shape
    _max = max(col, row)
    resized = np.zeros((_max, _max, 3), np.uint8)
    resized[0:row, 0:col] = source
    # resize to 640x640, normalize to [0, 1) and swap the red and blue channels
    result = cv2.dnn.blobFromImage(resized, 1/255.0, (640, 640), swapRB=True)
    return result
# 3. Run inference
image = cv2.imread('image.jpg')  # placeholder path, replace with your own image
net.setInput(format_yolov5(image))
predictions = net.forward()
output = predictions[0]
# 4. Unwrap the results
def unwrap_detection(input_image, output_data):
    class_ids = []
    confidences = []
    boxes = []
    rows = output_data.shape[0]
    # format_yolov5 pads the image to a square of side max(height, width),
    # so the same scale factor applies to both axes
    image_height, image_width, _ = input_image.shape
    x_factor = y_factor = max(image_height, image_width) / 640
    for r in range(rows):
        row = output_data[r]
        confidence = row[4]
        if confidence >= 0.4:
            classes_scores = row[5:]
            _, _, _, max_indx = cv2.minMaxLoc(classes_scores)
            class_id = max_indx[1]
            if (classes_scores[class_id] > .25):
                confidences.append(confidence)
                class_ids.append(class_id)
                x, y, w, h = row[0].item(), row[1].item(), row[2].item(), row[3].item()
                left = int((x - 0.5 * w) * x_factor)
                top = int((y - 0.5 * h) * y_factor)
                width = int(w * x_factor)
                height = int(h * y_factor)
                box = np.array([left, top, width, height])
                boxes.append(box)
    return class_ids, confidences, boxes
class_ids, confidences, boxes = unwrap_detection(image, output)
# 5. Non-maximum suppression
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.25, 0.45)
result_class_ids = []
result_confidences = []
result_boxes = []
# flatten() copes with OpenCV builds that return the indexes as an Nx1 array
for i in np.array(indexes).flatten():
    result_confidences.append(confidences[i])
    result_class_ids.append(class_ids[i])
    result_boxes.append(boxes[i])
# 6. Visualize the results
class_list = []
with open("classes.txt", "r") as f:
    class_list = [cname.strip() for cname in f.readlines()]
colors = [(255, 255, 0), (0, 255, 0), (0, 255, 255), (255, 0, 0)]
for i in range(len(result_class_ids)):
    box = result_boxes[i]
    class_id = result_class_ids[i]
    color = colors[class_id % len(colors)]
    conf = result_confidences[i]
    # bounding box, then a filled label background, then the class name
    cv2.rectangle(image, box, color, 2)
    cv2.rectangle(image, (box[0], box[1] - 20), (box[0] + box[2], box[1]), color, -1)
    cv2.putText(image, class_list[class_id], (box[0] + 5, box[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,0))
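Step 5 above delegates to cv2.dnn.NMSBoxes. For readers who want to see what non-maximum suppression does, here is a simplified, class-agnostic sketch in plain Python using the same (left, top, width, height) box layout and the same two thresholds. It illustrates the idea only and is not OpenCV's implementation.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (left, top, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.25, iou_threshold=0.45):
    """Return the indexes of the boxes kept after greedy suppression."""
    # drop low-confidence boxes, then visit the rest from best to worst
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # keep a box only if it does not overlap too much with a better one
        if all(iou_xywh(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 100, 100), (12, 12, 100, 100), (300, 300, 50, 50), (0, 0, 20, 20)]
scores = [0.9, 0.8, 0.7, 0.1]
print(nms(boxes, scores))
```

In this example the second box overlaps the first almost entirely and is suppressed, and the last box falls below the score threshold, so only indexes 0 and 2 remain.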
Generate data for circle and rectangle detection
Next, a small project that detects circles and rectangles is used to show how to generate training data for YOLOv5 and train on it. Labeling data is time-consuming and labor-intensive, so it is very convenient when generated data can be used to validate experiments quickly.
For manual labeling, see "[YOLOv5 hands-on notes] Even beginners can train their own dataset!" (based on Label Studio) and "Step-by-step object detection with deep learning (2): data labeling" (based on labelImg).
The YOLOv5 label format is very simple. Images go in the images folder. In the labels folder, each image has a txt file with the same name, which stores the class, normalized center coordinates, and normalized width and height of each target, one target per line. Many annotation tools can export the YOLO format directly, and there are also many scripts that convert VOC, COCO and other formats to YOLO format.
class1 normalized_center_x normalized_center_y normalized_width normalized_height
class2 normalized_center_x normalized_center_y normalized_width normalized_height
![](https://img-blog.csdnimg.cn/3a24c1d4eb294950a69bfbf6eafe9fc6.png)
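As a quick illustration of the label format, the sketch below converts a pixel-space box to a label line and back. The two helper names are hypothetical and only meant to show the arithmetic.

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) to one YOLO label line."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Convert one YOLO label line back to a pixel box."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    x1 = round((cx - w / 2) * img_w)
    y1 = round((cy - h / 2) * img_h)
    x2 = round((cx + w / 2) * img_w)
    y2 = round((cy + h / 2) * img_h)
    return class_id, x1, y1, x2, y2


line = to_yolo_line(0, 160, 160, 480, 480, 640, 640)
print(line)
print(from_yolo_line(line, 640, 640))
```

Because all four values are divided by the image size, the same label stays valid when the image is resized.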
1. Taking circle detection as an example, each step is described in detail below.
The first step is generating and visualizing the training data. Each image contains one circle drawn at a random center with a radius of 60 to 100 pixels, which is the target to detect. The goal is 100,000 training images in total (the script below defaults to num=10000, so pass num=100000 to generate the full set).
import os
import cv2
import random
import numpy as np
from tqdm import tqdm

def generate():
    # black 640x640 canvas with one filled circle of random position, size and color
    img = np.zeros((640,640,3),np.uint8)
    x = 100+random.randint(0, 400)
    y = 100+random.randint(0, 400)
    radius = random.randint(60,100)
    r = random.randint(0,255)
    g = random.randint(0,255)
    b = random.randint(0,255)
    cv2.circle(img, (x,y), radius, (b,g,r),-1)
    return img, [x,y,radius]

def generate_batch(num=10000):
    images_dir = "data/circle/images"
    os.makedirs(images_dir, exist_ok=True)
    labels_dir = "data/circle/labels"
    os.makedirs(labels_dir, exist_ok=True)
    for i in tqdm(range(num)):
        img, labels = generate()
        cv2.imwrite(images_dir+"/"+str(i)+".jpg", img)
        with open(labels_dir+"/"+str(i)+".txt", 'w') as f:
            x, y, radius = labels
            # class 0, normalized center, width and height (both equal to the diameter)
            f.write("0 "+str(x/640)+" "+str(y/640)+" "+str(2*radius/640)+" "+str(2*radius/640)+"\n")

def show_gt(dir='data/circle'):
    # draw every labeled box on its image and save the result under gt/
    files = os.listdir(dir+"/images")
    gtdir = dir+"/gt"
    os.makedirs(gtdir, exist_ok=True)
    for file in tqdm(files):
        imgpath = dir+"/images/"+file
        img = cv2.imread(imgpath)
        h,w,_ = img.shape
        labelpath = dir+"/labels/"+file[:-3]+"txt"
        with open(labelpath) as f:
            lines = f.readlines()
        for line in lines:
            items = line.strip().split(" ")
            c = int(items[0])
            cx = float(items[1])
            cy = float(items[2])
            cw = float(items[3])
            ch = float(items[4])
            # normalized center/size -> pixel corner coordinates
            x1 = int((cx - cw/2)*w)
            y1 = int((cy - ch/2)*h)
            x2 = int((cx + cw/2)*w)
            y2 = int((cy + ch/2)*h)
            cv2.rectangle(img, (x1,y1),(x2,y2),(0,255,0),2)
        cv2.imwrite(gtdir+"/"+file, img)

if __name__=="__main__":
    generate_batch()
    show_gt()
Then create circle.yaml:
train: data/circle/images/
val: data/circle/images/
# number of classes
nc: 1
# class names
names: ['circle']
2. To detect both circular and rectangular targets, adjust the generation script and the data configuration file:
import os
import cv2
import random
import numpy as np
from tqdm import tqdm

def generate_circle():
    img = np.zeros((640,640,3),np.uint8)
    x = 100+random.randint(0, 400)
    y = 100+random.randint(0, 400)
    radius = random.randint(60,100)
    r = random.randint(0,255)
    g = random.randint(0,255)
    b = random.randint(0,255)
    cv2.circle(img, (x,y), radius, (b,g,r),-1)
    # label as center x, center y, width, height
    return img, [x,y,radius*2,radius*2]

def generate_rectangle():
    img = np.zeros((640,640,3),np.uint8)
    x1 = 100+random.randint(0, 400)
    y1 = 100+random.randint(0, 400)
    w = random.randint(80, 200)
    h = random.randint(80, 200)
    # clip to the image so that the label matches what is actually drawn
    x2 = min(x1 + w, 639)
    y2 = min(y1 + h, 639)
    w, h = x2 - x1, y2 - y1
    r = random.randint(0,255)
    g = random.randint(0,255)
    b = random.randint(0,255)
    cx = (x1+x2)//2
    cy = (y1+y2)//2
    cv2.rectangle(img, (x1,y1), (x2,y2), (b,g,r),-1)
    return img, [cx,cy,w,h]

def generate_batch(num=100000):
    images_dir = "data/shape/images"
    os.makedirs(images_dir, exist_ok=True)
    labels_dir = "data/shape/labels"
    os.makedirs(labels_dir, exist_ok=True)
    for i in tqdm(range(num)):
        # even indexes are circles (class 0), odd indexes are rectangles (class 1)
        if i % 2 == 0:
            img, labels = generate_circle()
        else:
            img, labels = generate_rectangle()
        cv2.imwrite(images_dir+"/"+str(i)+".jpg", img)
        with open(labels_dir+"/"+str(i)+".txt", 'w') as f:
            cx,cy,w,h = labels
            f.write(str(i%2)+" "+str(cx/640)+" "+str(cy/640)+" "+str(w/640)+" "+str(h/640)+"\n")

def show_gt(dir='data/shape'):
    # draw every labeled box and its class id, and save the result under gt/
    files = os.listdir(dir+"/images")
    gtdir = dir+"/gt"
    os.makedirs(gtdir, exist_ok=True)
    for file in tqdm(files):
        imgpath = dir+"/images/"+file
        img = cv2.imread(imgpath)
        h, w, _ = img.shape
        labelpath = dir+"/labels/"+file[:-3]+"txt"
        with open(labelpath) as f:
            lines = f.readlines()
        for line in lines:
            items = line.strip().split(" ")
            c = int(items[0])
            cx = float(items[1])
            cy = float(items[2])
            cw = float(items[3])
            ch = float(items[4])
            x1 = int((cx - cw/2)*w)
            y1 = int((cy - ch/2)*h)
            x2 = int((cx + cw/2)*w)
            y2 = int((cy + ch/2)*h)
            cv2.rectangle(img, (x1,y1),(x2,y2),(0,255,0),2)
            cv2.putText(img, str(c), (x1,y1), 3,1,(0,0,255))
        cv2.imwrite(gtdir+"/"+file, img)

if __name__=="__main__":
    generate_batch()
    show_gt()
The corresponding shape.yaml is shown below. Note that the number of classes is 2:
train: data/shape/images/
val: data/shape/images/
# number of classes
nc: 2
# class names
names: ['circle', 'rectangle']
Training
Start training with the following command:
python train.py --data circle.yaml --cfg yolov5s.yaml --weights '' --batch-size 64
For the two-class case (circles and rectangles), the command is:
python train.py --data shape.yaml --cfg yolov5s.yaml --weights '' --batch-size 64
During training, statistics on the classes and their distribution are printed.
After training for a few epochs, the results look like this:
epoch, train/box_loss, train/obj_loss, train/cls_loss, metrics/precision, metrics/recall, metrics/mAP_0.5,metrics/mAP_0.5:0.95, val/box_loss, val/obj_loss, val/cls_loss, x/lr0, x/lr1, x/lr2
0, 0.03892, 0.011817, 0, 0.99998, 0.99978, 0.995, 0.92987, 0.0077891, 0.0030948, 0, 0.0033312, 0.0033312, 0.070019
1, 0.017302, 0.0049876, 0, 1, 0.9999, 0.995, 0.99105, 0.0031843, 0.0015662, 0, 0.0066644, 0.0066644, 0.040019
2, 0.011272, 0.0034826, 0, 1, 0.99994, 0.995, 0.99499, 0.0020194, 0.0010969, 0, 0.0099969, 0.0099969, 0.010018
3, 0.0080153, 0.0027186, 0, 1, 0.99994, 0.995, 0.995, 0.0013095, 0.00083033, 0, 0.0099978, 0.0099978, 0.0099978
4, 0.0067639, 0.0023831, 0, 1, 0.99996, 0.995, 0.995, 0.00099513, 0.00068878, 0, 0.0099978, 0.0099978, 0.0099978
5, 0.0061637, 0.0022279, 0, 1, 0.99996, 0.995, 0.995, 0.00090497, 0.00064193, 0, 0.0099961, 0.0099961, 0.0099961
6, 0.0058844, 0.002144, 0, 0.99999, 0.99998, 0.995, 0.995, 0.0009117, 0.00063328, 0, 0.0099938, 0.0099938, 0.0099938
7, 0.0056247, 0.00208, 0, 0.99999, 0.99999, 0.995, 0.995, 0.00086355, 0.00061343, 0, 0.0099911, 0.0099911, 0.0099911
8, 0.0054567, 0.0020223, 0, 1, 0.99999, 0.995, 0.995, 0.00081632, 0.00059592, 0, 0.0099879, 0.0099879, 0.0099879
9, 0.0053597, 0.0019864, 0, 1, 1, 0.995, 0.995, 0.00081379, 0.00058942, 0, 0.0099842, 0.0099842, 0.0099842
10, 0.0053103, 0.0019559, 0, 1, 1, 0.995, 0.995, 0.0008175, 0.00058669, 0, 0.00998, 0.00998, 0.00998
11, 0.0052146, 0.0019445, 0, 1, 1, 0.995, 0.995, 0.00083248, 0.00058731, 0, 0.0099753, 0.0099753, 0.0099753
12, 0.0050852, 0.0019065, 0, 1, 1, 0.995, 0.995, 0.00085092, 0.00058853, 0, 0.0099702, 0.0099702, 0.0099702
13, 0.0050589, 0.0019031, 0, 1, 1, 0.995, 0.995, 0.00086915, 0.00059267, 0, 0.0099645, 0.0099645, 0.0099645
14, 0.0049664, 0.0018693, 0, 1, 1, 0.995, 0.995, 0.00090856, 0.00059815, 0, 0.0099584, 0.0099584, 0.0099584
15, 0.0049839, 0.0018568, 0, 1, 1, 0.995, 0.995, 0.00093147, 0.00060425, 0, 0.0099517, 0.0099517, 0.0099517
16, 0.0049079, 0.0018459, 0, 1, 1, 0.995, 0.995, 0.0009656, 0.00061124, 0, 0.0099446, 0.0099446, 0.0099446
17, 0.0048693, 0.0018277, 0, 1, 1, 0.995, 0.995, 0.00099703, 0.00061948, 0, 0.009937, 0.009937, 0.009937
18, 0.0048052, 0.0018103, 0, 1, 1, 0.995, 0.995, 0.0010246, 0.00062618, 0, 0.0099289, 0.0099289, 0.0099289
19, 0.0047608, 0.0017947, 0, 1, 1, 0.995, 0.995, 0.0010439, 0.00063123, 0, 0.0099203, 0.0099203, 0.0099203
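A log in this comma-separated form is easy to inspect programmatically. The sketch below finds the first epoch with the best mAP@0.5:0.95, using the first four rows of the log above as sample input; in practice you would read the CSV file written during training instead, and `best_epoch` is a hypothetical helper.

```python
import csv
import io

# Header and first four rows copied from the training log above
LOG = """
epoch, train/box_loss, train/obj_loss, train/cls_loss, metrics/precision, metrics/recall, metrics/mAP_0.5,metrics/mAP_0.5:0.95, val/box_loss, val/obj_loss, val/cls_loss, x/lr0, x/lr1, x/lr2
0, 0.03892, 0.011817, 0, 0.99998, 0.99978, 0.995, 0.92987, 0.0077891, 0.0030948, 0, 0.0033312, 0.0033312, 0.070019
1, 0.017302, 0.0049876, 0, 1, 0.9999, 0.995, 0.99105, 0.0031843, 0.0015662, 0, 0.0066644, 0.0066644, 0.040019
2, 0.011272, 0.0034826, 0, 1, 0.99994, 0.995, 0.99499, 0.0020194, 0.0010969, 0, 0.0099969, 0.0099969, 0.010018
3, 0.0080153, 0.0027186, 0, 1, 0.99994, 0.995, 0.995, 0.0013095, 0.00083033, 0, 0.0099978, 0.0099978, 0.0099978
"""


def best_epoch(csv_text, metric="metrics/mAP_0.5:0.95"):
    """Return (epoch, value) for the first epoch with the highest metric value."""
    reader = csv.reader(io.StringIO(csv_text.strip()))
    # column names are padded with spaces in the log, so strip them
    header = [name.strip() for name in next(reader)]
    col = header.index(metric)
    best = None
    for row in reader:
        if not row:
            continue
        epoch, value = int(row[0]), float(row[col])
        if best is None or value > best[1]:
            best = (epoch, value)
    return best


print(best_epoch(LOG))
```

On these rows the metric climbs from 0.92987 at epoch 0 to 0.995 at epoch 3, which shows how quickly the model converges on this synthetic data.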
mAP@0.5 reaches 99.5%, which is very good. Here are the prediction results:
![](https://img-blog.csdnimg.cn/f88b839705a84914b2c9539ad950d4f7.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBA6L-36Iul54Of6Zuo,size_20,color_FFFFFF,t_70,g_se,x_16)
Circular and rectangular objects
Deployment
Finally, run detection with the following command. Remember to replace the paths with your local paths:
python detect.py --weights exps/yolov5s_circle/weights/best.pt --source data/circle/images
The built-in demo is rather verbose because it has to support many input formats. Deployment code based on ONNX is much simpler:
import os
import cv2
import numpy as np
import torch
import onnxruntime
# non_max_suppression comes from the YOLOv5 repository, so run this script from the repository root
from utils.general import non_max_suppression

def detect(img, ort_session):
    # BGR (OpenCV) -> RGB, scale to [0, 1], HWC -> NCHW
    img = img[:, :, ::-1].astype(np.float32) / 255
    img_tensor = np.ascontiguousarray(img.transpose(2,0,1)[None])
    ort_inputs = {ort_session.get_inputs()[0].name: img_tensor}
    pred = torch.tensor(ort_session.run(None, ort_inputs)[0])
    # confidence threshold 0.25, IoU threshold 0.45
    dets = non_max_suppression(pred, 0.25, 0.45)
    return dets[0]

def demo():
    # providers are listed in priority order, with CUDA and CPU as fallbacks when TensorRT is unavailable
    ort_session = onnxruntime.InferenceSession("yolov5s.onnx", providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'])
    img = cv2.imread("data/images/bus.jpg")
    img = cv2.resize(img,(640,640))
    dets = detect(img, ort_session)
    for det in dets:
        # each detection is x1, y1, x2, y2, score, class
        x1 = int(det[0])
        y1 = int(det[1])
        x2 = int(det[2])
        y2 = int(det[3])
        score = float(det[4])
        cls = int(det[5])
        info = "{}_{:.2f}".format(cls, score*100)
        cv2.rectangle(img, (x1,y1),(x2,y2),(255,255,0))
        cv2.putText(img, info, (x1,y1), 1, 1, (0,0,255))
    os.makedirs("runs/detect", exist_ok=True)
    cv2.imwrite("runs/detect/bus.jpg", img)

if __name__=="__main__":
    demo()
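The demo above stretches the image to 640x640, which changes its aspect ratio. A common alternative is to scale the image uniformly and pad the remainder (letterboxing), then map the detected boxes back to the original image. The arithmetic can be sketched as follows; the helper names are hypothetical, and this covers only the coordinate math, not the actual image resizing.

```python
def letterbox_params(img_w, img_h, size=640):
    """Uniform scale and per-side padding that fit an image into a size x size square."""
    scale = min(size / img_w, size / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x, pad_y = (size - new_w) / 2, (size - new_h) / 2
    return scale, pad_x, pad_y


def to_original(box, scale, pad_x, pad_y):
    """Map a box (x1, y1, x2, y2) from letterboxed coordinates back to the original image."""
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )


# a 1280x640 image is scaled by 0.5 and padded by 160 pixels at the top and bottom
scale, pad_x, pad_y = letterbox_params(1280, 640)
print(scale, pad_x, pad_y)
print(to_original((0, 160, 640, 480), scale, pad_x, pad_y))
```

With these values, a box covering the whole non-padded area of the letterboxed input maps back to the full 1280x640 original.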
Summary
Using circle detection and rectangle detection as two examples, this article explained in detail how to generate training data together with the labels it requires, and gave code for the whole process of training, testing and deployment.