YOLOv5 from Entry to Proficiency

Table of contents

Introduction

Network structure

Data augmentation

Deployment

Generate data for circle and rectangle detection

Introduction

YOLOv5 has iterated through several versions since its initial release on May 18, 2020; the latest, v7.0, adds instance segmentation capabilities. Many blog posts already explain the principles of YOLOv5 and how to train it on labeled data, such as the detailed explanation of the YOLOv5 network and the in-depth explanation of the core fundamentals of YOLOv5 in the YOLO series.

Simple to install and easy to use, it has become the de facto baseline for detection methods:

# Clone the repository and install dependencies
git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Inference takes only a few lines of code:

import torch
# Load a pretrained model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5n - yolov5x6, custom
# Image source
img = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, PIL, OpenCV, numpy, list
# Run inference
results = model(img)
# Visualize the detections
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.
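If you need the raw detections rather than a printout, the results object also exposes them programmatically; a small sketch:

# Detections for the first image, one row per box: xmin, ymin, xmax, ymax, confidence, class
boxes = results.xyxy[0]
# The same data as a pandas DataFrame, with class names attached
df = results.pandas().xyxy[0]
print(df[['name', 'confidence']])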

What if you don't want to write a single line of code? If you want to see results straight from a camera without any coding, running detect.py in the repository will do the job:

python detect.py --weights yolov5s.pt --source 0

The parameters are detailed below: --weights specifies the pretrained weights to use, and --source specifies the input to detect (an image, a list of image paths, a directory, a camera, or even a network stream):

python detect.py --weights yolov5s.pt --source 0                               # webcam
                                               img.jpg                         # image
                                               vid.mp4                         # video
                                               screen                          # screenshot
                                               path/                           # directory
                                               list.txt                        # list of images
                                               list.streams                    # list of streams
                                               'path/*.jpg'                    # glob
                                               'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                                               'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
YOLOv5 v7.0 speed and accuracy comparison
mAP and speed of the pretrained models on COCO

Network structure

YOLOv5 uses the same overall network architecture for all of its sizes (n, s, m, l, x), but each sub-module gets a different depth and width, controlled by the depth_multiple and width_multiple parameters in the corresponding yaml file. Note that besides the standard models there are also n6, s6, m6, l6 and x6 variants, which target larger input resolutions such as 1280x1280. The structures also differ slightly: the 6-series models downsample by up to 64x and predict on 4 feature layers, while the standard models only downsample by up to 32x and predict on 3 feature layers.
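For reference, these scaling factors appear at the top of each model's yaml file; the values below are from the yolov5s.yaml and yolov5m.yaml shipped with the repository:

# yolov5s.yaml
depth_multiple: 0.33  # scales the number of blocks in each stage
width_multiple: 0.50  # scales the number of channels in each layer

# yolov5m.yaml
depth_multiple: 0.67
width_multiple: 0.75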

Compared with earlier releases, YOLOv5 v6.0 introduced a small change: the first layer of the network (originally a Focus module) was replaced with a plain 6x6 convolutional layer. The two are theoretically equivalent, but for some existing GPU devices (and their corresponding optimizations) the 6x6 convolution is more efficient than the Focus module; see issue #4825 for details. The figure below shows the original Focus module (similar to Patch Merging in Swin Transformer): it treats every 2x2 block of adjacent pixels as a patch, gathers the pixels at the same position (same color) within each patch into 4 feature maps, and then applies a 3x3 convolutional layer. This is equivalent to applying a single 6x6 convolutional layer with stride 2 directly.

Since YOLOv5 v6.0, Focus is replaced with an equivalent 6x6 convolutional layer for easier deployment
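To make the equivalence concrete, here is a minimal PyTorch sketch of both forms; it only demonstrates that the output shapes match, while the actual mapping of trained Focus weights onto the 6x6 convolution is worked out in issue #4825:

import torch
import torch.nn as nn

class Focus(nn.Module):
    """Rearrange each 2x2 pixel block into channels, then convolve."""
    def __init__(self, c_in=3, c_out=64):
        super().__init__()
        self.conv = nn.Conv2d(c_in * 4, c_out, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Gather same-position pixels of every 2x2 patch -> 4x channels, half resolution
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

focus = Focus()
conv6 = nn.Conv2d(3, 64, kernel_size=6, stride=2, padding=2)  # the v6.0 replacement

x = torch.randn(1, 3, 640, 640)
print(focus(x).shape, conv6(x).shape)  # both: torch.Size([1, 64, 320, 320])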

In the Neck, SPP is replaced with SPPF (designed by Glenn Jocher). The two compute the same function, but SPPF is more efficient. SPP passes the input through several MaxPool layers of different kernel sizes in parallel and then fuses the outputs, which mitigates the multi-scale object problem to some extent. SPPF instead passes the input serially through multiple 5x5 MaxPool layers. Note that two serial 5x5 MaxPool layers give the same result as a single 9x9 MaxPool layer, and three serial 5x5 MaxPool layers give the same result as a single 13x13 MaxPool layer.

SPPF network structure diagram
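A sketch of SPPF following the structure described above (simplified: plain Conv2d in place of YOLOv5's Conv+BN+SiLU blocks, and illustrative channel counts):

import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Three serial 5x5 MaxPools emulate parallel 5x5 / 9x9 / 13x13 pools."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1, 1)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.m(x)   # receptive field of a 5x5 pool
        y2 = self.m(y1)  # equivalent to a 9x9 pool on x
        y3 = self.m(y2)  # equivalent to a 13x13 pool on x
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))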

Data augmentation

Mosaic : combines four images into one.

Mosaic data augmentation
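A simplified sketch of the idea (the real implementation picks a random mosaic center and shifts each image's labels by the paste offsets; here the center is fixed and labels are omitted for brevity):

import cv2
import numpy as np

def simple_mosaic(imgs, size=640):
    """Paste 4 images into the 4 quadrants of a 2*size x 2*size canvas."""
    canvas = np.zeros((size * 2, size * 2, 3), np.uint8)
    offsets = [(0, 0), (0, size), (size, 0), (size, size)]
    for img, (y, x) in zip(imgs, offsets):
        canvas[y:y + size, x:x + size] = cv2.resize(img, (size, size))
    # labels would need to be shifted by the same (x, y) offsets
    return canvas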

Copy-Paste : randomly pastes some targets into the image; this requires the data to include segments, i.e. instance segmentation masks for every target.

Copy-Paste, only used when training segmentation models

Random affine (rotation, scale, translation and shear): applies a random affine transform; judging from the hyperparameters in the configuration file, only Scale and Translation are actually enabled by default.

Random affine transform augmentation

MixUp : blends two images together with a certain transparency ratio. Whether it helps is unclear; there is no paper or ablation study for it here. In the code, only the larger models use MixUp, and only with 10% probability.

MixUp
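A minimal sketch of MixUp; YOLOv5 samples the mixing ratio from a Beta(32, 32) distribution, which concentrates the ratio around 0.5, and simply concatenates the labels of the two images:

import numpy as np

def mixup(img1, labels1, img2, labels2):
    r = np.random.beta(32.0, 32.0)  # ratio concentrated around 0.5
    img = (img1 * r + img2 * (1 - r)).astype(np.uint8)
    labels = np.concatenate((labels1, labels2), axis=0)
    return img, labels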

Albumentations : mainly filtering, histogram equalization, image-quality changes, and so on. The code only enables this path when the albumentations package is installed, but the package is commented out in the project's requirements.txt file, so it is disabled by default.

Augment HSV (hue, saturation, value): randomly adjusts hue, saturation and value.

Random HSV color augmentation
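A simplified sketch of the HSV augmentation (the repository's augment_hsv builds lookup tables for speed; applying the gains directly, as below, illustrates the same idea, with the default gain hyperparameters assumed):

import cv2
import numpy as np

def augment_hsv(img, hgain=0.015, sgain=0.7, vgain=0.4):
    # random gains in [1 - gain, 1 + gain]
    r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] * r[0]) % 180           # hue wraps around
    hsv[..., 1] = np.clip(hsv[..., 1] * r[1], 0, 255)  # saturation
    hsv[..., 2] = np.clip(hsv[..., 2] * r[2], 0, 255)  # value
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)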

Random horizontal flip : flips the image horizontally with a certain probability.

Random horizontal flip, one of the most commonly used augmentations

The YOLOv5 source code also uses many training strategies:

  • Multi-scale training (0.5~1.5x) : assuming the input size is set to 640 x 640, the size actually used during training is randomly picked between 0.5 x 640 and 1.5 x 640; note the picked value is always an integer multiple of 32 (because the network downsamples by at most 32x).
  • AutoAnchor (for training custom data) : when training on your own dataset, anchor templates can be re-clustered to match the targets in your data.
  • Warmup and Cosine LR scheduler : warm the learning rate up at the start of training, then decay it with a cosine schedule.
  • EMA (Exponential Moving Average) : can be understood as adding momentum to the parameter updates so that the training trajectory is smoother (see the sketch after this list).
  • Mixed precision : mixed-precision training reduces memory usage and speeds up training, provided the GPU hardware supports it.
  • Evolve hyper-parameters : hyperparameter evolution; if you have little tuning experience, just keep the defaults.
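The EMA strategy, for example, keeps a shadow copy of the weights that drifts slowly toward the live weights at every step; a minimal sketch (YOLOv5's actual ModelEMA in utils/torch_utils.py additionally ramps the decay up during early training):

import copy
import torch

class SimpleEMA:
    def __init__(self, model, decay=0.9999):
        self.ema = copy.deepcopy(model).eval()  # shadow model used for evaluation
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # Move each shadow weight a small step toward the live weight
        msd = model.state_dict()
        for k, v in self.ema.state_dict().items():
            if v.dtype.is_floating_point:
                v.mul_(self.decay).add_(msd[k].detach(), alpha=1 - self.decay)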

The loss of YOLOv5 mainly consists of three parts:

  • Classes loss : the classification loss, computed with BCE loss; note it is only computed for positive samples.
  • Objectness loss : the obj loss, also BCE loss. Note that the obj target here is the CIoU between the bounding box predicted by the network and the GT box, and this loss is computed over all samples.
  • Location loss : the localization loss, computed with CIoU loss; only computed for positive samples.
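Putting the three parts together, the total loss is a weighted sum. A sketch of the combination, assuming the default gains from hyp.scratch-low.yaml and the per-layer objectness balance used in utils/loss.py:

# Assumed defaults from hyp.scratch-low.yaml: box 0.05, obj 1.0, cls 0.5
hyp = {'box': 0.05, 'obj': 1.0, 'cls': 0.5}
balance = [4.0, 1.0, 0.4]  # objectness weight per prediction layer (P3, P4, P5)

def total_loss(lbox, lcls, lobj_per_layer, batch_size):
    # Weight the obj loss of each prediction layer, then combine the three parts
    lobj = sum(l * b for l, b in zip(lobj_per_layer, balance))
    return (lbox * hyp['box'] + lobj * hyp['obj'] + lcls * hyp['cls']) * batch_size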

Deployment

Versions before YOLOv5 v6.0 (exclusive) use the Focus layer, which complicates deployment and requires a fair amount of manual surgery. For details see the detailed record of the u-version YOLOv5 object detection ncnn implementation; the concrete steps for deploying YOLOv5 to mobile with ncnn are as follows:

# 1. Export to ONNX
python models/export.py --weights yolov5s.pt --img 320 --batch 1
# 2. Simplify the model
python -m onnxsim yolov5s.onnx yolov5s-sim.onnx
# 3. Convert the model to ncnn
./onnx2ncnn yolov5s-sim.onnx yolov5s.param yolov5s.bin
# 4. Edit the yolov5s.param file:
# delete lines 4 through 13 (the Slice and Concat layers) and change the layer count on the
# second line from 173 to 164 (10 layers deleted, 1 custom layer added: 173 - (10 - 1) = 164),
# then add the custom layer
YoloV5Focus              focus                    1 1  images 159
# where 159 is the output blob of the Concat layer that was just deleted
# 5. Support dynamic input sizes:
# change 960, 240, 60 in the Reshape layers (the numbers after 0=) to -1
# 6. Optimize with ncnnoptimize
./ncnnoptimize yolov5s.param yolov5s.bin yolov5s-opt.param yolov5s-opt.bin 1

After v6.0, with the 6x6 convolution in place of Focus, things are much more convenient: you can deploy directly with OpenCV's dnn module. For details, see Detecting objects with YOLOv5, OpenCV, Python and C++; code at yolov5-opencv-cpp-python.

Note, however, that this only works with OpenCV 4.5.5 and above. It involves six main steps:

# 1. Load the model
net = cv2.dnn.readNet('yolov5s.onnx')
# 2. Prepare the image
def format_yolov5(source):

    # put the image in a square canvas big enough to hold it
    col, row, _ = source.shape
    _max = max(col, row)
    resized = np.zeros((_max, _max, 3), np.uint8)
    resized[0:col, 0:row] = source

    # resize to 640x640, normalize to [0,1] and swap the Red and Blue channels
    result = cv2.dnn.blobFromImage(resized, 1/255.0, (640, 640), swapRB=True)

    return result
# 3. Run inference (feed the blob to the network first)
# image is assumed loaded elsewhere, e.g. image = cv2.imread('bus.jpg')
blob = format_yolov5(image)
net.setInput(blob)
predictions = net.forward()
output = predictions[0]
# 4. Unwrap the results
def unwrap_detection(input_image, output_data):
    class_ids = []
    confidences = []
    boxes = []

    rows = output_data.shape[0]

    # input_image is the square padded canvas, so height == width here
    image_height, image_width, _ = input_image.shape

    x_factor = image_width / 640
    y_factor = image_height / 640

    for r in range(rows):
        row = output_data[r]
        confidence = row[4]
        if confidence >= 0.4:
            classes_scores = row[5:]
            _, _, _, max_indx = cv2.minMaxLoc(classes_scores)
            class_id = max_indx[1]
            if classes_scores[class_id] > .25:
                confidences.append(confidence)
                class_ids.append(class_id)
                # center-based xywh -> pixel-space left/top/width/height
                x, y, w, h = row[0].item(), row[1].item(), row[2].item(), row[3].item()
                left = int((x - 0.5 * w) * x_factor)
                top = int((y - 0.5 * h) * y_factor)
                width = int(w * x_factor)
                height = int(h * y_factor)
                boxes.append(np.array([left, top, width, height]))

    return class_ids, confidences, boxes

# 5. Non-maximum suppression (class_ids, confidences and boxes are the lists returned by unwrap_detection)
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.25, 0.45) 

result_class_ids = []
result_confidences = []
result_boxes = []

for i in indexes:
    result_confidences.append(confidences[i])
    result_class_ids.append(class_ids[i])
    result_boxes.append(boxes[i])
# 6. Visualize the results

class_list = []
with open("classes.txt", "r") as f:
    class_list = [cname.strip() for cname in f.readlines()]

colors = [(255, 255, 0), (0, 255, 0), (0, 255, 255), (255, 0, 0)]

for i in range(len(result_class_ids)):

    box = result_boxes[i]
    class_id = result_class_ids[i]

    color = colors[class_id % len(colors)]

    conf  = result_confidences[i]

    cv2.rectangle(image, box, color, 2)
    cv2.rectangle(image, (box[0], box[1] - 20), (box[0] + box[2], box[1]), color, -1)
    cv2.putText(image, class_list[class_id], (box[0] + 5, box[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,0))

Generate data for circle and rectangle detection

Next, a small project that detects circles and rectangles demonstrates YOLOv5's training-data generation and training workflow. Labeling data is time-consuming and labor-intensive; being able to generate data automatically makes it easy to quickly validate experiments.

If you need to label real data, there are existing tutorials based on Label Studio ([YOLOv5 in practice] even a beginner can train their own dataset!) and on labelImg (teaching you to use deep learning for object detection (2): data labeling).

YOLOv5's label format is very simple: images go in the images folder, and in the labels folder each image has a txt file with the same name, where each line stores one target's class and its normalized center coordinates, width and height. Many annotation tools can export the YOLO format directly, and plenty of scripts convert from VOC, COCO and other formats to the YOLO format.

class_id_1 normalized_center_x normalized_center_y normalized_width normalized_height
class_id_2 normalized_center_x normalized_center_y normalized_width normalized_height
Annotation file example
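For example, a hypothetical labels/0.txt for a 640x640 image containing one circle of class 0, centered at (320, 320) with a 160x160 bounding box, would contain the single line:

0 0.5 0.5 0.25 0.25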

1. Let's take circle detection as an example and walk through each step in detail.

First comes training-data generation and visualization: draw a filled circle with a random center and a random radius between 60 and 100 pixels as the detection target, and generate the training data in bulk (the script below defaults to 10,000 images; the two-class version later uses 100,000).

import os
import cv2
import math
import random
import numpy as np
from tqdm import tqdm

def generate():
    # Black 640x640 canvas with one randomly colored, randomly placed filled circle
    img = np.zeros((640,640,3),np.uint8)
    x = 100+random.randint(0, 400)  # center stays at least 100 px from the border
    y = 100+random.randint(0, 400)
    radius = random.randint(60,100)
    r = random.randint(0,255)
    g = random.randint(0,255)
    b = random.randint(0,255)
    cv2.circle(img, (x,y), radius, (b,g,r),-1)
    return img, [x,y,radius]

def generate_batch(num=10000):
    images_dir = "data/circle/images"
    if not os.path.exists(images_dir):
        os.makedirs(images_dir)
    labels_dir = "data/circle/labels"
    if not os.path.exists(labels_dir):
            os.makedirs(labels_dir)
    for i in tqdm(range(num)):
        img, labels = generate()
        cv2.imwrite(images_dir+"/"+str(i)+".jpg", img)
        with open(labels_dir+"/"+str(i)+".txt", 'w') as f:
            x, y, radius = labels
            f.write("0 "+str(x/640)+" "+str(y/640)+" "+str(2*radius/640)+" "+str(2*radius/640)+"\n")

def show_gt(dir='data/circle'):
    files = os.listdir(dir+"/images")
    gtdir = dir+"/gt"
    if not os.path.exists(gtdir):
        os.makedirs(gtdir)
    for file in tqdm(files):
        imgpath = dir+"/images/"+file
        img = cv2.imread(imgpath)
        h,w,_ = img.shape
        labelpath = dir+"/labels/"+file[:-3]+"txt"
        with open(labelpath) as f:
            lines = f.readlines()
            for line in lines:
                items = line[:-1].split(" ")
                c = int(items[0])
                cx = float(items[1])
                cy = float(items[2])
                cw = float(items[3])
                ch = float(items[4])
                x1 = int((cx - cw/2)*w)
                y1 = int((cy - ch/2)*h)
                x2 = int((cx + cw/2)*w)
                y2 = int((cy + ch/2)*h)
                cv2.rectangle(img, (x1,y1),(x2,y2),(0,255,0),2)
            cv2.imwrite(gtdir+"/"+file, img)

if __name__=="__main__":
    generate_batch()
    show_gt()

Then construct circle.yaml (for simplicity, train and val both point to the generated images here):

train: data/circle/images/
val: data/circle/images/
# number of classes
nc: 1

# class names
names: ['circle']

2. To detect both circular and rectangular targets, the generation script and the data configuration file need adjusting.

import os
import cv2
import math
import random
import numpy as np
from tqdm import tqdm

def generate_circle():
    img = np.zeros((640,640,3),np.uint8)
    x = 100+random.randint(0, 400)
    y = 100+random.randint(0, 400)
    radius = random.randint(60,100)
    r = random.randint(0,255)
    g = random.randint(0,255)
    b = random.randint(0,255)
    cv2.circle(img, (x,y), radius, (b,g,r),-1)
    return img, [x,y,radius*2,radius*2]

def generate_rectangle():
    img = np.zeros((640,640,3),np.uint8)
    # keep the whole rectangle inside the 640x640 canvas so labels stay in [0, 1]
    x1 = 100+random.randint(0, 300)
    y1 = 100+random.randint(0, 300)
    w = random.randint(80, 200)
    h = random.randint(80, 200)
    x2 = x1 + w
    y2 = y1 + h
    r = random.randint(0,255)
    g = random.randint(0,255)
    b = random.randint(0,255)
    cx = (x1+x2)//2
    cy = (y1+y2)//2
    cv2.rectangle(img, (x1,y1), (x2,y2), (b,g,r),-1)
    return img, [cx,cy,w,h]

def generate_batch(num=100000):
    images_dir = "data/shape/images"
    if not os.path.exists(images_dir):
        os.makedirs(images_dir)
    labels_dir = "data/shape/labels"
    if not os.path.exists(labels_dir):
            os.makedirs(labels_dir)
    for i in tqdm(range(num)):
        if i % 2 == 0:
            img, labels = generate_circle()
        else:
            img, labels = generate_rectangle()
        cv2.imwrite(images_dir+"/"+str(i)+".jpg", img)
        with open("data/shape/labels/"+str(i)+".txt", 'w') as f:
            cx,cy,w,h = labels
            f.write(str(i%2)+" "+str(cx/640)+" "+str(cy/640)+" "+str(w/640)+" "+str(h/640)+"\n")

def show_gt(dir='data/shape'):
    files = os.listdir(dir+"/images")
    gtdir = dir+"/gt"
    if not os.path.exists(gtdir):
        os.makedirs(gtdir)
    for file in tqdm(files):
        imgpath = dir+"/images/"+file
        img = cv2.imread(imgpath)
        h, w, _ = img.shape
        labelpath = dir+"/labels/"+file[:-3]+"txt"
        with open(labelpath) as f:
            lines = f.readlines()
            for line in lines:
                items = line[:-1].split(" ")
                c = int(items[0])
                cx = float(items[1])
                cy = float(items[2])
                cw = float(items[3])
                ch = float(items[4])
                x1 = int((cx - cw/2)*w)
                y1 = int((cy - ch/2)*h)
                x2 = int((cx + cw/2)*w)
                y2 = int((cy + ch/2)*h)
                cv2.rectangle(img, (x1,y1),(x2,y2),(0,255,0),2)
                cv2.putText(img, str(c), (x1,y1), 3,1,(0,0,255))
            cv2.imwrite(gtdir+"/"+file, img)

if __name__=="__main__":
    generate_batch()
    show_gt()

The corresponding shape.yaml; note that the number of classes (nc) is now 2:

train: data/shape/images/
val: data/shape/images/
# number of classes
nc: 2

# class names
names: ['circle', 'rectangle']

Training

Start training with the following command

python train.py --data circle.yaml --cfg yolov5s.yaml --weights '' --batch-size 64

If there are two target classes, circles and rectangles, the command is:

python train.py --data shape.yaml --cfg yolov5s.yaml --weights '' --batch-size 64


Check the statistics printed during training: the class counts and box distributions.

 

Train for a few epochs and look at the results:

               epoch,      train/box_loss,      train/obj_loss,      train/cls_loss,   metrics/precision,      metrics/recall,     metrics/mAP_0.5,metrics/mAP_0.5:0.95,        val/box_loss,        val/obj_loss,        val/cls_loss,               x/lr0,               x/lr1,               x/lr2
                   0,             0.03892,            0.011817,                   0,             0.99998,             0.99978,               0.995,             0.92987,           0.0077891,           0.0030948,                   0,           0.0033312,           0.0033312,            0.070019
                   1,            0.017302,           0.0049876,                   0,                   1,              0.9999,               0.995,             0.99105,           0.0031843,           0.0015662,                   0,           0.0066644,           0.0066644,            0.040019
                   2,            0.011272,           0.0034826,                   0,                   1,             0.99994,               0.995,             0.99499,           0.0020194,           0.0010969,                   0,           0.0099969,           0.0099969,            0.010018
                   3,           0.0080153,           0.0027186,                   0,                   1,             0.99994,               0.995,               0.995,           0.0013095,          0.00083033,                   0,           0.0099978,           0.0099978,           0.0099978
                   4,           0.0067639,           0.0023831,                   0,                   1,             0.99996,               0.995,               0.995,          0.00099513,          0.00068878,                   0,           0.0099978,           0.0099978,           0.0099978
                   5,           0.0061637,           0.0022279,                   0,                   1,             0.99996,               0.995,               0.995,          0.00090497,          0.00064193,                   0,           0.0099961,           0.0099961,           0.0099961
                   6,           0.0058844,            0.002144,                   0,             0.99999,             0.99998,               0.995,               0.995,           0.0009117,          0.00063328,                   0,           0.0099938,           0.0099938,           0.0099938
                   7,           0.0056247,             0.00208,                   0,             0.99999,             0.99999,               0.995,               0.995,          0.00086355,          0.00061343,                   0,           0.0099911,           0.0099911,           0.0099911
                   8,           0.0054567,           0.0020223,                   0,                   1,             0.99999,               0.995,               0.995,          0.00081632,          0.00059592,                   0,           0.0099879,           0.0099879,           0.0099879
                   9,           0.0053597,           0.0019864,                   0,                   1,                   1,               0.995,               0.995,          0.00081379,          0.00058942,                   0,           0.0099842,           0.0099842,           0.0099842
                  10,           0.0053103,           0.0019559,                   0,                   1,                   1,               0.995,               0.995,           0.0008175,          0.00058669,                   0,             0.00998,             0.00998,             0.00998
                  11,           0.0052146,           0.0019445,                   0,                   1,                   1,               0.995,               0.995,          0.00083248,          0.00058731,                   0,           0.0099753,           0.0099753,           0.0099753
                  12,           0.0050852,           0.0019065,                   0,                   1,                   1,               0.995,               0.995,          0.00085092,          0.00058853,                   0,           0.0099702,           0.0099702,           0.0099702
                  13,           0.0050589,           0.0019031,                   0,                   1,                   1,               0.995,               0.995,          0.00086915,          0.00059267,                   0,           0.0099645,           0.0099645,           0.0099645
                  14,           0.0049664,           0.0018693,                   0,                   1,                   1,               0.995,               0.995,          0.00090856,          0.00059815,                   0,           0.0099584,           0.0099584,           0.0099584
                  15,           0.0049839,           0.0018568,                   0,                   1,                   1,               0.995,               0.995,          0.00093147,          0.00060425,                   0,           0.0099517,           0.0099517,           0.0099517
                  16,           0.0049079,           0.0018459,                   0,                   1,                   1,               0.995,               0.995,           0.0009656,          0.00061124,                   0,           0.0099446,           0.0099446,           0.0099446
                  17,           0.0048693,           0.0018277,                   0,                   1,                   1,               0.995,               0.995,          0.00099703,          0.00061948,                   0,            0.009937,            0.009937,            0.009937
                  18,           0.0048052,           0.0018103,                   0,                   1,                   1,               0.995,               0.995,           0.0010246,          0.00062618,                   0,           0.0099289,           0.0099289,           0.0099289
                  19,           0.0047608,           0.0017947,                   0,                   1,                   1,               0.995,               0.995,           0.0010439,          0.00063123,                   0,           0.0099203,           0.0099203,           0.0099203

The mAP@0.5 reaches 0.995, essentially saturated for this toy task. Look at the prediction results:

  

PR curve

Circular and rectangular objects

  

Deployment

Finally, run detection with the following command; remember to replace the weights path with your local path:

python detect.py --weights exps/yolov5s_circle/weights/best.pt --source data/circle/images

 

The built-in demo is long-winded because it has to support many input formats; ONNX deployment code can be much simpler:

import cv2
import numpy as np
import torch
import onnxruntime
from utils.general import non_max_suppression  # from the yolov5 repo

def detect(img, ort_session):
    # HWC uint8 -> normalized NCHW float32
    img = img.astype(np.float32)
    img = img / 255
    img_tensor = img.transpose(2,0,1)[None]
    ort_inputs = {ort_session.get_inputs()[0].name: img_tensor}
    pred = torch.tensor(ort_session.run(None, ort_inputs)[0])
    dets = non_max_suppression(pred, 0.25, 0.45)
    return dets[0]

def demo():
    # export the model first, e.g.: python export.py --weights yolov5s.pt --include onnx
    ort_session = onnxruntime.InferenceSession(
        "yolov5s.onnx",
        providers=['CPUExecutionProvider'])  # or CUDA/TensorRT providers if available
    img = cv2.imread("data/images/bus.jpg")
    img = cv2.resize(img,(640,640))
    dets = detect(img, ort_session)
    for det in dets:
        x1 = int(det[0])
        y1 = int(det[1])
        x2 = int(det[2])
        y2 = int(det[3])
        score = float(det[4])
        cls = int(det[5])
        info = "{}_{:.2f}".format(cls, score*100)
        cv2.rectangle(img, (x1,y1),(x2,y2),(255,255,0))
        cv2.putText(img, info, (x1,y1), 1, 1, (0,0,255))
    cv2.imwrite("runs/detect/bus.jpg", img)
if __name__=="__main__":
    demo()

Summary

This article used two examples, circle detection and rectangle detection, to explain in detail how to generate the labels required for training, and provided code for the whole pipeline: training, testing and deployment.

Originally published at blog.csdn.net/minstyrain/article/details/123486914