OpenCV in Practice - Object Detection Using YOLO

0. Preface

In this section, we will perform object detection using the YOLO algorithm. Object detection is a common task in computer vision, and with the help of deep learning we can achieve high-accuracy detection. On the COCO dataset (80 categories, more than 300,000 images), YOLO reaches 60.6 mAP at 20 fps, or 33 mAP at 220 fps.

1. Introduction to the YOLO model

YOLO is an important branch of deep-learning-based object detection. It divides the input image into an SxS grid, and for each grid cell predicts B bounding boxes; for every bounding box, the network outputs its coordinates, a confidence score that it contains an object, and a confidence score for each category in the training dataset:

YOLO grid
YOLO uses a 19x19 grid, each cell predicting 5 bounding boxes, and the training dataset contains 80 categories. The output of the network is therefore 19x19x425, where 425 comes from the 5 bounding boxes per cell, each encoding its coordinates (x, y, width, height), the confidence that the box contains an object, and a confidence for each of the 80 categories:

5 bounding boxes * (x, y, w, h, object_confidence, class_confidence[80]) = 5 * (4 + 1 + 80) = 425

The YOLO architecture is based on DarkNet, a 53-layer network; on top of Darknet-53, YOLO adds further layers for a total of 106. If we need faster prediction, we can use TinyYOLO, an architecture with fewer network layers.
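
If prediction speed matters more than accuracy, the tiny variant can be loaded with the same API. A minimal sketch, assuming the TinyYOLO configuration and weights files have been downloaded into the data directory (the file names below are illustrative):

    // Hypothetical file names: load the lighter YOLOv3-tiny model;
    // the rest of the pipeline (blob, forward pass, post-processing) is unchanged
    Net tinyNet = readNetFromDarknet("data/yolov3-tiny.cfg", "data/yolov3-tiny.weights");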

2. Implementing object detection based on YOLO

In this section, we use the same functions and classes as in the section Introduction to Deep Learning to load the model, preprocess images, and predict the results. We also introduce non-maximum suppression (NMS) and plot the predicted results with labels:

(1) Create the object_detection_yolo.cpp file, import the required header files, and initialize the required global variables:

#include <fstream>
#include <sstream>
#include <iostream>

#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>

using namespace cv;
using namespace dnn;
using namespace std;

// Initialize the parameters
float confThreshold = 0.5; // Confidence threshold
float nmsThreshold = 0.4;  // Non-maximum suppression threshold
int inpWidth = 416;  // Width of network's input image
int inpHeight = 416; // Height of network's input image
vector<string> classes;

(2) We start with the main function, which first reads the file storing all the categories that the model can predict (coco.names contains one class name per line):

int main(int argc, char** argv) {
    // Load the class names
    string classesFile = "data/coco.names";
    ifstream ifs(classesFile.c_str());
    string line;
    while (getline(ifs, line)) classes.push_back(line);

(3) Load the model using the model definition and weights file:

    // Model configuration and weights files
    String modelConfiguration = "data/yolov3.cfg";
    String modelWeights = "data/yolov3.weights";
    // Load the network
    Net net = readNetFromDarknet(modelConfiguration, modelWeights);
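
The complete program in section 3 additionally selects the computation backend and target right after loading, pinning inference to OpenCV's own implementation on the CPU:

    net.setPreferableBackend(DNN_BACKEND_OPENCV);
    net.setPreferableTarget(DNN_TARGET_CPU);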

(4) Load the input image and convert it to a blob; blobFromImage scales pixel values by 1/255, resizes the image to 416x416, swaps the R and B channels (OpenCV loads images as BGR), and does not crop:

    Mat input, blob;
    input = imread(argv[1]);
    if (input.empty()) {
        cout << "No input image" << endl;
        return 0;
    }
    // Create the input blob
    blobFromImage(input, blob, 1/255.0, Size(inpWidth, inpHeight), Scalar(0,0,0), true, false);

(5) Detect all objects and their categories using the setInput and forward functions; getOutputsNames is a helper (defined in the complete code below) that returns the names of the network's output layers:

    // Set the network input
    net.setInput(blob);
    // Run the forward pass
    vector<Mat> outs;
    net.forward(outs, getOutputsNames(net));
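
Each Mat in outs holds the detections of one output layer. As an illustrative check (not part of the original walkthrough), every output has one row per candidate box and 4 + 1 + 80 = 85 columns, matching the formula in section 1:

    // Illustrative: print the shape of each output layer's detections
    for (const Mat& out : outs)
        cout << out.rows << " boxes x " << out.cols << " values" << endl;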

(6) Post-process the output and draw the detected objects with their prediction confidence:

    // Remove the bounding boxes with low confidence
    postprocess(input, outs);

(7) In the postprocess function (signature: void postprocess(Mat& frame, const vector<Mat>& outs)), store all bounding boxes whose prediction confidence is higher than confThreshold; each output row holds the box center and size (normalized to [0, 1]), an objectness score, and the 80 class scores:

    vector<int> classIds;
    vector<float> confidences;
    vector<Rect> boxes;
    for (size_t i = 0; i < outs.size(); ++i) {
        // Scan all bounding boxes output by the network and keep only
        // those with high confidence scores; assign each box the class
        // label with the highest score
        float* data = (float*)outs[i].data;
        for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols) {
            Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
            Point classIdPoint;
            double confidence;
            // Get the value and location of the maximum score
            minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
            if (confidence > confThreshold) {
                int centerX = (int)(data[0] * frame.cols);
                int centerY = (int)(data[1] * frame.rows);
                int width = (int)(data[2] * frame.cols);
                int height = (int)(data[3] * frame.rows);
                int left = centerX - width / 2;
                int top = centerY - height / 2;

                classIds.push_back(classIdPoint.x);
                confidences.push_back((float)confidence);
                boxes.push_back(Rect(left, top, width, height));
            }
        }
    }

(8) Apply non-maximum suppression using the NMSBoxes function to keep only non-overlapping bounding boxes with high confidence, and draw them:

    // Perform non-maximum suppression to eliminate redundant,
    // overlapping boxes with lower confidences
    vector<int> indices;
    NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); ++i) {
        int idx = indices[i];
        Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, frame);
    }
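
To see what NMSBoxes does in isolation, consider a minimal standalone sketch with illustrative values: two heavily overlapping boxes (IoU of roughly 0.68, above nmsThreshold), so the lower-scoring one is suppressed:

    // Illustrative example: only the higher-scoring box survives suppression
    vector<Rect> demoBoxes = { Rect(100, 100, 50, 50), Rect(105, 105, 50, 50) };
    vector<float> demoScores = { 0.9f, 0.6f };
    vector<int> kept;
    NMSBoxes(demoBoxes, demoScores, 0.5f, 0.4f, kept);
    // kept now contains only index 0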

The results of performing object detection using YOLO are as follows:

Test results

3. Complete code

The complete code object_detection_yolo.cpp is as follows:

#include <fstream>
#include <sstream>
#include <iostream>

#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>

using namespace cv;
using namespace dnn;
using namespace std;

// Initialize the parameters
float confThreshold = 0.5;  // Confidence threshold
float nmsThreshold = 0.4;   // Non-maximum suppression threshold
int inpWidth = 416;         // Width of the network's input image
int inpHeight = 416;        // Height of the network's input image
vector<string> classes;

// Draw the predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame) {
    // Draw the bounding box rectangle
    rectangle(frame, Point(left, top), Point(right, bottom), Scalar(255, 255, 255), 1);
    // Build the label from the class name and its confidence
    string conf_label = format("%.2f", conf);
    string label = "";
    if (!classes.empty()) {
        label = classes[classId] + ":" + conf_label;
    }
    // Display the label at the top of the bounding box
    int baseLine;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    top = max(top, labelSize.height);
    rectangle(frame, Point(left, top - labelSize.height), Point(left + labelSize.width, top + baseLine), Scalar(255, 255, 255), FILLED);
    putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 0), 1, LINE_AA);
}

// Remove low-confidence bounding boxes using non-maximum suppression
void postprocess(Mat& frame, const vector<Mat>& outs) {
    vector<int> classIds;
    vector<float> confidences;
    vector<Rect> boxes;
    for (size_t i = 0; i < outs.size(); ++i) {
        // Scan all bounding boxes output by the network and keep only
        // those with high confidence scores; assign each box the class
        // label with the highest score
        float* data = (float*)outs[i].data;
        for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols) {
            Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
            Point classIdPoint;
            double confidence;
            // Get the value and location of the maximum score
            minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
            if (confidence > confThreshold) {
                int centerX = (int)(data[0] * frame.cols);
                int centerY = (int)(data[1] * frame.rows);
                int width = (int)(data[2] * frame.cols);
                int height = (int)(data[3] * frame.rows);
                int left = centerX - width / 2;
                int top = centerY - height / 2;

                classIds.push_back(classIdPoint.x);
                confidences.push_back((float)confidence);
                boxes.push_back(Rect(left, top, width, height));
            }
        }
    }

    // Perform non-maximum suppression to eliminate redundant,
    // overlapping boxes with lower confidences
    vector<int> indices;
    NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); ++i) {
        int idx = indices[i];
        Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, frame);
    }
}

// Get the names of the output layers
vector<String> getOutputsNames(const Net& net) {
    static vector<String> names;
    if (names.empty()) {
        // Get the indices of the output layers, i.e. layers with unconnected outputs
        vector<int> outLayers = net.getUnconnectedOutLayers();
        // Get the names of all layers in the network
        vector<String> layersNames = net.getLayerNames();
        // Map the output layer indices (1-based) to their names
        names.resize(outLayers.size());
        for (size_t i = 0; i < outLayers.size(); ++i) {
            names[i] = layersNames[outLayers[i] - 1];
        }
    }
    return names;
}

int main(int argc, char** argv) {
    // Load the class names
    string classesFile = "data/coco.names";
    ifstream ifs(classesFile.c_str());
    string line;
    while (getline(ifs, line)) classes.push_back(line);
    // Model configuration and weights files
    String modelConfiguration = "data/yolov3.cfg";
    String modelWeights = "data/yolov3.weights";
    // Load the network
    Net net = readNetFromDarknet(modelConfiguration, modelWeights);
    net.setPreferableBackend(DNN_BACKEND_OPENCV);
    net.setPreferableTarget(DNN_TARGET_CPU);

    Mat input, blob;
    input = imread(argv[1]);
    if (input.empty()) {
        cout << "No input image" << endl;
        return 0;
    }
    // Create the input blob
    blobFromImage(input, blob, 1/255.0, Size(inpWidth, inpHeight), Scalar(0,0,0), true, false);
    // Set the network input
    net.setInput(blob);
    // Run the forward pass
    vector<Mat> outs;
    net.forward(outs, getOutputsNames(net));
    // Remove the bounding boxes with low confidence
    postprocess(input, outs);
    // Report the inference time
    vector<double> layersTimes;
    double freq = getTickFrequency() / 1000;
    double t = net.getPerfProfile(layersTimes) / freq;
    string label = format("Inference time to compute the image: %.2f ms", t);
    cout << label << endl;

    imshow("YOLOv3", input);
    waitKey(0);
    return 0;
}
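
To build and run the program, a typical invocation (assuming OpenCV 4 is installed and visible to pkg-config; adjust to your environment) is g++ object_detection_yolo.cpp -o object_detection_yolo $(pkg-config --cflags --libs opencv4), followed by ./object_detection_yolo image.jpg. The yolov3.cfg, yolov3.weights, and coco.names files must be present in the data directory.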
