Overview of Object Detection - Part 2

YOLO


YOLO predicts bounding boxes and class probabilities directly from the full image in a single evaluation of a single neural network. Because the whole detection pipeline is one network, detection performance can be optimized end-to-end.

YOLO structure: a GoogLeNet-style convolutional backbone + 4 convolutional layers + 2 fully connected layers

  • 1. Resize the image to 448 x 448
  • 2. Run the convolutional network on the image
  • 3. Threshold the detections by the model's confidence
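Steps 1 and 3 above can be sketched in a few lines of NumPy. This is only an illustration: `resize_nearest` and `threshold_detections` are hypothetical helper names, nearest-neighbour resizing stands in for whatever interpolation a real pipeline uses, and the default `min_score=0.2` is an assumed value, not one taken from the article.

```python
import numpy as np

def resize_nearest(image, size=448):
    """Nearest-neighbour resize of an H x W x C image to size x size."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[rows[:, None], cols[None, :]]

def threshold_detections(boxes, scores, min_score=0.2):
    """Keep only the detections whose score reaches min_score (assumed value)."""
    keep = scores >= min_score
    return boxes[keep], scores[keep]

image = np.zeros((100, 200, 3))
resized = resize_nearest(image)          # shape (448, 448, 3)

boxes = np.arange(12).reshape(3, 4)
scores = np.array([0.9, 0.1, 0.5])
kept_boxes, kept_scores = threshold_detections(boxes, scores)
```

Step 2, the network itself, is left out on purpose; only the pre- and post-processing around it is shown.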


  • Understanding the 7 x 7 x 30 output

Cell

The 7 x 7 = 49 output positions can be understood as 49 cells, each corresponding to a square region of the original image. Each cell does two things:

  • Each cell predicts two (by default) bounding boxes, and each box has 5 predicted values: x, y, w, h and confidence. Over the whole grid this gives 7 x 7 x 2 = 98 bboxes
  • Each cell also predicts the probabilities of 20 categories, so the per-cell vector has 30 = (4+1+4+1+20) values: 4 coordinates and 1 confidence for each of the two bboxes, plus the 20 class probabilities
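To make the 30 = (4+1+4+1+20) layout concrete, here is a minimal NumPy sketch that splits a 7 x 7 x 30 output into its parts. It assumes the channels are ordered exactly as in that breakdown (box 1, box 2, then classes); real implementations may order them differently. The function names are hypothetical.

```python
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes

def split_output(output):
    """Split an S x S x (B*5 + C) tensor into boxes, confidences and class probs."""
    assert output.shape == (S, S, B * 5 + C)
    box_part = output[..., :B * 5].reshape(S, S, B, 5)
    boxes = box_part[..., :4]           # (S, S, B, 4): x, y, w, h
    confidence = box_part[..., 4]       # (S, S, B)
    class_probs = output[..., B * 5:]   # (S, S, C), shared by both boxes of a cell
    return boxes, confidence, class_probs

def class_scores(confidence, class_probs):
    """Per-box class score: box confidence times the cell's class probability."""
    return confidence[..., None] * class_probs[:, :, None, :]  # (S, S, B, C)

output = np.random.rand(S, S, B * 5 + C)   # stand-in for a real network output
boxes, confidence, class_probs = split_output(output)
scores = class_scores(confidence, class_probs)
```

Flattening `boxes` with `boxes.reshape(-1, 4)` gives the 98 candidate boxes mentioned above.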


  • Filtering the grid output

  1. Each grid cell predicts two bboxes, but during training only one of them is made responsible for the object (one object, one bbox)

  2. The 20 class probabilities are predicted once per grid cell and apply to the bbox kept for that cell

  3. Confidence

    • If there is no object in the grid cell, the confidence is 0

    • If there is an object, the confidence is Pr(Object) x IOU, which reduces to the IOU between the predicted box and the ground truth (the two bboxes in each cell are compared with the ground truth, and the one with the higher IOU becomes the final bbox)
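The IOU used in the confidence definition can be computed as follows. This sketch assumes boxes are given as corner coordinates (x1, y1, x2, y2), which is a convention chosen here for simplicity; the network itself predicts centre coordinates plus width and height. `responsible_box` is a hypothetical helper that picks which of a cell's predictions matches the ground truth best.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def responsible_box(predicted_boxes, truth_box):
    """Return (index, IOU) of the predicted box with the highest IOU."""
    ious = [iou(p, truth_box) for p in predicted_boxes]
    best = max(range(len(ious)), key=ious.__getitem__)
    return best, ious[best]

best_index, best_iou = responsible_box([(0, 0, 1, 1), (0, 0, 2, 2)], (0, 0, 2, 2))
```

Here the second prediction coincides with the ground truth, so it is the one selected and its IOU becomes the confidence target.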

  • Non-Maximum Suppression (NMS)
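A minimal sketch of greedy non-maximum suppression is shown below. It assumes boxes in (x1, y1, x2, y2) form with positive area, and the default `iou_threshold=0.5` is an assumed value rather than one stated in the article.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]    # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the kept box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        overlap = inter / (areas[i] + areas[rest] - inter)
        order = rest[overlap <= iou_threshold]
    return keep

kept = nms([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], [0.9, 0.8, 0.7])
```

In a multi-class detector this routine is typically run once per class, so that boxes of different classes do not suppress each other.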

Training loss

  • The loss has three parts: bbox loss + confidence loss + classification loss
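Written out, the three parts correspond to the sum-squared-error loss of the YOLO paper. In the sketch below, S = 7 is the grid size, B = 2 the number of boxes per cell, hatted symbols are ground-truth targets, the indicator is 1 when box j of cell i is responsible for an object, and the weights λ_coord = 5 and λ_noobj = 0.5 are the values used in the paper. The first two lines are the bbox loss, the next two the confidence loss, and the last the classification loss.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
   \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
   \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on w and h make the same absolute error count for more on small boxes than on large ones.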

YOLO V2

Improvements over YOLO: training mechanism, network changes (Darknet-19), k-means clustering of the training-set bounding boxes to choose anchor box priors, and direct position prediction.
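The k-means step can be sketched as follows, assuming each training box is reduced to its (width, height) and that closeness is measured with IOU rather than Euclidean distance (distance = 1 - IOU), so that large boxes do not dominate. The function names, the random initialisation, and the toy data are illustrative assumptions.

```python
import numpy as np

def wh_iou(boxes, centroids):
    """IOU between (w, h) pairs, treating all boxes as sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchor shapes using 1 - IOU as the distance."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Smallest 1 - IOU distance is the same as largest IOU.
        assign = np.argmax(wh_iou(boxes, centroids), axis=1)
        new = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

toy_boxes = [[10, 10], [11, 11], [12, 12], [100, 200], [110, 210], [105, 205]]
anchors = kmeans_anchors(toy_boxes, k=2)
anchors = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sort by area
```

On this toy data the two resulting anchors separate the small square boxes from the large tall ones.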

YOLO V3

Improvements: Darknet-53 as the network, and logistic regression (independent logistic classifiers) instead of softmax for class prediction.
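The practical difference between the two classifiers can be seen in a small sketch. Softmax forces the class scores to compete and sum to 1, while independent logistic (sigmoid) outputs let several labels be active at once, which suits overlapping labels. The logits below are made-up values for illustration.

```python
import numpy as np

def softmax(logits):
    """Mutually exclusive class probabilities that sum to 1."""
    z = logits - np.max(logits)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(logits):
    """Independent per-class probabilities, each in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-logits))

logits = np.array([2.0, 2.0, -3.0])   # two equally likely labels, one unlikely
p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)
```

With softmax the two strong classes have to split the probability mass between them, whereas with sigmoid both can be confidently predicted.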

Reference:
https://zhuanlan.zhihu.com/p/94986199
YOLO paper

Origin blog.csdn.net/Peyzhang/article/details/126111181