YOLO series loss function study notes

Yolov1

Yolov1 is the pioneer of the YOLO series, and the paper gives an explicit loss function.
The idea is brute-force simple: treat object detection as a regression problem and apply a sum-of-squared-error (SSE) loss to everything, namely coordinates, width and height, class probabilities, and confidence (both the with-object and the no-object confidence). No special tricks.
The output is encoded as a 7 * 7 * (5 * 2 + 20) = 7 * 7 * 30 tensor.
The loss function is defined as follows:
[Figure: the Yolov1 loss function]
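For reference, the loss from the Yolov1 paper (which the figure here showed) is, with S = 7, B = 2, λ_coord = 5, λ_noobj = 0.5:

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2
 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj}\sum_{c \in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
```

The square root on w and h down-weights errors on large boxes relative to small ones.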

Yolov2

Yolov2 is an upgraded version of Yolov1:

  1. K-means clustering is used to generate the anchor boxes, with 1 - IOU(box, centroid) as the distance metric; K = 5 was chosen as a trade-off between model complexity and recall.
    Furthermore, Yolov1 predicts per grid cell, so one cell can only predict one object. In Yolov2 each cell has 5 anchor boxes and the encoding is changed to be anchor-box based, so one cell can predict multiple object categories simultaneously.
  2. Batch normalization (BN) is applied after every convolution layer, accelerating convergence and reducing over-fitting.
  3. Multi-scale training of the input (since the network contains only convolution and pooling layers, the input size can be changed at any time; it is changed every 10 epochs).
  4. The feature map grid is 13 * 13 and the output is encoded as a 13 * 13 * 5 * (C + 5) tensor, giving finer features better suited to small-object detection.
    Yolov2's loss function differs little from Yolov1's; the only difference is that the square root on the bbox w and h loss terms is removed, as the authors considered it unnecessary, i.e.
    [Figure: the w/h loss term without the square root]
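The anchor clustering in point 1 can be sketched with numpy. This is an illustrative implementation, not the authors' exact code: the distance is 1 - IOU(box, centroid), where IOU is computed on (w, h) pairs as if the boxes shared the same center, and each centroid is updated to the mean (w, h) of its cluster.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between (w, h) pairs, assuming the boxes share the same center."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """Cluster (w, h) boxes using distance = 1 - IOU(box, centroid)."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # assign each box to the centroid with the smallest 1 - IOU distance
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```

Running this on a dataset's ground-truth (w, h) pairs with k=5 yields the Yolov2 priors.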

Yolov3

Yolov3 borrows the feature-fusion idea of FPN, fusing deep semantic features with shallow detail features to further improve prediction accuracy.

  1. Yolov3 continues Yolov2's clustering method for prior boxes, setting 3 prior-box scales for each of its 3 feature-map scales, for 9 clustered prior-box sizes in total. On the COCO dataset the 9 priors are: (10x13), (16x30), (33x23), (30x61), (62x45), (59x119), (116x90), (156x198), (373x326).
  2. Object classification uses logistic regression instead of softmax. Softmax is not used when predicting object categories because it assumes the categories are mutually exclusive; instead each class score is produced by an independent logistic (sigmoid) output. This supports multi-label objects (e.g., a person can carry both the Woman and Person labels).
  3. Dense multi-scale prediction. For a 416 * 416 input image, 3 prior boxes are set for each grid cell on each of the three feature-map scales, giving 13 * 13 * 3 + 26 * 26 * 3 + 52 * 52 * 3 = 10647 predictions in total. Each prediction is a (4 + 1 + 80) = 85-dimensional vector comprising the box coordinates (4 values), the box confidence (1 value), and the object-class probabilities (80 classes for the COCO dataset).
    By comparison, Yolov2 made 13 * 13 * 5 = 845 predictions; Yolov3 predicts more than 10 times as many boxes, and does so at different resolutions, which noticeably improves mAP and small-object detection.
  4. A residual structure is used, allowing a much deeper network; together with multi-scale detection this improves small-object detection and mAP.
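The prediction counts in point 3 can be checked in a few lines; the grid sizes assume a 416 * 416 input and the three output strides of 32, 16, and 8:

```python
# Number of predictions Yolov3 makes for a 416x416 input:
# three feature-map scales, 3 prior boxes per grid cell.
scales = [416 // 32, 416 // 16, 416 // 8]    # grids of 13, 26, 52
num_preds_v3 = sum(s * s * 3 for s in scales)
num_preds_v2 = 13 * 13 * 5                   # Yolov2, for comparison
print(num_preds_v3, num_preds_v2)            # 10647 845
```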

The loss function is divided into an x/y loss, a w/h loss, a confidence loss, and a class loss; the snippets below use the Keras backend (K).

a) xy loss
xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[..., 0:2], from_logits=True)

As can be seen, this is a binary cross-entropy loss.
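A minimal numpy sketch of what binary cross-entropy with from_logits=True computes per element (the sigmoid is folded into the loss), to mirror the K.binary_crossentropy call above:

```python
import numpy as np

def bce_with_logits(y_true, logits):
    """Elementwise binary cross-entropy on raw logits.
    Uses the numerically stable form max(z, 0) - z*t + log(1 + exp(-|z|)),
    which equals -t*log(sigmoid(z)) - (1-t)*log(1 - sigmoid(z))."""
    z = np.asarray(logits, dtype=float)
    t = np.asarray(y_true, dtype=float)
    return np.maximum(z, 0) - z * t + np.log1p(np.exp(-np.abs(z)))
```

For example, a logit of 0 (sigmoid = 0.5) gives a loss of log 2 ≈ 0.693 whatever the target.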

b) wh loss
wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh - raw_pred[..., 2:4])

This is a squared-error loss (scaled by 0.5).

c) confidence loss
confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) + (1 - object_mask) * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) * ignore_mask

Instead of using softmax to compute a probability distribution, a sigmoid is used and the loss is binary cross-entropy; the no-object term is additionally filtered by ignore_mask.

d) Finally, the losses are simply summed, each first averaged over the batch (mf is the batch size):
xy_loss = K.sum(xy_loss) / mf
wh_loss = K.sum(wh_loss) / mf
confidence_loss = K.sum(confidence_loss) / mf
class_loss = K.sum(class_loss) / mf
loss += xy_loss + wh_loss + confidence_loss + class_loss
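The reduction above can be sketched in standalone numpy with hypothetical loss maps (the shapes here are stand-ins for one 13 * 13 scale, not the real Keras tensors): each term is summed over all grid cells and anchors, then divided by the batch size mf.

```python
import numpy as np

batch = 4                                  # mf: number of images in the batch
rng = np.random.default_rng(0)
# hypothetical per-element loss maps, each of shape (batch, 13, 13, 3)
xy_loss, wh_loss, conf_loss, cls_loss = rng.random((4, batch, 13, 13, 3))

mf = float(batch)
# sum every element of each term, average over the batch, then add up
total = sum(np.sum(term) / mf
            for term in (xy_loss, wh_loss, conf_loss, cls_loss))
```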
Origin blog.csdn.net/c2250645962/article/details/103920865