CornerNet paper algorithm notes

Disclaimer: This is an original blogger article released under the CC 4.0 BY-SA license; when reposting, please attach the original source link and this statement.
This link: https://blog.csdn.net/JMU_Ma/article/details/91352268

CornerNet background

The current state-of-the-art one-stage and two-stage detectors are anchor-based, which has two drawbacks: (1) the large number of anchor boxes makes positive and negative samples imbalanced and slows down training; (2) anchors introduce extra hyperparameters and design choices (how many anchors, and at which sizes and aspect ratios).

Differences between one-stage and two-stage detectors:

| | One-stage | Two-stage |
| --- | --- | --- |
| Main algorithms | YOLOv1, YOLOv2, YOLOv3, SSD, RetinaNet | Fast R-CNN, Faster R-CNN, DeNet |
| Detection accuracy | Lower | Higher |
| Detection speed | Faster | Slower |

[Figure: Faster R-CNN algorithm flow chart]

[Figure: YOLO algorithm flow chart]

CornerNet body part

The CornerNet paper contains two main innovations:

  1. It casts object detection as a keypoint detection problem: the prediction box is obtained from two keypoints, the top-left and bottom-right corners of the target box, so CornerNet has no notion of anchors.
  2. The whole detection network is trained from scratch rather than from a pre-trained classification model, which lets users design the feature-extraction network freely, without being constrained by a pre-trained model.

Most earlier object detection methods, such as Faster R-CNN, SSD, and YOLO (v2, v3), are anchor-based. The introduction of anchors significantly improved detection performance, but it also brings some disadvantages:

  1. Anchors lead to excessive computation: many algorithms generate thousands of anchors, while only a handful of targets need to be detected, so most of that computation is redundant. Techniques such as focal loss and negative-sample subsampling were introduced to mitigate this problem.
  2. Anchors introduce extra hyperparameters, such as their number, width, and height. CornerNet, which does not use anchors, achieves good results while eliminating these additional design choices.

CornerNet network structure

First, a 7x7 convolution reduces the input image to 1/4 of its original size (the paper uses 511x511 inputs, giving 128x128 outputs), after which features are extracted by the backbone network. The backbone is an hourglass network consisting of two stacked hourglass modules. Each module first downsamples the input through a series of operations and then upsamples back to the original resolution. The full hourglass network is 104 layers deep.

The hourglass network is followed by two output branches, one predicting top-left corners and one predicting bottom-right corners. Each branch contains a corner pooling layer and three outputs: heatmaps, embeddings, and offsets.

Heatmaps: the predicted corner information, output as an H x W feature map with C channels, where C is the number of object categories (note: no background class). Each channel is a mask over the feature map whose values range from 0 to 1.

Embeddings: their main role is to group the predicted corners, i.e. to decide whether a detected top-left corner and bottom-right corner belong to the same object.

Offsets: used to fine-tune the predicted box positions, compensating for the quantization error introduced when mapping points from the image to the feature map.
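As a shape sketch of the three outputs above (the arrays here are hypothetical placeholders; the real network fills them with convolutional predictions, and 128x128 with C = 80 follows the paper's 511x511 MS COCO setting):

```python
import numpy as np

# Shape sketch of one corner branch's outputs.
# 511x511 input downsampled 4x -> 128x128 feature map; C = 80 COCO categories.
H, W, C = 128, 128, 80

heatmaps = np.zeros((C, H, W))    # per-class corner scores in [0, 1]
embeddings = np.zeros((1, H, W))  # 1-D embedding used to pair corners
offsets = np.zeros((2, H, W))     # (x, y) sub-pixel correction per location

print(heatmaps.shape, embeddings.shape, offsets.shape)
```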

A relatively novel component of the CornerNet algorithm is the pooling layer used in this network: corner pooling.

Corner Pooling:

Corner pooling is calculated as follows. For the top-left branch, given feature maps f_t and f_l of size H x W:

t_{ij} = \begin{cases} \max(f_{t_{ij}}, t_{(i+1)j}) & \text{if } i < H \\ f_{t_{Hj}} & \text{otherwise} \end{cases}

l_{ij} = \begin{cases} \max(f_{l_{ij}}, l_{i(j+1)}) & \text{if } j < W \\ f_{l_{iW}} & \text{otherwise} \end{cases}

The top-left corner pooling output at (i, j) is t_{ij} + l_{ij}; the bottom-right branch is symmetric, scanning in the opposite directions.

Why doesn't the CornerNet network use the common max pooling or average pooling layers?

This is because of the special nature of CornerNet: it must detect the top-left and bottom-right corners of each object, and at a corner location itself there is usually no visual evidence of the object. For a top-left corner, the key information lies to its right and below it (and for a bottom-right corner, to its left and above it), so an ordinary pooling layer cannot localize the corner. Corner pooling aggregates this key information toward the corner, which is why it exists.

How is corner pooling performed? Left pooling scans from right to left, top pooling from bottom to top, right pooling from left to right, and bottom pooling from top to bottom.
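These scan directions can be sketched with cumulative maxima. A minimal NumPy version for the top-left branch (function names are mine, not from the paper's code):

```python
import numpy as np

def top_pool(f):
    # top pooling: each (i, j) takes the max over rows i..H-1,
    # implemented as a cumulative max scanned from the bottom up
    return np.maximum.accumulate(f[::-1, :], axis=0)[::-1, :]

def left_pool(f):
    # left pooling: each (i, j) takes the max over columns j..W-1,
    # implemented as a cumulative max scanned from right to left
    return np.maximum.accumulate(f[:, ::-1], axis=1)[:, ::-1]

def top_left_corner_pool(ft, fl):
    # CornerNet sums the two pooled maps in the top-left branch
    return top_pool(ft) + left_pool(fl)

f = np.array([[1., 3., 0.],
              [2., 0., 4.],
              [0., 5., 1.]])
print(top_pool(f))
```

The bottom-right branch is symmetric: a bottom pooling (cumulative max from the top down) plus a right pooling (cumulative max from left to right).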

The mathematical formulas of CornerNet

L = L_{det} + \alpha L_{pull} + \beta L_{push} + \gamma L_{off}

Here L_{det} is the heatmap loss; its concrete form is as follows:

L_{det} = \frac{-1}{N} \sum_{c=1}^{C} \sum_{i=1}^{H} \sum_{j=1}^{W} \begin{cases} (1 - p_{cij})^{\alpha} \log(p_{cij}) & \text{if } y_{cij} = 1 \\ (1 - y_{cij})^{\beta} (p_{cij})^{\alpha} \log(1 - p_{cij}) & \text{otherwise} \end{cases}

In this equation N is the number of objects to detect, p_{cij} is the predicted heatmap value at position (i, j) of channel c, and y_{cij} is the ground-truth value, an unnormalized Gaussian centered on the true corner, where x and y denote the position of (i, j) relative to that corner:

y_{cij} = e^{-\frac{x^2 + y^2}{2\sigma^2}}
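A minimal NumPy sketch of this Gaussian penalty and the detection focal loss, assuming α = 2 and β = 4 as in the paper (helper names are mine):

```python
import numpy as np

def gaussian_penalty(H, W, cx, cy, sigma):
    # y_cij = exp(-(x^2 + y^2) / (2 * sigma^2)), with (x, y) measured
    # relative to the ground-truth corner (cx, cy); equals 1 at the corner
    ys, xs = np.mgrid[0:H, 0:W]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def corner_focal_loss(p, y, alpha=2, beta=4, eps=1e-12):
    # variant focal loss: positives are locations where y == 1; negatives
    # near a ground-truth corner are down-weighted by (1 - y)^beta
    pos = y == 1
    n = max(int(pos.sum()), 1)  # N = number of objects
    loss_pos = ((1 - p[pos]) ** alpha * np.log(p[pos] + eps)).sum()
    loss_neg = ((1 - y[~pos]) ** beta * p[~pos] ** alpha
                * np.log(1 - p[~pos] + eps)).sum()
    return -(loss_pos + loss_neg) / n

y = gaussian_penalty(5, 5, cx=2, cy=2, sigma=1.0)
p = np.full((5, 5), 0.5)  # dummy uniform predictions
print(round(float(corner_focal_loss(p, y)), 4))
```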
The concrete form of L_{pull} is as follows:

L_{pull} = \frac{1}{N} \sum_{k=1}^{N} \left[ (e_{t_k} - e_k)^2 + (e_{b_k} - e_k)^2 \right]

In this formula e_{tk} is the embedding vector of the top-left corner of object k, e_{bk} is the embedding vector of the bottom-right corner of object k, and e_k is the average of e_{tk} and e_{bk}.

The concrete form of L_{push} is as follows:

L_{push} = \frac{1}{N(N-1)} \sum_{k=1}^{N} \sum_{\substack{j=1 \\ j \neq k}}^{N} \max(0, \Delta - |e_k - e_j|)

The role of L_{push} is to push apart the embedding vectors of corners that do not belong to the same object.
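Both embedding losses can be sketched in a few lines of NumPy, using 1-D embeddings as CornerNet does (the function name is mine):

```python
import numpy as np

def pull_push_losses(e_tl, e_br, delta=1.0):
    # e_tl[k], e_br[k]: embeddings of the top-left / bottom-right corner
    # of object k; e_mean[k] is their average (e_k in the paper)
    e_tl = np.asarray(e_tl, dtype=float)
    e_br = np.asarray(e_br, dtype=float)
    n = len(e_tl)
    e_mean = (e_tl + e_br) / 2.0
    # pull: corners of the same object should have similar embeddings
    pull = ((e_tl - e_mean) ** 2 + (e_br - e_mean) ** 2).sum() / n
    # push: mean embeddings of different objects should be >= delta apart
    push = 0.0
    for k in range(n):
        for j in range(n):
            if j != k:
                push += max(0.0, delta - abs(e_mean[k] - e_mean[j]))
    push /= max(n * (n - 1), 1)
    return pull, push

pull, push = pull_push_losses([0.0, 5.0], [0.0, 5.0])
print(pull, push)  # perfectly grouped and separated -> pull 0.0, push 0.0
```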

The other output of CornerNet is the offsets. They are similar to, yet different from, the offsets predicted by anchor-based detection algorithms: similar because both encode position corrections, different because in anchor-based detectors the offset is between the predicted box and the anchor, whereas here it is the precision lost to rounding during downsampling, i.e. the content expressed by Equation 2.

o_k = \left( \frac{x_k}{n} - \left\lfloor \frac{x_k}{n} \right\rfloor,\ \frac{y_k}{n} - \left\lfloor \frac{y_k}{n} \right\rfloor \right) \quad \text{(Equation 2)}

We know that the image is downsampled when going from the input to the feature map. If the downsampling factor is n, then a point (x, y) in the input image maps to the following location on the feature map:

\left( \left\lfloor \frac{x}{n} \right\rfloor, \left\lfloor \frac{y}{n} \right\rfloor \right)

The \lfloor \cdot \rfloor symbol denotes rounding down, and this rounding causes a loss of precision that particularly hurts the regression of small objects; the ROI Pooling in Faster R-CNN has a similar precision-loss problem. Therefore the offset is computed with Equation 2, and this regression branch is then supervised with a smooth L1 loss (Equation 3), similar to common object detection algorithms:

L_{off} = \frac{1}{N} \sum_{k=1}^{N} \text{SmoothL1Loss}(o_k, \hat{o}_k) \quad \text{(Equation 3)}
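Equations 2 and 3 can be checked with a tiny NumPy sketch, assuming n = 4 as in the paper (helper names are mine):

```python
import math
import numpy as np

def offset_target(x, y, n=4):
    # Equation 2: the sub-pixel offset lost when an image point (x, y)
    # is mapped to the n-times-downsampled feature map
    return (x / n - math.floor(x / n), y / n - math.floor(y / n))

def smooth_l1(pred, target):
    # Equation 3 supervises the offset branch with a smooth L1 loss:
    # quadratic for errors below 1, linear above
    d = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum())

print(offset_target(103, 57))  # -> (0.75, 0.25)
```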

Experimental results

[Table 1: ablation experiment on corner pooling]

Table 1 is a comparative experiment on corner pooling. Adding corner pooling (second row) clearly improves the results, and the improvement is especially significant for large-scale objects.

[Table 2: effect of penalty reduction for negative locations]

Table 2 shows the effect of assigning different penalty weights to negative samples at different locations. The first row does not use this strategy; the second row uses a fixed radius, which already gives a clear improvement; the third row computes the radius from the object size (the scheme adopted in the paper), which improves the results further.

Summary

We have introduced CornerNet, a new object detection approach that detects bounding boxes as pairs of corners. We evaluated CornerNet on MS COCO and demonstrated competitive results. However, this formulation can also hurt prediction accuracy: for example, two objects that are close together may be merged into a single detection box, as in the example below.

[Figure: failure case where two nearby objects are merged into one detection box]

To address this problem, the CenterNet method was later proposed, which reduces the occurrence of such failures to some extent.
