RCNN series development history

1. RCNN

  RCNN was published at CVPR 2014. It is the pioneering work in applying deep learning to object detection. Thanks to the stronger feature extraction ability of convolutional neural networks compared with traditional CV methods, it raised the mean average precision (mAP) on the PASCAL VOC 2010 dataset from 35.1% to 53.7%.

  The algorithm flow of RCNN is shown in the figure below, and the process is mainly divided into 4 steps:

  • Generate candidate regions. The Selective Search algorithm first over-segments the image into small regions, then merges regions that are likely to belong to the same object and outputs them as region proposals. In this step, RCNN extracts about 2000 candidate regions and warps each one to a fixed-size image.
  • CNN feature extraction. Feed each fixed-size image into the CNN to obtain a fixed-dimensional output feature.
  • SVM classifier. Use linear binary SVM classifiers, one per class, to classify the output features, and use hard negative mining to offset the imbalance between positive and negative samples.
  • Location refinement. Use a regressor on the extracted features to refine the box boundaries and obtain a more precise target region.

[Figure: algorithm flow of RCNN]
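
  The location refinement step can be illustrated with a small sketch. It assumes the center/size offset parametrization commonly used for RCNN-style box regression; the function names `encode_box` and `decode_box` are hypothetical and chosen only for this example. A real regressor would predict the offsets from CNN features, whereas here they are computed directly from a ground-truth box.

```python
import numpy as np

def encode_box(proposal, gt):
    """Offsets (tx, ty, tw, th) that map a proposal onto a ground-truth box.

    Boxes are (x1, y1, x2, y2). Center shifts are normalized by the proposal
    size; width and height changes are expressed as log ratios.
    """
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    px, py = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gx, gy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    return np.array([(gx - px) / pw, (gy - py) / ph,
                     np.log(gw / pw), np.log(gh / ph)])

def decode_box(proposal, deltas):
    """Apply offsets to a proposal and return the refined (x1, y1, x2, y2) box."""
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    px, py = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    cx, cy = px + deltas[0] * pw, py + deltas[1] * ph
    w, h = pw * np.exp(deltas[2]), ph * np.exp(deltas[3])
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])

proposal = np.array([10.0, 10.0, 50.0, 50.0])
gt = np.array([20.0, 15.0, 60.0, 75.0])
deltas = encode_box(proposal, gt)      # what the regressor is trained to output
refined = decode_box(proposal, deltas)  # recovers the ground-truth box
```

  Decoding the encoded offsets gives back the ground-truth box, which is why the regressor can be trained on the offsets instead of on raw coordinates.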

2. Fast RCNN

  Fast RCNN was published at ICCV 2015. RCNN requires multi-stage training, which is cumbersome and slow. Fast RCNN achieves nearly end-to-end training: classification and box regression are trained jointly in a single stage, although the region proposals still come from Selective Search. Based on the VGG16 network, training is nearly 9 times faster than RCNN and testing is nearly 213 times faster. The mAP on the PASCAL VOC dataset reaches 68.4%.

  The algorithm flow of Fast RCNN is shown in the figure below. Compared with RCNN, there are three main improvements:

  • Shared convolution. The entire image is sent through the convolutional network once to compute a shared feature map, instead of feeding candidate regions through the network one by one as RCNN does. Candidate regions are still generated with Selective Search, but sharing the convolution greatly reduces the amount of computation.
  • RoI Pooling. RoI Pooling converts the features of each candidate region, whatever its size, into a fixed-size feature. As a result, the input image no longer has to be warped to a fixed size, which makes training more flexible and accurate.
  • Multi-task loss. The classification and regression branches are trained together. To avoid the separate training and slow speed that come with the SVM classifier, a Softmax function is used for classification.

[Figure: algorithm flow of Fast RCNN]
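
  To make the RoI Pooling idea concrete, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions, not the implementation from the paper or from any library: the function name `roi_max_pool` is hypothetical, the RoI is given directly in feature-map coordinates, and it must lie inside the feature map.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one RoI of a (C, H, W) feature map into a (C, out_h, out_w) grid.

    roi is (x1, y1, x2, y2) in feature-map coordinates, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    pooled = np.zeros((feature_map.shape[0], out_h, out_w), dtype=feature_map.dtype)
    # Split the RoI into out_h x out_w bins with evenly spaced edges.
    ys = np.linspace(y1, y2, out_h + 1)
    xs = np.linspace(x1, x2, out_w + 1)
    for i in range(out_h):
        ya = int(np.floor(ys[i]))
        yb = max(int(np.ceil(ys[i + 1])), ya + 1)  # every bin covers at least one row
        for j in range(out_w):
            xa = int(np.floor(xs[j]))
            xb = max(int(np.ceil(xs[j + 1])), xa + 1)  # and at least one column
            pooled[:, i, j] = feature_map[:, ya:yb, xa:xb].max(axis=(1, 2))
    return pooled

fmap = np.arange(36, dtype=float).reshape(1, 6, 6)  # toy feature map, value = 6 * row + col
small = roi_max_pool(fmap, (0, 0, 4, 4))  # 4x4 region
wide = roi_max_pool(fmap, (1, 0, 6, 3))   # 5x3 region
```

  Both RoIs produce an output of shape (1, 2, 2) even though their sizes differ, which is the property the fully connected layers rely on.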

3. Faster RCNN

  Faster RCNN was published at NIPS 2015. It proposes the Region Proposal Network (RPN), which uses the Anchor mechanism to integrate region generation into the convolutional network. This raises the detection speed to 17 FPS (frames per second; this figure is for the smaller ZF backbone, while VGG16 runs at about 5 FPS) and reaches 70.4% mAP on the PASCAL VOC 2012 test set.

  The algorithm flow of Faster RCNN is shown in the figure below, which mainly includes 4 parts:

  • Feature extraction network . The input image first passes through the Backbone to obtain the feature map.

  • RPN module. This region generation module produces good candidate boxes (Proposals) by using Anchors as a strong prior. The RPN includes 5 sub-modules:

    • Anchor generation. Each point on the feature map corresponds to 9 Anchors, which differ in scale and aspect ratio. Mapped back to the original image, they can cover essentially all possible objects. The task of the RPN is to select good candidates from this large set of Anchors and adjust their positions to obtain Proposals.

    • RPN network. For each Anchor above, a prediction score and predicted offsets are computed on the feature map using 1×1 convolutions (applied after a 3×3 convolution).

    • Calculate the RPN loss. This step is performed only during training. All Anchors are matched against the labels (Ground Truth): Anchors that overlap a ground-truth box well are labeled as positive samples and those that overlap poorly as negative samples. This yields the true values for classification and offsets, which are compared with the predicted scores and predicted offsets from the second step to compute the loss.

    • Generate Proposals. Using the prediction score and prediction offset value of each anchor mentioned above, a set of better proposals is further obtained and sent to the subsequent network.

    • Screen the Proposals to get the RoIs. During training, 2000 Proposals are generated and then further screened to obtain 256 RoIs. During testing, this module is not needed and the Proposals are used directly as RoIs.

  • RoI Pooling module. The inputs are the feature map extracted by the Backbone and the RoIs generated by the RPN, and the output is sent to the RCNN module. Because the RCNN module uses a fully connected network, the feature dimension must be fixed, yet the features corresponding to each RoI differ in size. RoI Pooling therefore pools the features of each RoI to a fixed dimension before they are sent to the fully connected network.

  • RCNN module. The features obtained by RoI Pooling are sent into the fully connected network, which predicts the class of each RoI and the offsets that refine the box position. The loss is then calculated, completing the Faster RCNN process. The RCNN module includes 3 parts:

    • RCNN fully connected network. The obtained fixed-dimensional RoI feature is sent to the fully connected network, and the output is the predicted score and predicted regression offset of the RCNN part.

    • Calculate the RCNN ground truth. For each screened RoI, determine whether it is a positive or a negative sample, and compute its offsets relative to the corresponding real object. In practice, this step is often combined with the RPN's final screening of RoIs.

    • RCNN loss. Calculate the loss of classification and regression through the predicted value of RCNN and the true value of the RoI part.
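
  The anchor generation and anchor labeling steps of the RPN can be sketched as follows. The stride of 16, the scales 128/256/512, the aspect ratios 0.5/1/2 and the IoU thresholds 0.7/0.3 are the values commonly cited for Faster RCNN; they are assumptions here, since the text above does not state them. The function names are hypothetical.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2) in image coordinates."""
    base = []
    for scale in scales:
        for ratio in ratios:
            # ratio = height / width; the area stays at scale ** 2.
            w, h = scale / np.sqrt(ratio), scale * np.sqrt(ratio)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)  # 9 anchor shapes centered at the origin
    # Center of every feature-map cell, mapped back to the image.
    cx, cy = np.meshgrid((np.arange(feat_w) + 0.5) * stride,
                         (np.arange(feat_h) + 0.5) * stride)
    centers = np.stack([cx.ravel(), cy.ravel(), cx.ravel(), cy.ravel()], axis=1)
    return (centers[:, None, :] + base[None, :, :]).reshape(-1, 4)

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), shape (N, M)."""
    iw = np.clip(np.minimum(a[:, None, 2], b[None, :, 2])
                 - np.maximum(a[:, None, 0], b[None, :, 0]), 0, None)
    ih = np.clip(np.minimum(a[:, None, 3], b[None, :, 3])
                 - np.maximum(a[:, None, 1], b[None, :, 1]), 0, None)
    inter = iw * ih
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor: 1 = positive, 0 = negative, -1 = ignored in the loss."""
    iou = iou_matrix(anchors, gt_boxes)
    max_iou = iou.max(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[max_iou < neg_thresh] = 0
    labels[max_iou >= pos_thresh] = 1
    labels[iou.argmax(axis=0)] = 1  # the best anchor for each ground-truth box is positive
    return labels

anchors = generate_anchors(feat_h=2, feat_w=3)  # 2 * 3 cells, 9 anchors each
labels = label_anchors(anchors, gt_boxes=anchors[4:5])  # toy ground truth
```

  Anchors whose overlap falls between the two thresholds get the label -1 and contribute nothing to the RPN loss, which keeps ambiguous matches out of training.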

[Figure: algorithm flow of Faster RCNN]
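
  The "Generate Proposals" step is described above only as obtaining a better set of proposals. A common way to do this, assumed here rather than taken from the text, is to keep the highest-scoring boxes and remove near-duplicates with greedy non-maximum suppression (NMS). The names `nms` and `select_proposals` and the default `pre_nms_top_n=12000` are hypothetical; `post_nms_top_n=2000` matches the 2000 Proposals mentioned for training.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy NMS. Returns the indices of the kept boxes, highest score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Overlap of the current best box with all remaining boxes.
        iw = np.clip(np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]), 0, None)
        inter = iw * ih
        iou = inter / (areas[best] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]  # drop boxes that overlap the best one too much
    return keep

def select_proposals(boxes, scores, pre_nms_top_n=12000,
                     post_nms_top_n=2000, iou_thresh=0.7):
    """Keep the top-scoring boxes, apply NMS, then truncate to post_nms_top_n."""
    top = np.argsort(-scores)[:pre_nms_top_n]
    keep = nms(boxes[top], scores[top], iou_thresh)[:post_nms_top_n]
    return boxes[top][keep], scores[top][keep]

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores, iou_thresh=0.5)
proposals, proposal_scores = select_proposals(boxes, scores, post_nms_top_n=1, iou_thresh=0.5)
```

  In this toy input the second box overlaps the first with an IoU of about 0.68, so it is suppressed at a threshold of 0.5 and the first and third boxes remain.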


Origin blog.csdn.net/python_plus/article/details/130118099