Reading Fast R-CNN (1)

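As most readers know, Fast R-CNN is used for object detection in images: if an image contains a cat, the algorithm can detect that a cat is present and draw a red box around it.

Object detection, like depth estimation and semantic segmentation, belongs to image understanding.

Start with the Abstract, which tells us: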

1, This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection.

2, Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks.

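Fast R-CNN is also genuinely fast: the abstract reports that it trains VGG16 9× faster than R-CNN and is 213× faster at test time. The author has also released the source code, which deserves credit.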

Next, the Introduction, which covers some basic concepts:

1, Compared to image classification, object detection is a more challenging task that requires more complex methods to solve

Object detection is complex because:

1, Complexity arises because detection requires the accurate localization of objects

2, Numerous candidate object locations must be processed

3, These candidates provide only rough localization that must be refined to achieve precise localization

4, Solutions to these problems often compromise speed, accuracy, or simplicity
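Point 3 above, refining a rough candidate into a precise box, can be illustrated with a small sketch. It assumes the centre/size offset parameterization commonly used in the R-CNN family (centre shifts scaled by the box size, size changes in log space). The function names `encode_deltas` and `apply_deltas` are hypothetical and are not taken from the post or from the author's released code.

```python
import math

def to_centre(box):
    """Convert (x1, y1, x2, y2) corners to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h

def encode_deltas(proposal, target):
    """Offsets that would move a rough proposal onto the target box."""
    pcx, pcy, pw, ph = to_centre(proposal)
    gcx, gcy, gw, gh = to_centre(target)
    return ((gcx - pcx) / pw, (gcy - pcy) / ph,
            math.log(gw / pw), math.log(gh / ph))

def apply_deltas(proposal, deltas):
    """Refine a rough proposal with predicted offsets (dx, dy, dw, dh)."""
    cx, cy, w, h = to_centre(proposal)
    dx, dy, dw, dh = deltas
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)
```

In this sketch a detector would predict the four offsets for each candidate, and `apply_deltas` would turn the rough candidate into the refined box. All-zero offsets leave the candidate unchanged.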

The contribution of this paper is:

We propose a single-stage training algorithm that jointly learns to classify object proposals and refine their spatial locations

The author then reviews earlier classic methods such as R-CNN and SPPnet, starting with an overview:

The Region-based Convolutional Network (R-CNN) achieves excellent object detection accuracy by using a deep ConvNet to classify object proposals

R-CNN itself has some problems:

1, Training is a multi-stage pipeline

2, Training is expensive in space and time

3, Object detection is slow

In short, R-CNN is slow. The reason is:

1, R-CNN is slow because it performs a ConvNet forward pass for each object proposal without sharing computation
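A toy sketch can make the cost difference concrete. The "backbone" here is a stand-in that only counts how often it is called and returns its input unchanged, so it is not a real ConvNet. The sketch also assumes the feature map has the same resolution as the image and skips the warping of each crop to a fixed size. All names (`CountingBackbone`, `per_proposal`, `shared`) are hypothetical.

```python
class CountingBackbone:
    """Stand-in for an expensive ConvNet; it only counts forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, pixels):
        self.calls += 1
        return pixels  # identity "features", purely for illustration

def crop(grid, box):
    """Cut the (x1, y1, x2, y2) window out of a 2-D list of values."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in grid[y1:y2]]

def per_proposal(backbone, image, boxes):
    # One forward pass per proposal: crop the image, then run the network.
    return [backbone(crop(image, box)) for box in boxes]

def shared(backbone, image, boxes):
    # One forward pass per image: run the network once, then crop features.
    features = backbone(image)
    return [crop(features, box) for box in boxes]
```

With N proposals, `per_proposal` calls the backbone N times while `shared` calls it once. When an image has a large number of proposals, that gap is where the time goes.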

This motivates SPPnet, which can be seen as an improvement on R-CNN that focuses on sharing computation:

Spatial pyramid pooling networks (SPPnets) were proposed to speed up R-CNN by sharing computation

A brief description of SPPnet:

1, The SPPnet method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map

2, Features are extracted for a proposal by max-pooling the portion of the feature map inside the proposal into a fixed-size output

3, Multiple output sizes are pooled and then concatenated as in spatial pyramid pooling
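Points 2 and 3 can be sketched in a few lines of plain Python, based only on the description quoted above. The feature map is assumed to be a list of channels, each a 2-D list, and the box is given in feature-map cells with exclusive upper bounds. The pyramid levels `(1, 2, 4)` and the bin-edge rounding are illustrative assumptions, not the settings of the SPPnet paper, and the function names are hypothetical.

```python
def pool_region(feature_map, box, out_size):
    """Max-pool the cells inside `box` into out_size x out_size bins
    per channel, returned as one flat list."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    pooled = []
    for channel in feature_map:
        for i in range(out_size):
            # Floor for the start and ceil for the end keeps every bin non-empty.
            ys = y1 + (i * h) // out_size
            ye = y1 + ((i + 1) * h + out_size - 1) // out_size
            for j in range(out_size):
                xs = x1 + (j * w) // out_size
                xe = x1 + ((j + 1) * w + out_size - 1) // out_size
                pooled.append(max(channel[y][x]
                                  for y in range(ys, ye)
                                  for x in range(xs, xe)))
    return pooled

def spp_features(feature_map, box, levels=(1, 2, 4)):
    """Pool the same region at several grid sizes and concatenate."""
    vector = []
    for n in levels:
        vector.extend(pool_region(feature_map, box, n))
    return vector
```

Whatever the size of the proposal, the output length depends only on the number of channels and the pyramid levels. That fixed length is what lets proposals of different shapes feed the same classifier.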

I have not read the SPPnet paper myself, so I am not entirely clear on its uses; take this as background only.

SPPnet also has some drawbacks. In the author's words:

1, Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors.

2, But unlike R-CNN, the fine-tuning algorithm cannot update the convolutional layers that precede the spatial pyramid pooling.

3, Features are also written to disk

As a result:

Unsurprisingly, this limitation (fixed convolutional layers) limits the accuracy of very deep networks

Contributions

The author states that the proposed method overcomes these problems:

1, Higher detection quality than R-CNN, SPPnet

2, Training is single-stage, using a multi-task loss

3, Training can update all network layers

4, No disk storage is required for feature caching
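The "multi-task loss" in point 2 is not spelled out in this post. As a rough sketch, the Fast R-CNN paper combines a log loss on the class with a smooth L1 loss on the box offsets, and applies the box term only to non-background proposals. The code below assumes class index 0 means background and uses a weight `lam`; the function and parameter names are hypothetical.

```python
import math

def smooth_l1(x):
    """Quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def multi_task_loss(class_probs, true_class, pred_deltas, target_deltas, lam=1.0):
    """Classification log loss plus a box-regression term for one proposal.

    class_probs:   predicted probabilities over classes (index 0 = background).
    pred_deltas:   predicted box offsets for the true class.
    target_deltas: ground-truth box offsets.
    """
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        # Background proposals have no box to regress to.
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_deltas, target_deltas))
    return cls_loss + lam * loc_loss
```

Because both terms are summed into a single scalar, one backward pass can update the classifier and the box regressor together. That is the sense in which training is "single-stage".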

To be continued in the next post!

Niuip
