解读 Fast-RCNN(1)
大家都知道,fast-rcnn 用于图像的检测,比如图像里有一只猫,可以通过这个算法,检测到有猫,并且
可以用一个红框框把猫框出来
目标检测,深度估计和语义分割一样,是图像理解这一块,准确讲是 image understanding
来看一下Abstract部分,可以了解到,
1, This paper proposes a Fast Region-based Convolution Network method (Fast R-CNN) for object
detection.
2, Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolution
networks.
另外,fast R-CNN 真的很快,这就不细说了,作者把源代码公开了,这是值得赞扬的
再来看Introduction部分,可以认识一些基本的,概念上的东西,
1, Compared to image classification, object detection is a more challenging task that requires more
complex methods to solve
目标跟踪是复杂的原因在于,
1, Complexity arises because detection requires the accurate localization of objects
2, Numerous candidate object locations must be processed
3, These candidates provide only rough localization that must be refined to achieve
precise localization
4, Solution to these problems often compromise speed, accuracy, or simplicity
这篇文章的贡献是
We propose a single-stage training algorithm that jointly learns to classify object proposals and
refine their spatial locations
作者综述之前的经典方法,比如R-CNN和SPPnet,先总体说一下:
The Region-based Convolution Network (R-CNN) achieves excellent object detection accuracy by using
a deep ConvNet to classify object proposals
R-CNN 自身存在一些问题:
1, Training is a multi-stage pipeline
2, Training is expensive in space and time
3, Object detection is slow
总而言之,R-CNN算法很慢。很慢的原因是,
1, R-CNN is slow because it performs a ConvNet forward pass for each object proposal
without sharing computation
因此,引出了SPPnets的方法,也算是R-CNN的改进,主要在 sharing computation 做文章
Spatial pyramid pooling networks (SPPnets) were proposed to speed up R-CNN by sharing computation
SPPnets的简述,
1, The SPPnet method computes a convolutional feature map for the entire input image and then
classifies each object proposal using a feature vector extracted from the shared feature map
2, Features are extracted for a proposal by max-pooling the portion of the feature map inside the
the proposal into a fixed-size output
3, Multiple output sizes are pooled and then concatenated as in spatial pyramid pooling
我之前没有看过SPPnet,所以不能特别清除它的用处,就仅作了解吧
同时,SPPnet 有一些缺点,不急,听作者的叙述,
1, Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a
network with log loss, training SVMs, and finally fitting bounding-box regressors.
2, But unlike R-CNN, the fine-tuning algorithm cannot update the convolutional layers that
precede the spatial pyramid pooling.
3, Features are also written to disk
于是,
Unsurprisingly, this limitation (fixed convolutional layers) limits the accuracy of very deep networks
Contributions
作者说自己的方法可以克服这些困难,
1, Higher detection quality than R-CNN, SPPnet
2, Training is single-stage, using a multi-task loss
3, Training can update all networks layers
4, No disk storage is required for feature caching
下次接着讨论!