[Literature Reading] A Survey of Tiny Object Detection: Challenges, Techniques, and Datasets (M. MUZAMMUL et al., ACM, 2021)

I. Paper Overview

    Title: "A Survey on Deep Domain Adaptation and Tiny Object Detection Challenges, Techniques and Datasets"

    The paper is very long, so only the key points are covered here.

    Download: https://arxiv.org/ftp/arxiv/papers/2107/2107.07927.pdf

    Citation: MUHAMMAD MUZAMMUL and XI LI. "A Survey on Deep Domain Adaptation and Tiny Object Detection Challenges, Techniques and Datasets" arXiv preprint, arXiv: 2107.07927, 2021.

    Project page: none available yet

II. Abstract

Deep learning (DL) and computer vision (CV) offered a trending role in object detection (OD), object tracking, pedestrian detection, and autonomous vehicles from the last few decades. Several old and recent approaches were proposed to solve these computer vision and deep learning-based problems with detection, tracking techniques, algorithms, and data sources. In recent decades, this field gained importance due to the rising interest in brain-inspired human recognition and detection technologies. Using online/offline data about images and videos, researchers are intensively modeling sentiments and computational analysis. Computer vision-based artificial neural networks (ANN) and convolutional neural network (CNN) based approaches providing robust solutions. This survey paper specially analyzed computer vision-based object detection challenges and solutions by different techniques. We mainly highlighted object detection by three different trending strategies, i.e., 1) domain adaptive deep learning-based approaches (discrepancy-based, Adversarial-based, Reconstruction-based, Hybrid). We examined general as well as tiny object detection-related challenges and offered solutions by historical and comparative analysis. In part 2) we mainly focused on tiny object detection techniques (multi-scale feature learning, Data augmentation, Training strategy (TS), Context-based detection, GAN-based detection). In part 3), To obtain knowledge-able findings, we discussed different object detection methods, i.e., convolutions and convolutional neural networks (CNN), pooling operations with trending types. Furthermore, we explained results with the help of some object detection algorithms, i.e., R-CNN, Fast R-CNN, Faster R-CNN, YOLO, and SSD, which are generally considered the base bone of CV, CNN, and OD. We performed comparative analysis on different datasets such as MS-COCO, PASCAL VOC07,12, and ImageNet to analyze results and present findings. At the end, we showed future directions with existing challenges of the field. In the future, OD methods and models can be analyzed for real-time object detection, tracking strategies.

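Computer vision is widely applied in object detection, object tracking, pedestrian detection, and autonomous driving, and artificial neural networks and convolutional neural networks provide strong solutions for these tasks. The paper looks at object detection from three angles: (1) domain-adaptive deep learning methods (discrepancy-based, adversarial-based, reconstruction-based, hybrid) as applied to tiny object detection; (2) tiny object detection techniques (multi-scale feature learning, data augmentation, training strategy, context-based detection, GAN-based detection); (3) a discussion of object detection methods. Finally, the authors discuss future challenges and directions.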

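III. Detailed Walkthrough

1. Introduction (INTRODUCTION)

    As GPU compute has grown, convolutional neural networks have come to be regarded as the most important contribution among many recent advances. These networks usually serve as the backbone of an object detector. A good model must understand both the semantics and the spatial details of the input image.

    However, differences in lighting conditions, occlusions, perspectives, and poses are problems that object detection still has to solve.

    CNN-based object detection has the following advantages: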

• Compared to shallow conventional models, a deep CNN-based model offers exponentially more expressive capacity. (The deeper the network, the stronger its representational capacity.)

• CNN-based object detectors enable you to combine multiple detection-related tasks into one. (Multiple detection tasks can be integrated.)

• Hierarchical function vectors/representations can be extracted automatically from the underlying data in CNN-based object detectors and disentangled using multi-level nonlinear mappings. (CNN-based detectors extract multi-level feature vectors automatically, and multi-level nonlinear mappings disentangle them.)

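(1) Paper structure

A brief outline, for readers who want to consult the original as needed: ① Section 1 introduces object detection, its applications, and its problems. ② Section 2 covers object detection techniques. ③ Section 3 explains several neural network methods. ④ Section 4 gives a comparative analysis of different models. ⑤ Section 5 offers some future research directions.

(2) Motivation

To help beginners get started in this field.

(3) Relation to other related surveys

The paper summarizes this in a comparison table.

(4) Contributions of this survey

The authors list a great many contributions, since the paper really does cover a lot of ground. They are quoted directly here: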

• We have mentioned some object detection learning strategies in this paper and have not included detailed basic information; however, we have addressed recent, old object detection trends so that readers can easily get depth knowledge at the same place.

• Unlike previous surveys in this area, this paper offers a thorough and systematic examination of deep learning-based object detection techniques, as well as a summary of the most crucial research trends and current object detection algorithms. Most of the results we displayed in tabular form with comparative analysis and literature reference. Readers can get the idea of methodology quickly with tiny insight.

• This survey paper is unique because different computer vision and essential object detection strategies/directions are reviewed simultaneously. Each section offering a complete idea about object detection and results proved with comparative analysis.

• This survey paper consists of well-structured computer vision & object detection; it consists of 567 references and 16 tables. All tables give comparative analysis or literature ideas about trending object detection methodologies or results. Table 2 included(>125 literature references) offering a great idea about computer vision-based OD applications with trending problems and future research direction.

• We specially analyzed object detection applications and surveys in different fields in Table 1, Table 2. Section 2 discussed OD trending approaches such as deep domain adaptive object detection (DDAOD) with varying OD terminologies. This subject is addressed very short in literature, and it's not so common topic. The result of comparative analysis on different OD methods and datasets about DDAOD shown in Table 3(3.1,3.2,3.3,3.4,3.5).

• Next, we highlighted the second domain with tiny object detection approaches as presented different methods with the historical and taxonomical point of view in Table 4,5.

• Section 3 explains object detection methods and types of these methods, focusing on Convolution, convolution neural networks, and pooling operations with a combined picture of many techniques at the same place.

• Table 7,8,9 in section 4 contributed with detailed analysis of object detection backbone models, i.e., CNN, R-CNN, Fast R-CNN, YOLO, SSD and comparative results as pros and cons presented. After performing experiments on datasets with OD approaches/ Methods/Models, the final results are presented in Table 10-16.

• In section 5, we offered a comprehensive conclusion and future direction.

(5) Applications of computer vision in object detection (review)

Pedestrian detection (PD)

Face detection (FD)

Text detection (TD)

Traffic sign and traffic light detection (TSTLD)

Remote sensing target detection (RSTD)

(6) Problems in tiny object detection and domain adaptation

Problem 1: Insufficient information in individual feature layers for tiny object detection/domain adaptation.

Solution: Combine features from both shallow and deep layers.

Representative methods: ① Bottom-up approach. ② Top-down approach.
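Combining shallow and deep features can be sketched in a few lines of NumPy. This is only a minimal illustration of one top-down fusion step in the style of a feature pyramid, not the implementation of any method cited in the survey; the names `upsample2x` and `top_down_fuse`, the nearest-neighbour upsampling, and the element-wise addition are all assumptions made for the example.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(shallow, deep):
    """Add an upsampled deep map to a shallow map.

    Assumes both maps have the same channel count and that the shallow
    map is exactly twice the spatial size of the deep map.
    """
    return shallow + upsample2x(deep)

shallow = np.ones((4, 8, 8))       # high resolution, weaker semantics
deep = np.full((4, 4, 4), 2.0)     # low resolution, stronger semantics
fused = top_down_fuse(shallow, deep)
print(fused.shape)
```

The fused map keeps the resolution of the shallow layer, which is what a tiny object needs, while carrying information from the deeper layer.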

Problem 2: Insufficient context information for tiny object detection/domain adaptation.

Context features fall into three types: ① Local pixel context information. ② Semantic context information. ③ Spatial context information.

Solution: Incorporate contextual information into the detection network.

Problem 3: Tiny object class imbalance for tiny object detection/domain adaptation.

Solution: Balance positive and negative examples during training.

Representative methods: ① Data-based approach. ② Loss function-based approach.
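As one concrete example of a loss function-based approach, the sketch below implements a binary focal loss, which down-weights easy examples so that the many easy background samples do not dominate training. The post does not say which losses the survey covers, so choosing focal loss here is an assumption, as are the default values `gamma=2.0` and `alpha=0.25`.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Per-example binary focal loss.

    p: predicted foreground probability; y: 1 = foreground, 0 = background.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)               # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)   # class weighting
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

p = np.array([0.9, 0.1, 0.6])   # two confident predictions, one uncertain
y = np.array([1, 0, 1])
losses = focal_loss(p, y)
print(losses)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, which is a quick way to sanity-check the implementation.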

Problem 4: Shortage of tiny object examples for tiny object detection/domain adaptation.

Solution: Use methods that can generate or match more small-object anchors.

Representative methods: ① Multi-scale mechanism. ② Matching strategy. ③ Increasing positive examples of small objects.
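A matching strategy can be illustrated with plain IoU-based anchor assignment. The sketch below is a hypothetical example rather than any specific method from the survey: `iou` and `match_anchors` are made-up names, boxes are assumed to be `(x1, y1, x2, y2)`, and the thresholds are arbitrary. It shows how relaxing the IoU threshold yields more positive anchors for a small ground-truth box.

```python
import numpy as np

def iou(box, anchors):
    """IoU between one box and an (N, 4) array of anchors (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], anchors[:, 0])
    y1 = np.maximum(box[1], anchors[:, 1])
    x2 = np.minimum(box[2], anchors[:, 2])
    y2 = np.minimum(box[3], anchors[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_anchors = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (area_box + area_anchors - inter)

def match_anchors(box, anchors, thresh=0.5):
    """Indices of anchors counted as positives; the best anchor is always kept."""
    scores = iou(box, anchors)
    positives = set(np.flatnonzero(scores >= thresh).tolist())
    positives.add(int(np.argmax(scores)))   # guarantee at least one match
    return sorted(positives)

gt = np.array([10, 10, 20, 20])              # a 10x10 ground-truth box
anchors = np.array([[0, 0, 32, 32],          # large anchor
                    [8, 8, 24, 24],          # medium anchor
                    [10, 10, 18, 18]])       # small anchor
print(match_anchors(gt, anchors, thresh=0.5))
print(match_anchors(gt, anchors, thresh=0.35))
```

Only the small anchor passes the stricter threshold; the relaxed threshold also admits the medium anchor, giving the small object one more positive example.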

2. Object Detection Approaches (OBJECT DETECTION TRENDING APPROACHES)

(1) Traditional approaches to tiny object detection

Face detection-based research

Technique 1: Feature map improvements for small faces/DDA-OD. These include skip connections, an ROI-based block normalization layer, and element-wise multiplication.

Technique 2: Incorporate context information of small faces (a problem in the detection-based domain). Approaches include larger filter sizes, feature fusion, an Inception-like network, producing labels for other body parts, and a dense block structure.

Technique 3: Correcting foreground/background class imbalance for small faces (which has a vital role in detection and domain adaptation). There are three lines of attack: ① anchors for filtering, ② sampling, ③ training on multiple scales.

Generic object detection research

Technique 1: Feature map improvement in tiny objects. Localization generally relies on low-level features, while classification relies on high-level features, so tiny object detection needs to integrate features from several levels. Representative methods include the bottom-up approach, non-linear feature map transformation, skip connections, and feature pyramid fusion.

Technique 2: Incorporate context information for tiny objects. Context information falls into two types, local context ("local contact") and semantic context information. Representative methods include bounding boxes and deconvolutional layers with "skip connections".

Technique 3: Small objects foreground/background class imbalance. There are two lines of attack: data-based methods and loss-based methods.

Technique 4: Training examples increase for tiny objects. There are three representative methods: multi-scale learning neural network architectures, scale transformation, and adaptive anchor box matching.

Object detection in aerial images

Traditional methods fall into four classes: ① template matching-based, ② knowledge-based, ③ OBIA-based, ④ machine learning-based. For deep learning-based methods, the two main issues to consider are multi-scale and multi-angle.

Technique 1: Object orientation handling in aerial images.

Technique 2: Take into account the context of aerial image objects, by merging feature maps and using dilated convolutions.

Technique 3: Correcting foreground and background class imbalance for aerial image objects. A representative method is IoU-Adaptive Deformable R-CNN.

Technique 4: Increase the number of aerial image object training examples.
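To see why dilated convolutions help with context (Technique 2 above), it is enough to compute how far a dilated kernel reaches. The helper below is a small sketch with a made-up name, `effective_kernel_size`; it uses the standard relation k + (k - 1)(d - 1) for a kernel with k taps and dilation rate d.

```python
def effective_kernel_size(k, dilation):
    """Span covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)

for d in (1, 2, 4):
    print(d, effective_kernel_size(3, d))
```

A 3x3 kernel with dilation 4 spans a 9x9 window while still using only nine weights, so the surrounding context of a small object is covered without extra parameters.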

Instance segmentation approach for small object detection

The earliest model was FCN, followed by U-Net and then FPN. Other methods include capsule networks and bottom-up and top-down networks.

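(2) Deep Domain Adaptive Object Detection (DDAOD)

Conventional deep learning-based object detection rests on the assumption that the training and test sets follow the same data distribution, so getting better detection accuracy means collecting large amounts of training data. Domain adaptation is meant to solve exactly this problem. So far there are few deep learning object detection methods that use domain adaptation.

Domain-adaptive object detection comes in five categories and mechanisms: discrepancy-based, adversarial-based, reconstruction-based, hybrid, and others.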

• One-step vs. multi-step adaptation methods: Weather the source and target domains are closely linked, information transfer may be completed in a single step. Although there is little overlap between the two parts, multi-step DA uses a series of intermediate bridges to link two different domains and then execute one-step DA using this bridge. (If the source and target domains are close, adapt in one step; if they overlap only slightly, adapt in multiple steps.)

• Labeled data from the target domain: DDAOD may be classified as supervised, semi supervised, weakly-supervised, few-shot, or unsupervised based on labeled data and domain. (Domain-adaptive object detection may be supervised or unsupervised.)

• Base detector: Domain adaptive detection methods are often based on existing excellent detection models such as Faster RCNN, YOLO, SSD, and others. (Domain-adaptive detection mostly builds on existing strong models.)

• Is the method's source code open source or not? This aspect shows whether the method's source code is available on the internet. The relation will be given if it is open source. (Whether the code is open source.)

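Discrepancy-based DDAOD

Methods of this kind reduce the discrepancy between domains. The solution described is to train with noise, i.e. "training with noisy labels". The paper summarizes the methods in a table.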
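What "discrepancy between domains" means can be made concrete with a simple measure. The sketch below computes a linear-kernel maximum mean discrepancy, i.e. the squared distance between the mean feature vectors of two domains. The post does not say which discrepancy measure the surveyed methods use, so this choice, the name `linear_mmd`, and the synthetic Gaussian features are all assumptions for illustration.

```python
import numpy as np

def linear_mmd(source, target):
    """Squared distance between the mean feature vectors of two domains."""
    delta = source.mean(axis=0) - target.mean(axis=0)
    return float(delta @ delta)

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 16))    # stand-in source features
shifted = rng.normal(1.0, 1.0, size=(200, 16))   # target features with a mean shift
print(linear_mmd(source, source))
print(linear_mmd(source, shifted))
```

A discrepancy-based method would add a term like this to the training objective so that the feature extractor is pushed to make the two domains look alike.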

Adversarial-based DDAOD

The paper summarizes these methods in a table.

Reconstruction-based DDAOD

A representative method is CycleGAN; the paper summarizes the methods in a table.

Hybrid DDAOD

The paper summarizes these methods in a table.

Other DDAOD

Examples include graph-induced prototype alignment and categorical regularization; the paper summarizes them in a table.

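(3) Object detection trending approaches for tiny objects

There are two definitions of a small object, one of which is an object with fewer than 32*32 pixels. The techniques for tiny object detection were listed earlier: multi-scale learning, data augmentation, training strategy, context-based detection, and GAN-based detection.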
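The pixel-count definition is easy to turn into a check. The helper below is a tiny sketch with a hypothetical name, `is_small`, which reads "fewer than 32*32 pixels" as a limit on box area.

```python
def is_small(width, height, limit=32):
    """True if a box covers fewer than limit * limit pixels."""
    return width * height < limit * limit

print(is_small(20, 30))   # True: 600 < 1024
print(is_small(40, 40))   # False: 1600 >= 1024
```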

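The paper lists some of the tiny object detection literature in a table.

A figure in the paper summarizes the types of tiny object detection.

The individual models are not covered here; only the different types of tiny object detection are outlined.

Multi-scale based

Data augmentation

Training strategy

Context-based detection

GAN-based detection: a representative method is MTGAN.

The paper also lists some of the current best face detection techniques.

Challenges of tiny object detection and domain adaptation

An autoencoder can reconstruct target-domain data from the source domain. However, tiny objects are easily confused with their surroundings, which raises the chance of mismatches, and reconstructing unclear data is also a major challenge.

The paper also lists applications of deep domain adaptation in tiny object detection.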

3. Object Detection Architectures (OBJECT DETECTION TRENDING ARCHITECTURES)

First, the trends in object detection frameworks, which include:

Convolutions, Convolutional Neural Networks, Pooling Operations, Normalization, Skip Connection Blocks, Feature Extractors, Skip Connections, Image Model Blocks, Feedforward Networks, Regularization, Feature Pyramid Blocks, Feature Extractors, Initialization, Activation Functions, Instance Segmentation Models, Learning Rate Schedules, RoI Feature Extractors, Region Proposal, Stochastic Optimization, Output Functions, Regularization

For reasons of space, only some of these trends are covered here.

(1) Convolution methods and their types for object detection: Image processing

The paper expresses the convolution operation as a formula.
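The paper's own notation is not reproduced in this post, so as a hedged stand-in the sketch below implements the textbook discrete form S(i, j) = Σ_m Σ_n I(i + m, j + n) K(m, n). Strictly speaking this is a cross-correlation, which is what deep learning libraries usually compute under the name convolution. The function name `conv2d`, the single-channel input, and the "valid" (no padding) output size are assumptions of the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the window under the kernel element-wise and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
print(conv2d(image, kernel))
```

With an all-ones 2x2 kernel, each output value is simply the sum of a 2x2 window, which makes the result easy to check by hand.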

There are many kinds of convolution; to date, 6751 papers have used the method. The specific types include:

a. 1x1 Convolution(1x1C)

b. 3D Convolution(3DC)

c. Active Convolution (AC)

d. Attention-augmented Convolution (AAC)

e. Conditional Convolution (CondConv)

f. Convolution

g. Coordinate Convolution (CoordConv)

h. Deformable Convolution (DC)

i. Deformable Kernel (DK)

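The paper illustrates the different convolution types in a figure.

Besides the common convolutions above, there are other, less common ones: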

j. Depth wise Convolution

k. Depth wise Dilated Separable Convolution

r. Depth wise Separable Convolution

o. Dilated Convolution

y. Dim Convolution

x. Dynamic Convolution

l. Grouped Convolution

m. Groupwise Point Convolution

w. Invertible 1x1 Convolution

n. Light Convolution

p. Masked Convolution

t. Mix Convolution

u. Octave Convolution

m. Pointwise Convolution

v. Selective Kernel Convolution

q. Spatially Separable Convolution

s. Submanifold Convolution
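Several entries in this list exist mainly to cut cost. As a worked example, the sketch below compares the weight count of a standard convolution with that of a depthwise separable one (a depthwise k x k convolution followed by a 1x1 pointwise convolution). The function names are made up for the example and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv plus a 1x1 pointwise conv (bias ignored)."""
    return k * k * c_in + c_in * c_out

print(standard_conv_params(3, 64, 128))         # 73728
print(depthwise_separable_params(3, 64, 128))   # 8768
```

For a 3x3 layer going from 64 to 128 channels, the separable version needs roughly one eighth of the weights.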

(2) Convolutional neural networks methods and types for Object detection

The simplest convolutional network can be divided into three parts: feature extraction, classification, and probability distribution:
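The three parts can be sketched as a toy forward pass in NumPy. This is a minimal illustration with random, untrained weights, not a model from the paper; the layer sizes, the single 3x3 kernel, and the names `conv_relu` and `softmax` are all assumptions.

```python
import numpy as np

def conv_relu(image, kernel):
    """Feature extraction: valid 2D cross-correlation followed by ReLU."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)

def softmax(scores):
    """Turn raw class scores into a probability distribution."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((6, 6))
kernel = rng.normal(size=(3, 3))       # untrained, random weights
weights = rng.normal(size=(3, 16))     # 3 classes, 4x4 flattened features

features = conv_relu(image, kernel)    # 1) feature extraction -> (4, 4)
scores = weights @ features.ravel()    # 2) classification scores -> (3,)
probs = softmax(scores)                # 3) probability distribution
print(probs, probs.sum())
```

Real networks stack many such feature extraction layers before the classifier, but the division of labour is the same.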

Common networks include:

(a) Residual Network (ResNet)

(b) AlexNet

(c) VGG

(d) DenseNet

(e) MobileNetV2

(f) GoogLeNet

(g) ResNeXt

(h) Xception

(i) Darknet-53

(j) SqueezeNet

(k) Inception-v3

(l) EfficientNet

(m) LeNet

(n) Darknet-19 

(o) MobileNetV1

(p) WideResNet

(q) ShuffleNet

(r) SENet

(s) MobileNetV3

(t) MnasNet

(u) Inception-ResNetv2

(v) HRNet

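The paper gives architecture diagrams for these networks (there are a great many of them).

(3) Pooling operational methods and types for object detection

The authors summarize the pooling methods in a table.

The paper also illustrates the corresponding methods in a figure.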
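The paper's table of pooling methods is not reproduced in this post, so which variants it covers is not confirmed here. As a hedged illustration, the sketch below implements the two most basic ones, max pooling and average pooling, over non-overlapping windows; the function name `pool2d` and its parameters are made up for the example.

```python
import numpy as np

def pool2d(feat, size=2, mode="max"):
    """Non-overlapping pooling over size x size windows of an (H, W) map."""
    h, w = feat.shape[0] // size, feat.shape[1] // size
    # Split the map into an (h, size, w, size) grid of windows.
    windows = feat[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))

feat = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(feat, mode="max"))
print(pool2d(feat, mode="avg"))
```

Both variants halve the resolution here, which is exactly why repeated pooling is hard on tiny objects: an object a few pixels wide can vanish after a few such steps.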

4. Results and Findings on Datasets (RESULTS AND FINDINGS WITH ANALYSIS ON DATASETS)

(1) Object detection applications in remote sensing

These involve the following stages:

a) Image processing

        • Image Acquisition

        • Image Enhancement

        • Morphological Processing (MP)

        • Image Restoration

        • Colour Image Processing

        • Image compression

        • Image segmentation

b) Convolutional Neural Networks

        • Input Layer

        • Hidden Layer

        • Output Layer

        • Convolutional Layer

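The authors cite two papers that apply CNNs to remote sensing imagery: one uses a CNN for classification, and the other uses a 3-dimensional CNN for moving object recognition.

(2) Performance analysis of several object detection frameworks

The authors first list how several classic models perform under different parameter settings.

They then analyze the pros and cons of typical models.

They also give some details of the models.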

(3) Datasets and evaluation metrics for generic object detection

PASCAL VOC dataset; evaluation metrics: mAP, precision, and recall.
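How these metrics relate can be sketched for a single class. The code below computes cumulative precision and recall over detections sorted by confidence, then the area under the precision-recall curve as average precision; mAP is the mean of this value over all classes. The exact interpolation rule differs between benchmark versions, so the envelope-style AP used here is an assumption, as are the function names and the toy detections.

```python
import numpy as np

def precision_recall(is_tp, num_gt):
    """Cumulative precision and recall for detections sorted by confidence."""
    is_tp = np.asarray(is_tp, dtype=float)
    tp = np.cumsum(is_tp)
    fp = np.cumsum(1.0 - is_tp)
    return tp / (tp + fp), tp / num_gt

def average_precision(precision, recall):
    """Area under the precision-recall curve, using a non-increasing precision envelope."""
    r = np.concatenate(([0.0], recall))
    p = np.concatenate(([1.0], precision))
    p = np.maximum.accumulate(p[::-1])[::-1]   # make precision non-increasing
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

# Four detections sorted by descending score; 1 = true positive, 0 = false positive.
precision, recall = precision_recall([1, 0, 1, 1], num_gt=4)
ap = average_precision(precision, recall)
print(precision, recall, ap)
```

In this toy case three of the four ground-truth objects are found, so recall tops out at 0.75 and the AP comes to 0.625.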

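The paper tabulates the performance of classic models on this dataset.

MS COCO BENCHMARK; evaluation metric: AP.

The paper likewise tabulates the performance of classic models on this dataset.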

 ImageNet

VisDrone2018 BENCHMARK

 OPEN IMAGES V5

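(4) Some datasets for tiny object detection

The paper lists them in a figure.

It also reports the results of different types of tiny object detection methods on the COCO dataset.

5. Conclusion and Future Challenges (CONCLUSION AND CHALLENGING FUTURE DIRECTIONS)

The key points:

The authors suggest the following future directions for generic object detection: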

• Unsupervised Detection Objects detection (unsupervised object detection)

• Real-Time Detection Remote sensing (real-time object detection in remote sensing imagery)

• Weakly supervised object detection

• Video Object Detection

• Multi-domain object detection

• Salient Object Detection

• Multi-task Learning

• GAN Object Detectors

For deep domain adaptation models, future research directions include:

• Combining the merits of adaptation methods

• Local nature detection

• Homogenous DDAOD (homogeneous deep domain adaptive object detection)

Future research directions for tiny object detection include:

• Transmission of information

• Small object detection techniques weakly monitored

• Benchmarks and datasets evolving for small object detection

• Joint multi-task learning and improvement

• Small object detection framework

IV. Summary
 


Reposted from blog.csdn.net/z704630835/article/details/119893552