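Recently, research interest in Visual Transformers has reached an unprecedented peak. CVPR 2021 alone published more than 40 papers on the topic, with applications spanning image classification, object detection, instance segmentation, semantic segmentation, action recognition, autonomous driving, keypoint matching, object tracking, NAS, low-level vision, HOI, interpretability, layout generation, retrieval, and text detection.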
Two papers stand out as the ones that ignited the Transformer boom in the CV community: DETR (object detection) from ECCV 2020 and ViT (image classification) from ICLR 2021.
CVPR 2021 Visual Transformer Paper Collection
1. End-to-End Human Pose and Mesh Reconstruction with Transformers
2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition
3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
4. HOTR: End-to-End Human-Object Interaction Detection with Transformers
- Code: None
5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
6. Pose Recognition with Cascade Transformers
7. Variational Transformer Networks for Layout Generation
- Code: None
8. LoFTR: Detector-Free Local Feature Matching with Transformers
- Homepage: https://zju3dv.github.io/loftr/
- Chinese-language explainer: CVPR 2021 | Can sparse textures be matched too? A quick look at LoFTR, a Transformers-based image feature matcher
9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
- Code: None
11. Transformer Tracking
- Code: https://github.com/chenxin-dlut/TransT
12. MIST: Multiple Instance Spatial Transformer
- Code: None
13. Multimodal Motion Prediction with Stacked Transformers
14. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
15. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
- Paper(Oral): https://arxiv.org/pdf/2103.11681.pdf
16. Pre-Trained Image Processing Transformer
- Paper: https://arxiv.org/abs/2012.00364
- Code: None
17. End-to-End Video Instance Segmentation with Transformers
- Paper(Oral): https://arxiv.org/pdf/2011.14503.pdf
18. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
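- Paper(Oral): https://arxiv.org/pdf/2011.09094.pdf
- Chinese-language explainer: CVPR 2021 Oral | Transformers strike again! South China University of Technology and WeChat propose UP-DETR, an unsupervised pre-trained detector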
19. End-to-End Human Object Interaction Detection with HOI Transformer
20. Transformer Interpretability Beyond Attention Visualization
21. Line Segment Detection Using Transformers without Edges
- Paper(Oral): https://arxiv.org/abs/2101.01909.pdf
- Code: None
22. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
- Code: None
23. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
- Paper(Oral): https://arxiv.org/pdf/2101.08833.pdf
- Code: https://github.com/dukebw/SSTVOS
24. Topological Planning With Transformers for Vision-and-Language Navigation
- Code: None
25. Taming Transformers for High-Resolution Image Synthesis
- Homepage: https://compvis.github.io/taming-transformers/
- Paper(Oral): https://arxiv.org/pdf/2012.09841.pdf
26. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
- Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf
- Code: None
27. General Multi-Label Image Classification With Transformers
- Code: None
28. Bottleneck Transformers for Visual Recognition
- Code: None
29. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
- Paper(Oral): https://arxiv.org/pdf/2011.13922.pdf
30. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
- Paper(Oral): https://arxiv.org/pdf/2102.06183.pdf
- Code: https://github.com/jayleicn/ClipBERT
31. Scaling Local Self-Attention For Parameter Efficient Visual Backbones
- Paper(Oral): https://arxiv.org/pdf/2103.12731.pdf
- Code: None
The following papers have not yet been made public:
1. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
Paper(Oral): None
Code: https://github.com/dingmyu/HR-NAS
2. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer
Paper: None
Code: None
3. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
Paper: None
Code: None
4. Facial Action Unit Detection With Transformers
Paper: None
Code: None
5. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition
Paper: None
Code: None
6. Lesion-Aware Transformers for Diabetic Retinopathy Grading
Paper: None
Code: None
7. Adaptive Image Transformer for One-Shot Object Detection
Paper: None
Code: None
8. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos
Paper: None
Code: None
9. Self-Supervised Video Hashing via Bidirectional Transformers
Paper: None
Code: None
10. Gaussian Context Transformer
Paper: None
Code: None
11. Self-attention based Text Knowledge Mining for Text Detection
Paper: None
Code: https://github.com/CVI-SZU/STKM
12. SSAN: Separable Self-Attention Network for Video Representation Learning
Paper: None
Code: None
20 must-read recent ViT papers
Reposted from: https://mp.weixin.qq.com/s/CpmBY2qmvkxLiBmgy_PHJw