CVPR 2021 Visual Transformer Paper Collection (with 20 Recommended Must-Read ViT Papers)

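Research interest in Visual Transformers has recently reached an unprecedented peak. CVPR 2021 alone published more than 40 such papers, with applications spanning image classification, object detection, instance segmentation, semantic segmentation, action recognition, autonomous driving, keypoint matching, object tracking, NAS, low-level vision, HOI, interpretability, layout generation, retrieval, and text detection.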

Two papers best represent what ignited the Transformer boom in the CV community: DETR (object detection) from ECCV 2020 and ViT (image classification) from ICLR 2021.

Contents

CVPR 2021 Visual Transformer Paper Collection

20 Must-Read ViT Papers


CVPR 2021 Visual Transformer Paper Collection

1. End-to-End Human Pose and Mesh Reconstruction with Transformers

2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

4. HOTR: End-to-End Human-Object Interaction Detection with Transformers

5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

6. Pose Recognition with Cascade Transformers

7. Variational Transformer Networks for Layout Generation

8. LoFTR: Detector-Free Local Feature Matching with Transformers

Chinese-language walkthrough: CVPR 2021 | Can sparse textures be matched too? A quick look at LoFTR, a Transformers-based image feature matcher

9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

11. Transformer Tracking

12. MIST: Multiple Instance Spatial Transformer

13. Multimodal Motion Prediction with Stacked Transformers

14. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

15. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

16. Pre-Trained Image Processing Transformer

17. End-to-End Video Instance Segmentation with Transformers

18. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

19. End-to-End Human Object Interaction Detection with HOI Transformer

20. Transformer Interpretability Beyond Attention Visualization

21. Line Segment Detection Using Transformers without Edges

22. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

23. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

24. Topological Planning With Transformers for Vision-and-Language Navigation

25. Taming Transformers for High-Resolution Image Synthesis

26. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

  • Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf

  • Code: None

27. General Multi-Label Image Classification With Transformers

28. Bottleneck Transformers for Visual Recognition

29. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation

30. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

31. Scaling Local Self-Attention For Parameter Efficient Visual Backbones

The papers below have not yet been publicly released:
 

1. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers

Paper(Oral): None

Code: https://github.com/dingmyu/HR-NAS

2. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer

Paper: None

Code: None

3. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

Paper: None

Code: None

4. Facial Action Unit Detection With Transformers

Paper: None

Code: None

5. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition

Paper: None

Code: None

6. Lesion-Aware Transformers for Diabetic Retinopathy Grading

Paper: None

Code: None

7. Adaptive Image Transformer for One-Shot Object Detection

Paper: None

Code: None

8. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

Paper: None

Code: None

9. Self-Supervised Video Hashing via Bidirectional Transformers

Paper: None

Code: None

10. Gaussian Context Transformer

Paper: None

Code: None

11. Self-attention based Text Knowledge Mining for Text Detection

Paper: None

Code: https://github.com/CVI-SZU/STKM

12. SSAN: Separable Self-Attention Network for Video Representation Learning

Paper: None

Code: None

20 Must-Read Recent ViT Papers

Source: https://mp.weixin.qq.com/s/CpmBY2qmvkxLiBmgy_PHJw

Reposted from blog.csdn.net/u014546828/article/details/117657912