3D Visual Perception New SOTA: Reproducing the BEVFormer Test Demo on the nuScenes Dataset

0. Preface

This post shares my recent work on BEV perception. Fellow autonomous-driving practitioners are welcome to exchange ideas and learn together, so that autonomous driving can land as soon as possible.

1. Overview

For autonomous driving, object detection under BEV (Bird's Eye View) is a very important task. Although this task has attracted considerable research effort, it remains a challenge to flexibly handle arbitrary camera configurations (single or multiple cameras) installed on autonomous vehicles.

To this end, BEVFormer is proposed. It combines the powerful feature-extraction capability of the Transformer with queries over time-stamped sequential features, aggregating feature information in both the spatial and temporal dimensions to improve the detection performance of the overall perception system.

Paper link: https://arxiv.org/pdf/2203.17270v1.pdf

Code link: https://github.com/zhiqi-li/BEVFormer

About BEVFormer

BEVFormer exchanges spatial and temporal information through predefined grid-shaped BEV queries. To aggregate spatial information, a spatial cross-attention is designed in which each BEV query extracts spatial features from its regions of interest across the camera views. For temporal information, a temporal self-attention is proposed to recurrently fuse historical BEV information. On the nuScenes test set, BEVFormer reaches a new SOTA of 56.9% NDS, about 9 points higher than the previous best camera-based method and on par with lidar-based baselines. We further show that BEVFormer significantly improves the accuracy of velocity estimation and the recall of objects under low-visibility conditions.

[Figure 1]

2. Structural framework

[Figure 2]

Each encoder layer of BEVFormer consists of grid-shaped BEV queries, temporal self-attention, and spatial cross-attention.

In spatial cross-attention, each BEV query only interacts with the image features of the region of interest.

In temporal self-attention, each BEV query interacts with two features: the BEV query at the current timestamp and the BEV feature at the previous timestamp.
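The flow of one encoder layer can be sketched as follows. This is only a conceptual sketch in plain PyTorch: the official BEVFormer code uses deformable attention with reference-point sampling, and the class, argument, and shape choices below are my own illustrative assumptions, not the repository's API.

    # Conceptual sketch of one BEVFormer encoder layer (illustrative only).
    import torch
    import torch.nn as nn

    class BEVEncoderLayerSketch(nn.Module):
        def __init__(self, embed_dim=256, num_heads=8):
            super().__init__()
            # Temporal self-attention: current BEV queries attend to the
            # BEV features produced at the previous timestamp.
            self.temporal_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
            # Spatial cross-attention: BEV queries attend to multi-camera image features.
            self.spatial_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
            self.ffn = nn.Sequential(
                nn.Linear(embed_dim, 4 * embed_dim), nn.ReLU(),
                nn.Linear(4 * embed_dim, embed_dim))
            self.norm1 = nn.LayerNorm(embed_dim)
            self.norm2 = nn.LayerNorm(embed_dim)
            self.norm3 = nn.LayerNorm(embed_dim)

        def forward(self, bev_queries, prev_bev, img_feats):
            # bev_queries: (B, H*W, C) grid-shaped BEV queries for the current frame
            # prev_bev:    (B, H*W, C) BEV features from the previous timestamp
            # img_feats:   (B, N_cam*H_img*W_img, C) flattened camera features
            q = self.norm1(bev_queries + self.temporal_attn(bev_queries, prev_bev, prev_bev)[0])
            q = self.norm2(q + self.spatial_attn(q, img_feats, img_feats)[0])
            return self.norm3(q + self.ffn(q))

    # Smoke test on a small 50x50 BEV grid (the paper uses 200x200).
    layer = BEVEncoderLayerSketch()
    bev = torch.randn(1, 50 * 50, 256)
    img = torch.randn(1, 6 * 30 * 50, 256)  # 6 cameras, toy feature-map size
    print(layer(bev, bev.clone(), img).shape)  # torch.Size([1, 2500, 256])

Note the ordering: temporal self-attention is applied first, so the history is fused into the queries before they sample spatial features from the camera views, matching the description above.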

3. Environment configuration

For the detailed setup, refer to the source repository; I will not go through it step by step here. Instead, I will share the problems I ran into during my own configuration and how I solved them.

  • Error: No module named 'tools'. Analysis: the tools directory is not on the Python module search path, so the import cannot be resolved.
  • Solution (a Python-side alternative is sketched after this list): export PYTHONPATH=${PYTHONPATH}:/home/mnt/mmdetection3d/BEVFormer/tools
  • Then apply it: source ~/.profile
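If you prefer not to touch the shell profile, the same effect can be achieved from inside the entry script before the failing import. This is just an alternative sketch; the path below mirrors the export above and must be adjusted to your own checkout.

    # Alternative to the shell export: extend sys.path before importing `tools`.
    # The location is the one used in the export above; change it to your own path.
    import sys

    REPO_ROOT = "/home/mnt/mmdetection3d/BEVFormer"
    for p in (REPO_ROOT, f"{REPO_ROOT}/tools"):
        if p not in sys.path:
            sys.path.insert(0, p)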

Execute in the terminal: python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --version v1.0 --canbus ./data

If output like the figure below is produced, the data conversion completed correctly.
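To double-check the conversion without relying on the screenshot, you can open the generated info file and count its entries. A small sketch follows; the .pkl file name is an assumption, so use whatever create_data.py actually wrote into ./data/nuscenes on your machine.

    # Sanity check of the converted annotation file (file name is an assumption).
    import pickle

    with open("./data/nuscenes/nuscenes_infos_temporal_train.pkl", "rb") as f:
        data = pickle.load(f)

    # mmdetection3d-style converters usually store a dict with an "infos" list.
    infos = data["infos"] if isinstance(data, dict) and "infos" in data else data
    print(f"loaded {len(infos)} sample infos")
    print(sorted(infos[0].keys())[:10])  # peek at the per-sample fields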

4. Experimental results and demo

nuScenes contains 1000 scenes, each about 20 s long and annotated at 2 Hz; every sample includes 6 cameras covering a 360-degree horizontal view. For the object detection task, about 1.4M 3D boxes are annotated across 10 categories. Five true-positive error metrics are used: ATE, ASE, AOE, AVE, and AAE; in addition, nuScenes defines NDS as a composite score (computed as sketched below).
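For reference, NDS combines mAP with the five true-positive error metrics; a minimal computation following the nuScenes definition looks like this (the numbers in the example call are placeholders, not results from this run):

    # NDS (nuScenes Detection Score), as defined by the benchmark:
    # NDS = 1/10 * (5 * mAP + sum over the 5 TP metrics of (1 - min(1, mTP)))
    # The TP metrics are mATE, mASE, mAOE, mAVE, mAAE (lower is better).
    def nds(map_score, mate, mase, maoe, mave, maae):
        tp_errors = (mate, mase, maoe, mave, maae)
        return (5 * map_score + sum(1 - min(1.0, e) for e in tp_errors)) / 10.0

    # Placeholder values purely to show the call.
    print(round(nds(0.50, 0.60, 0.30, 0.40, 0.40, 0.20), 3))  # 0.56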

BEV features can serve both 3D object detection and map semantic segmentation. Commonly used 2D detection heads can be migrated to 3D detection with minor modifications. The experiments verify that the same BEV features can simultaneously support 3D object detection and map semantic segmentation, and show that this multi-task learning improves 3D detection.
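A rough sketch of how a single set of BEV features can feed both a detection head and a map-segmentation head is given below. This is conceptual only; the head structures, channel counts, and names are illustrative assumptions, not the ones used in the BEVFormer repository.

    # Shared BEV features driving two task heads (conceptual sketch).
    import torch
    import torch.nn as nn

    class MultiTaskBEVHeadsSketch(nn.Module):
        def __init__(self, bev_channels=256, num_classes=10, num_map_classes=4):
            super().__init__()
            # 3D detection head: per-cell classification + box regression
            self.det_cls = nn.Conv2d(bev_channels, num_classes, kernel_size=1)
            self.det_reg = nn.Conv2d(bev_channels, 9, kernel_size=1)  # x, y, z, w, l, h, yaw, vx, vy
            # Map semantic segmentation head on the same BEV features
            self.seg = nn.Conv2d(bev_channels, num_map_classes, kernel_size=1)

        def forward(self, bev_feats):
            # bev_feats: (B, C, H, W) BEV feature map shared by both tasks
            return self.det_cls(bev_feats), self.det_reg(bev_feats), self.seg(bev_feats)

    heads = MultiTaskBEVHeadsSketch()
    bev = torch.randn(1, 256, 200, 200)
    cls_out, reg_out, seg_out = heads(bev)
    print(cls_out.shape, reg_out.shape, seg_out.shape)

Because both heads read the same BEV tensor, training them jointly lets the segmentation supervision shape the shared features, which is one way to read the multi-task improvement reported above.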

Video demo of continuous frames:

3D Visual Perception New SOTA BEVFormer Reproduces nuScenes Dataset Test Demo (bilibili): https://www.bilibili.com/video/BV16P411K7rp/

Written at the end: due to my limited hardware, the amount of training data was reduced. It is recommended that you train on 8 GPUs.

Answer (to a reader question on whether a camera-only system can reliably tell that an object exists): from the perspective of vision algorithms, determining whether an object exists is largely a semantic problem. The process depends on training data, so missed and false detections are unavoidable. Confirming the physical presence of objects with devices such as LiDAR is more reliable. In addition, classic difficulties of vision algorithms, such as multi-scale and small-object detection, also limit overall system performance.

You can ask about the specific steps in the bilibili comment section, and I will answer there. More material is shared through my CSDN account; follow it and leave a message.


Source: https://blog.csdn.net/weixin_64043217/article/details/128263870