[Paper Notes] Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders

Paper title: Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders
Original link: https://arxiv.org/abs/2206.09900

To address the difficulty and time cost of annotating data for LiDAR perception tasks, this article proposes Occupancy-MAE, a self-supervised pre-training method for LiDAR point cloud perception. The method is based on masked occupancy autoencoders and can be trained on large unlabeled datasets.

Method

(Figure: overview of the Occupancy-MAE pre-training pipeline)
After voxelizing the LiDAR point cloud, distance-aware random voxel masking drops a subset of occupied voxels (since the point cloud grows sparser with distance, farther voxels are assigned a lower drop probability). A 3D sparse convolutional encoder followed by a 3D deconvolution decoder then predicts the occupancy of the original voxel grid, i.e. one probability per voxel, where values close to 1 mean a LiDAR point is likely to fall there. The occupancy of the original, unmasked voxel grid serves as the supervision signal for the pre-training loss. In this way the network learns high-level semantic feature representations.
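To make the reconstruction objective concrete, here is a minimal PyTorch sketch. The paper uses a 3D sparse convolutional encoder; this sketch substitutes dense `nn.Conv3d` layers for brevity, and the class name, layer widths, and `occupancy_loss` helper are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OccupancyAutoencoder(nn.Module):
    """Minimal dense stand-in for the paper's sparse-conv encoder + deconv decoder."""
    def __init__(self, in_ch=1, hidden=32):
        super().__init__()
        # Encoder: downsample the masked voxel grid (the paper uses 3D sparse convs).
        self.encoder = nn.Sequential(
            nn.Conv3d(in_ch, hidden, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(hidden, hidden * 2, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder: 3D deconvolutions back to full resolution, one logit per voxel.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(hidden * 2, hidden, kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(hidden, 1, kernel_size=2, stride=2),
        )

    def forward(self, masked_voxels):
        # masked_voxels: (B, 1, D, H, W) binary grid with some voxels dropped.
        return self.decoder(self.encoder(masked_voxels))  # occupancy logits

def occupancy_loss(logits, original_voxels):
    # Supervise with the unmasked original occupancy grid (float 0/1 targets):
    # 1 where a LiDAR point fell into the voxel, 0 otherwise.
    return F.binary_cross_entropy_with_logits(logits, original_voxels)
```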

This article implements distance-aware random voxel masking by splitting the sensing range into three bands (short, medium, and long range) and assigning progressively lower drop probabilities from short to long range.
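A minimal sketch of such range-bucketed masking, assuming example distance breakpoints (30 m and 50 m) and drop probabilities; the paper's exact thresholds and ratios may differ.

```python
import torch

def distance_aware_mask(voxel_coords, drop_probs=(0.9, 0.7, 0.5), ranges=(30.0, 50.0)):
    """Randomly drop occupied voxels, masking far (sparse) regions less aggressively.

    voxel_coords: (N, 3) float tensor of voxel-center xyz coordinates in meters.
    Returns a boolean tensor: True = keep the voxel, False = drop (mask) it.
    Voxels where the mask is False are removed before the grid is fed to the encoder.
    """
    dist = voxel_coords[:, :2].norm(dim=1)            # horizontal distance to the ego vehicle
    p_drop = torch.full_like(dist, drop_probs[0])     # short range: drop most voxels
    p_drop[dist >= ranges[0]] = drop_probs[1]         # medium range: drop fewer
    p_drop[dist >= ranges[1]] = drop_probs[2]         # long range: drop the fewest
    return torch.rand_like(dist) >= p_drop
```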

After pre-training, the 3D deconvolution decoder is discarded and replaced with the decoder of the corresponding downstream task.
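A minimal sketch of this hand-off, reusing the `OccupancyAutoencoder` from the sketch above; the checkpoint path, 1x1x1-conv head, and `num_classes` are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

num_classes = 10  # e.g. number of semantic classes; purely illustrative

# Reuse the pre-trained encoder and discard the deconvolution decoder.
pretrain_model = OccupancyAutoencoder()
pretrain_model.load_state_dict(torch.load("occupancy_mae_pretrain.pth"))  # assumed checkpoint path

downstream_model = nn.Sequential(
    pretrain_model.encoder,                     # pre-trained 3D feature extractor
    nn.Conv3d(64, num_classes, kernel_size=1),  # placeholder task head (real heads are task-specific)
)
# pretrain_model.decoder is simply never used again; the whole
# downstream_model is then fine-tuned on labeled task data.
```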

Application

Occupancy-MAE can be used for 3D object detection, semantic segmentation, and unsupervised domain adaptation. For unsupervised domain adaptation, one can first pre-train on point cloud data from both the source and target domains (and then presumably train the downstream task on the source domain?).
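A sketch of that pre-training setup; the two `TensorDataset`s below are random stand-ins for voxelized, unlabeled scans from two domains (say, Waymo as source and KITTI as target), and the shapes and batch size are arbitrary.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for voxelized, unlabeled LiDAR scans from the two domains
# (random 0/1 grids here; real code would voxelize actual sweeps).
source_domain = TensorDataset(torch.rand(16, 1, 32, 32, 32).round())
target_domain = TensorDataset(torch.rand(16, 1, 32, 32, 32).round())

# Pre-train Occupancy-MAE on the union of both domains (no labels needed);
# the downstream task is then fine-tuned on the labeled source domain only.
pretrain_loader = DataLoader(ConcatDataset([source_domain, target_domain]),
                             batch_size=4, shuffle=True)
```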
