3D Point Clouds, PointNet, and PointNet++

1. What is a 3D point cloud?

https://www.youtube.com/watch?v=Ew24Rac8eYE
Traditional image data is 2D, while a 3D point cloud is three-dimensional and can express more information.

  • For example, identifying violations and safety hazards in chemical plants
  • City management


2. Some tasks based on 3D point cloud

  • Point cloud segmentation
  • Point cloud completion
  • Point cloud generation
  • Point cloud object detection (3D object detection)
  • Point cloud registration (the basis for subsequent tasks)

Point cloud data is generally produced by LiDAR scanning.

3. How to extract features from 3D point cloud data: PointNet

The raw point cloud data contains only each point's coordinates (x, y, z), which is far from enough for the downstream tasks we want to complete. We also need to know the relationship between each point and its neighboring points, and even between each point and the whole scene.

(1) Deep learning on point clouds before PointNet

Point cloud data is not defined on a regular grid: the points can be distributed arbitrarily in space and their number can be arbitrary, so it is irregular data.
One early solution:

  • Rasterization (voxelization): turn the irregular 3D point cloud into regular data, evenly distributed on a 3D grid.
  • Apply 3D convolutions to the data once it has been put on the 3D grid.

Disadvantages of this approach:

  • If you reduce the grid resolution to lower the time and space complexity, the quality of learning drops sharply;
  • If you ignore time and space complexity and feed high-resolution grids into the 3D convolutional network, note that the point cloud captures only surface information, so the interior of the grid is mostly empty, wasting a great deal of compute.
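To make the sparsity concrete, here is a toy sketch (synthetic sphere data and an arbitrary 32³ resolution, both chosen for illustration) that voxelizes a surface-only point cloud and measures how little of the grid is actually occupied:

```python
import numpy as np

# Toy illustration of the sparsity problem (synthetic data): voxelize
# points sampled from a sphere's surface, mimicking a scan that only
# captures surfaces, and measure how much of the 3D grid is occupied.
rng = np.random.default_rng(0)
pts = rng.normal(size=(10000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)  # project onto unit sphere

res = 32  # voxel grid resolution per axis
# Map coordinates from [-1, 1] to voxel indices in [0, res - 1].
idx = np.clip(((pts + 1.0) / 2.0 * res).astype(int), 0, res - 1)
grid = np.zeros((res, res, res), dtype=bool)
grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True

occupancy = grid.sum() / grid.size
print(f"occupied fraction of the grid: {occupancy:.1%}")  # a small fraction
```

Most voxels stay empty, yet a dense 3D convolution would still process all of them.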

(2)PointNet

So how do we construct a backbone that extracts these features from a 3D point cloud?

Characteristics of point cloud data

(1) Permutation Invariance

Point cloud data is an unordered set: the order in which the points appear does not affect the set itself.
N means there are N points in the point cloud, and D means each point has D-dimensional features. D can contain the point's coordinates (x, y, z) and can also contain color, a normal vector, and so on.
Because the point set is unordered, there are N! possible permutations in total. This requires the backbone we design to be invariant to all of them.
Although the simplest max pooling and average pooling functions are permutation invariant, applying them directly would undoubtedly lose meaningful geometric information.

  • Applying a symmetric operation directly to the point cloud loses important geometric information;
  • We can first map the 3D points into a high-dimensional space through a function h, generating plenty of redundant information along the way;
  • If we then apply the symmetric operation g in this high-dimensional space, we still retain sufficient geometric information and avoid losing too much of it;
  • Finally, another network γ further digests this information to obtain a feature for the point cloud.
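The h → g → γ construction above can be sketched as follows (a toy single-layer version with random weights, not the paper's actual multi-layer MLPs):

```python
import numpy as np

# Minimal sketch of the h -> g -> gamma construction with random weights
# (shapes are assumed for illustration): h lifts each point to a
# high-dimensional space, g = max pooling is the symmetric operation,
# and gamma digests the pooled vector.
rng = np.random.default_rng(0)
W_h = rng.normal(size=(3, 64))        # h: per-point lift, 3 -> 64 dims
W_gamma = rng.normal(size=(64, 16))   # gamma: 64 -> 16 dims

def pointnet_feature(points):         # points: (N, 3)
    h = np.maximum(points @ W_h, 0)   # shared single-layer "MLP" with ReLU
    g = h.max(axis=0)                 # symmetric max pool over the N points
    return np.maximum(g @ W_gamma, 0) # gamma

pts = rng.normal(size=(100, 3))
shuffled = pts[rng.permutation(len(pts))]
# Reordering the points does not change the feature: permutation invariance.
print(np.allclose(pointnet_feature(pts), pointnet_feature(shuffled)))  # True
```

Because the only operation that mixes points is the max pool, any permutation of the input rows yields the identical global feature.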

What's this? I don't quite understand.

(2) Transformation Invariance

For example, if a car is viewed from different angles, the coordinates of each point in the point cloud will change, but they all represent the same car.
How should the network be designed so that it can adapt to this change of viewpoint?

Add the transformation function module based on the point cloud data itself:

A neural network called T-Net generates the transformation parameters, and the input data is then transformed with the generated transformation. This transformation automatically aligns the input, so the subsequent network can adapt to input data from different viewing angles and the processing task becomes easier.

The transformation of point cloud data is very simple: it only requires a matrix multiplication.
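As a quick illustration (with a fixed rotation standing in for whatever 3×3 matrix a T-Net would predict):

```python
import numpy as np

# Aligning a point cloud is a single matrix multiplication: an (N, 3)
# coordinate array times a 3x3 matrix. Here a fixed z-axis rotation
# stands in for the 3x3 matrix a T-Net would predict.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

points = np.random.default_rng(0).normal(size=(1000, 3))
aligned = points @ R.T  # one matmul transforms every point at once
print(aligned.shape)    # (1000, 3)
```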

An extension: we can transform not only the input data but also the feature space of the points' intermediate features.
For example, once each point's feature has been lifted to K dimensions, we have an N × K matrix. We can design a K × K matrix and transform the feature space of these intermediate features by matrix multiplication, yielding another set of features.


The high-dimensional transformations above are harder to optimize, so some regularization may be required: for example, we want this matrix to be as close to an orthogonal matrix as possible.
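A minimal sketch of that regularizer, using the squared Frobenius norm penalty ‖I − AAᵀ‖² from the PointNet paper (K = 64 here is just an example size):

```python
import numpy as np

# The regularization mentioned above, as in the PointNet paper: penalize
# the predicted K x K feature transform A for deviating from orthogonality
# via the squared Frobenius norm of (I - A A^T).
def orthogonality_loss(A):
    K = A.shape[0]
    diff = np.eye(K) - A @ A.T
    return float(np.sum(diff ** 2))

A_orthogonal = np.eye(64)  # a perfectly orthogonal transform
A_random = np.random.default_rng(0).normal(size=(64, 64))
print(orthogonality_loss(A_orthogonal))    # 0.0
print(orthogonality_loss(A_random) > 0.0)  # True
```

Adding this term to the training loss pushes the learned feature transform toward an orthogonal matrix, which preserves distances in feature space and stabilizes optimization.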

Classification and Segmentation Networks


The features of a single point are combined with the global feature to realize segmentation.
The simplest approach: repeat the global feature n times and concatenate it with each point's own feature. This is equivalent to each point querying the global feature: "Where am I in the global context, and which category do I probably belong to?"
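The repeat-and-concatenate step can be sketched as follows (sizes follow the 64-dim local and 1024-dim global features commonly quoted for PointNet; the features themselves are random stand-ins):

```python
import numpy as np

# A minimal sketch of the segmentation head's concatenation step.
n, local_dim, global_dim = 1024, 64, 1024
rng = np.random.default_rng(0)
local_feats = rng.normal(size=(n, local_dim))  # one feature per point
global_feat = rng.normal(size=(global_dim,))   # one feature for the whole cloud

# Repeat the global feature n times and append it to every point feature.
combined = np.concatenate([local_feats, np.tile(global_feat, (n, 1))], axis=1)
print(combined.shape)  # (1024, 1088): each point sees local + global context
```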


Advantages of PointNet: small memory footprint and fast speed (efficient)


FLOPs: short for floating point operations (the s marks the plural), i.e. the number of floating-point operations performed, understood as the amount of computation. It can be used to measure the complexity of an algorithm or model.

Because of PointNet's high efficiency, it is very suitable for mobile devices and wearable devices.

Advantages of PointNet: very robust to data loss


Why is PointNet so robust to data loss?
This can be understood through a visualization:

Max pooling lets the model focus only on critical points, which also makes it more robust to the loss of points.
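A toy numerical check of this intuition (random weights; a single-layer stand-in for PointNet's MLP):

```python
import numpy as np

# Toy demonstration (random weights, assumed setup): the max-pooled global
# feature depends only on the "critical points" that win at least one
# dimension's max, so dropping many non-critical points leaves it unchanged.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 64))          # a shared per-point lift, 3 -> 64
pts = rng.normal(size=(200, 3))

h = np.maximum(pts @ W, 0)            # per-point features (ReLU)
critical = set(h.argmax(axis=0))      # indices achieving some dimension's max

# Keep all critical points, drop roughly half of the rest at random.
keep = [i for i in range(len(pts)) if i in critical or rng.random() > 0.5]
h_kept = np.maximum(pts[keep] @ W, 0)

print(np.allclose(h.max(axis=0), h_kept.max(axis=0)))  # True: feature unchanged
```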

4. PointNet++

Limitations of PointNet


PointNet first maps each point from low to high dimensions through an MLP, then combines the high-level features through another MLP. So PointNet operates either on a single point or on the global features of all points. Compared with 3D convolution it therefore lacks a notion of locality, which makes fine-grained features harder to learn. It also has weaknesses with respect to translation: if every point in the cloud is translated, the features PointNet learns will be completely different.

The second generation network: PointNet++

  • Hierarchical feature learning
  • Translation invariance
  • Permutation invariance

(1) Hierarchical feature learning (multi-level feature learning)

The features extracted by PointNet can also be propagated back to the original points through an "up-convolution" method.
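One building block of this hierarchy is centroid selection; a common choice in PointNet++ is farthest point sampling, sketched here in a simple form (sizes are illustrative):

```python
import numpy as np

# Sketch of farthest point sampling (FPS), commonly used in PointNet++'s
# set-abstraction stage to pick well-spread centroids (simple O(N*M)
# version; the point counts here are illustrative).
def farthest_point_sampling(points, m, seed=0):
    n = len(points)
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(n))]       # start from a random point
    dist = np.full(n, np.inf)             # distance to nearest chosen point
    for _ in range(m - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax())) # pick the point farthest from all chosen
    return np.array(chosen)

pts = np.random.default_rng(1).normal(size=(500, 3))
centroids = farthest_point_sampling(pts, 32)
print(centroids.shape)  # (32,)
```

Each centroid then defines a local region (e.g. a ball of some radius) on which a small PointNet is applied.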

In the multi-level feature-learning network, how should the radius of PointNet++'s grouping ball be chosen?

In CNNs, very small kernels have become increasingly popular, such as the many 3×3 kernels in VGG.
Is the same true in PointNet++? Not necessarily.

A very common situation with point clouds is that the sampling rate is very uneven. For example, in a point cloud captured by a depth camera, nearby points are very dense and distant points are very sparse. PointNet++ then runs into problems in sparse regions: if a grouping ball contains only one point, the feature learned for that region will be very unstable. This is something we very much want to avoid.
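A toy illustration of this effect (all positions and the radius are made up):

```python
import numpy as np

# Toy illustration of the uneven-density problem: a fixed-radius ball
# query finds many neighbors in a dense region but may find only the
# query point itself in a sparse one.
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.1, size=(100, 3))  # near the sensor: dense
sparse = rng.normal(loc=5.0, scale=2.0, size=(10, 3))  # far away: sparse
cloud = np.vstack([dense, sparse])

def ball_query(cloud, center, radius):
    """Return indices of all points within `radius` of `center`."""
    d = np.linalg.norm(cloud - center, axis=1)
    return np.nonzero(d < radius)[0]

n_dense = len(ball_query(cloud, dense[0], 0.3))
n_sparse = len(ball_query(cloud, sparse[0], 0.3))
print(n_dense, n_sparse)  # dense region: many neighbors; sparse: very few
```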

To quantify this problem, a controlled experiment was run to verify the impact of point density on PointNet++'s accuracy: small kernels are strongly affected by uneven sampling rates.

Some new network structures can be designed to avoid this problem:

  • Multi-scale grouping (MSG)
  • Multi-resolution grouping (MRG)
  • The benefit of MRG is that it saves computation; MSG must be computed separately at each scale.

Effect comparison:

5. Application of PointNet and PointNet++ in 3D scene understanding

  • 3D Object Detection
  • 3D Scene Flow

3D Object Detection


Previous Work

  • 3D region proposal + classification
    Based on the point cloud, make a region proposal in 3D space, project it into the image, draw a 3D box in the image, and then perform object recognition; this can also be done with a 3D CNN.

Shortcomings:

  • Searching in 3D space is expensive: both the search volume and the amount of computation are very large.
  • The resolution of the point cloud is limited, so it is difficult to find smaller objects.

  • RGB or RGB-D image based object detection

Disadvantages:
RGB images: they rely on prior knowledge of object size, and it is difficult to estimate an object's precise position.
RGB-D images: two points that are actually far apart in 3D may project to nearby pixels in the image.
With image representations, 2D CNNs remain severely limited: it is difficult to accurately estimate the depth and size of objects.

Frustum PointNets for 3D Object Detection

2D detectors for region proposal + 3D frustum + PointNets
Overall idea:

  • Detect a preliminary region proposal on the 2D image, which reduces the cost of searching 3D space
  • Generate the corresponding 3D frustum from the delineated region proposal, then use PointNets to learn on the point cloud inside the frustum and obtain an accurate 3D proposal

Disadvantages:
Occlusions and clutter are common in frustum point clouds.

Key to success


  • Normalization: because this is point cloud data, normalization only requires a matrix multiplication on the points.

6. Q&A

  • Does PointNet not consider the relationships between points?
    PointNet++ considers the relationships between points.
  • What is the full name of STN?
    Spatial Transformer Network
  • How are the key points selected?
    The key points are not selected by hand; the visualization shows which key points the network itself has selected.


Origin blog.csdn.net/verse_armour/article/details/128061016