LiDAR Point Cloud Series (Part 1): A Detailed Explanation of LiDAR Point Cloud Data Processing



Author | Xi Shaohua

With more and more LiDARs being installed on vehicles, making good use of them has become a top priority, and one key to doing so is processing the point cloud data well.

A laser point cloud is the set of spatial points scanned by a 3D LiDAR device. Each point contains 3D coordinates (XYZ) and a laser reflection intensity (Intensity). The intensity is related to the target's surface material and roughness, as well as to the LiDAR's incident angle, wavelength, and energy density.

To explain point clouds more clearly, the author has sorted out their relevant parameters and characteristics below.

Table: Related parameters of lidar point cloud

6e5b201a89050a4e2caec8c189ec6439.png

Table: Characteristics of Point Clouds

eaf5226f9b88fd7ac992e17cf799d5fe.png

In terms of the collection principle, taking a ToF LiDAR as an example, the laser point cloud is obtained by the vehicle-mounted laser scanning system emitting laser signals to the surroundings and collecting the reflected signals. The target's distance is measured from information such as the time from emission to return, which is then combined with IMU, odometer, GNSS and other data to compute the three-dimensional coordinates and distance of the target ahead.

In addition, during point cloud collection and analysis, perception algorithm engineers need to combine the characteristics and working principles of vehicle-mounted LiDAR with related parameters such as angular resolution and field of view in order to make better use of the point cloud.

So, what are the specific procedures and methods for processing LiDAR point clouds on the vehicle? And how can they be optimized?

To answer these questions, the author interviewed Dr. Xu Jian, head of algorithms at Tudatong; Leon, head of systems and applications; Yin Wei, senior manager at SAIC; Tang Qiang, perception algorithm engineer at Zongmu Technology; and experts from Hesai Technology. The author is grateful for the support of these experts and has organized the following series of articles for readers' reference.

The following is the first article in a series of articles on laser point cloud processing.

The specific process of lidar point cloud processing

Having covered what a point cloud is and its characteristics, the following analyzes the processing flow of LiDAR point clouds in autonomous driving. Before that, two points need to be explained.

First, when the LiDAR point cloud is used for perception and positioning, it first goes through preprocessing and is then handled differently according to its purpose; the application details also differ.

Tang Qiang said: "If the LiDAR is used for positioning, the algorithm model wants as much point cloud data as possible. Unlike perception, positioning also needs road data, such as lane lines or landmarks, whose laser reflectivity differs from that of the ordinary road surface; this is how the LiDAR obtains road-surface information. If the LiDAR is used for perception, there is a preprocessing step that determines an ROI, and the point cloud data within that region is then used for subsequent detection."

Second, in specific applications, each company's processing flow differs somewhat according to its technical solution and application scenario, but most of the flow is the same.

To explain the point cloud processing flow in more detail, the author has combined the expert interviews with public information to sort out the general flow of LiDAR point clouds in autonomous driving applications.

d2216f1f6544b537fc0738f4707e7fa5.png

Figure: Processing flow of laser point cloud

01

Point cloud preprocessing

1.1  Reception and parsing of raw point cloud data

(1) Receiving point cloud data

The raw point cloud data from the LiDAR is stored in data packets (pcap). At this stage the packet contents are a series of bytes and cannot be used directly.

Taking Velodyne's 16-line LiDAR as an example, the raw point cloud data is received mainly by sending data onto the network via UDP (User Datagram Protocol). Specifically, after configuring the LiDAR through its web interface or the command line, the technician matches the LiDAR's IP address with the UDP port number on the receiving end, after which the raw point cloud data can be received.

In terms of content, this type of LiDAR has 16 laser beams in the vertical direction (-15° to +15°), and each frame has a fixed length of 1248 bytes: a 42-byte packet header, 12 data blocks, a 4-byte timestamp, and two final bytes of radar model parameters.

0c706cf8e115f764090660465311789d.png

 Figure: Point cloud data of each frame of Velodyne-16 lidar

(Data source: "Velodyne VLP-16 lidar data format analysis")
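To make the reception step concrete, here is a minimal Python sketch, not the vendor's SDK, of listening for such packets; the port number 2368 and the packet count per sweep are assumptions for illustration.

```python
import socket

PAYLOAD_SIZE = 1206   # 12 x 100-byte data blocks + 4-byte timestamp + 2 model bytes
# (in a pcap recording each packet is 1248 bytes: a 42-byte network header plus this payload)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", 2368))                 # assumed default data port of the sensor

packets = []
while len(packets) < 76:              # roughly one full 360-degree sweep at 10 Hz
    data, _ = sock.recvfrom(2048)     # buffer comfortably larger than one packet
    if len(data) == PAYLOAD_SIZE:
        packets.append(data)
```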

The data in each data block contains the laser beam's rotation angle, distance value, and reflection intensity. For example, "B6 07" represents the detection distance and "2A" the laser's reflection intensity, but this information is encoded as raw bytes and requires further parsing.

a61dabb5fcc21df92af28d79ca371588.png

 Figure: Part of the Velodyne-16 lidar data packet

(Data source: "Velodyne VLP-16 lidar data format analysis")

(2) Parsing the point cloud data (pcd)

The raw data in the data packet (pcap) needs to be further converted into a dataset in pcd format that can be used by perception technicians.

The pcd file is a storage format for laser point clouds. It is essentially a list of Cartesian coordinates (x, y, z) and intensity values i; that is, each point has a unique 3D coordinate and an energy reflection intensity.

In this coordinate system, the x-axis points to the front of the vehicle and the y-axis to its left. Since the coordinate system follows the right-hand rule, the z-axis points upward.

6139136c287fb2bfcfb7500b355607de.png

Figure: point cloud pcd coordinates

(Data source: "Study Notes: Introduction to Point Cloud Library PCL")

To explain the parsing process, the author again takes a frame of the Velodyne-16 packet above as an example and, based on public information, sorts it out as follows.

The first step is to calculate the rotation angle value of the laser line.

For example, the rotation angle in the first data block of the packet above is 0xE0, 0x63.

a) Reverse the two bytes to get the hexadecimal value 63 E0

b) Convert 63 E0 to unsigned decimal: 25568

c) Divide 25568 by 100 to get 255.68, the current rotation angle in degrees

The second step is to calculate the distances measured by the 16 laser lines respectively.

For example, the first laser line in the first data block above has the value "B6 07 2A", where "B6 07" is the distance and "2A" the reflection intensity.

a) Reverse the two distance bytes "B6 07" to get "07 B6"

b) Convert "07 B6" to unsigned decimal: 1974

c) Since this LiDAR's range resolution is 2.0 mm, the measured target distance is 1974 × 2 = 3948 mm

d) Convert 3948 mm to meters: 3.948 m
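The two parsing steps above amount to reading little-endian unsigned 16-bit integers and rescaling them. A minimal Python sketch, using the example bytes from the packet above:

```python
import struct

def parse_azimuth(b: bytes) -> float:
    """Two little-endian bytes -> rotation angle in degrees."""
    return struct.unpack("<H", b)[0] / 100.0

def parse_distance(b: bytes) -> float:
    """Two little-endian bytes -> range in meters at 2.0 mm resolution."""
    return struct.unpack("<H", b)[0] * 2.0 / 1000.0

assert parse_azimuth(bytes([0xE0, 0x63])) == 255.68   # step 1 example
assert parse_distance(bytes([0xB6, 0x07])) == 3.948   # step 2 example
```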

The third step is to obtain the time stamp of the frame and the parameters of the lidar model.

For example, take the last six bytes "6D 69 94 0F 37 22" at the end of the packet in the figure above.

a) The first four bytes "6D 69 94 0F" are the frame's timestamp; reverse them to get "0F 94 69 6D"

b) Convert "0F 94 69 6D" to decimal: 261384557 microseconds (μs)

c) Divide 261384557 by 1,000,000 to get the current time: 261.384557 seconds (s)

d) The last two bytes "37 22" indicate the radar's model and parameters

The fourth step is to convert the angle and distance information into three-dimensional coordinate XYZ values.

The three-dimensional XYZ coordinates can be computed from the rotation angle α (obtained in step 1), the vertical angle ω (a fixed value for each laser line), and the distance value R (obtained in step 2). The specific coordinate conversion is shown in the figure below.

2570c111f0b90402a85f346f3d8ec3f9.png

 Figure: 3D coordinate calculation of target point cloud

(Data source: "Velodyne VLP-16 lidar data format analysis")
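Assuming the standard VLP-16 conversion shown in the figure (x = R·cosω·sinα, y = R·cosω·cosα, z = R·sinω), the fourth step could be sketched as:

```python
import math

def polar_to_xyz(r: float, omega_deg: float, alpha_deg: float):
    """Convert range R, vertical angle ω and rotation angle α to Cartesian XYZ."""
    omega = math.radians(omega_deg)   # fixed per laser line, -15 to +15 degrees
    alpha = math.radians(alpha_deg)   # rotation angle parsed in step 1
    x = r * math.cos(omega) * math.sin(alpha)
    y = r * math.cos(omega) * math.cos(alpha)
    z = r * math.sin(omega)
    return x, y, z

# e.g. the point parsed above: R = 3.948 m on a laser line with ω = -15°
print(polar_to_xyz(3.948, -15.0, 255.68))
```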

1.2  Motion Distortion Compensation

Motion distortion refers to the points of one frame being generated at different poses because the LiDAR or its carrier moves within the frame's acquisition time.

To understand motion distortion compensation, we first need to know why the lidar point cloud on the self-driving car side produces motion distortion.

In fact, a frame of laser point cloud consists of many laser points formed during one scan of the scanning device. In a static scene, the vehicle is stationary and the targets in the scene are also relatively stationary, so the collected frame is not distorted, and each laser beam eventually forms a closed circle.

affac27c1795899a961b1293108825e4.png

 Figure: The point cloud of the vehicle in a static state does not produce distortion

In a moving scene, such as when the vehicle is driving at high speed or turning, the starting and ending point clouds of a frame are measured in different coordinate systems, which distorts the 3D environment information. As shown in the figure below, while the LiDAR completes one sweep, the target's spatial position relative to the vehicle has shifted by the time the last laser beam hits it compared to when the first beam did: the same object, observed at two different moments, appears differently in the coordinate system.

33db5ec1c4d80b8ab93594d422afc336.png

Figure: The target irradiated by the laser at different moments

Motion distortion compensation solves this problem: it computes the motion trajectory during acquisition, compensates each laser point for the displacement caused by that motion, and unifies the whole frame into the coordinate system of a single moment.

To further explain motion distortion compensation, a perception algorithm engineer at an OEM gave an example: "If the car is equipped with an IMU or a wheel odometer, the perception engineers can use the IMU and the wheel odometer (or the IMU alone) to calculate how the car moves within the 0.1 seconds of a frame, and then use the motion model to compensate for the distortion."

Common motion distortion compensation methods:

  • Pure estimation methods (ICP/VICP)

The iterative closest point (ICP) method matches two point clouds with the ICP algorithm, minimizing the error between them through repeated iterations.

The VICP method is a variant of ICP: it assumes the vehicle moves at constant speed and estimates the vehicle's own velocity while matching the point clouds.

Iterative closest point (ICP) and VICP are collectively referred to as "pure estimation methods".

  • Sensor-assisted methods (IMU/ODOM)

The inertial measurement unit (IMU) method searches the IMU queue for the two frames adjacent to each scan point, computes the LiDAR pose at the point's moment by spherical linear interpolation, and applies a homogeneous coordinate transformation to bring the point coordinates into a common coordinate system.

The wheel odometry (ODOM) method uses odometry to transform each point's coordinates into the same coordinate system (two transformations are required), and finally repackages the frame's point cloud data.

Inertial measurement unit (IMU) and wheel odometry (ODOM) methods are collectively referred to as sensor-assisted methods.

  • Fusion method (ODOM + ICP)

This method fuses the odometer and ICP. It first corrects with the odometry method, removing most of the motion distortion; it then matches with ICP to obtain the odometry's error value, distributes that error over each point, and corrects the point positions again. Finally, ICP is iterated until the error converges.
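As an illustration of the sensor-assisted idea, here is a minimal 2D Python sketch, a simplification for clarity (real systems interpolate full 3D poses, e.g. by the spherical linear interpolation mentioned above): the ego pose is interpolated at each point's timestamp, and the point is re-expressed in the frame-end coordinate system.

```python
import numpy as np

def pose_mat(x, y, yaw):
    """2D rigid transform (lidar -> world) as a 3x3 homogeneous matrix."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def compensate(points_xy, t_points, pose0, pose1, t0, t1):
    """
    points_xy : (N, 2) float array of points, each taken at its own timestamp
    t_points  : (N,) per-point timestamps in [t0, t1]
    pose0/1   : ego pose (x, y, yaw) at frame start / end, e.g. from IMU + odometry
    Returns all points expressed in the frame-end coordinate system.
    """
    d = np.asarray(pose1) - np.asarray(pose0)
    T1_inv = np.linalg.inv(pose_mat(*pose1))
    out = np.empty_like(points_xy)
    for i, (p, t) in enumerate(zip(points_xy, t_points)):
        s = (t - t0) / (t1 - t0)                      # 0 at frame start, 1 at frame end
        Ti = pose_mat(*(np.asarray(pose0) + s * d))   # constant-velocity pose interpolation
        out[i] = (T1_inv @ Ti @ np.array([p[0], p[1], 1.0]))[:2]
    return out
```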

1.3 Point cloud framing

After completing motion distortion compensation, perception engineers find a problem: the number of points in a single LiDAR data packet is actually very small and insufficient for the subsequent perception and positioning processing.

At this time, the perception algorithm personnel need to process the point cloud framing of these data packets.

Regarding point cloud framing, a perception algorithm engineer at an L4 solution provider said: "Taking a single LiDAR as an example, perception engineers superimpose multiple point cloud packets into the same frame, so that the frame contains tens of thousands of points for subsequent perception and positioning. If there are multiple LiDARs on the vehicle, the engineers parse each LiDAR's packets separately, then gather the parsed point cloud data of the same moment into one large packet. For example, technicians will gather the point clouds of all the vehicle's LiDARs at time t and process them as one frame of point cloud data."
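In code, framing itself is conceptually simple. Assuming, hypothetically, that each parsed packet yields an (n, 4) array of x, y, z, intensity values, a sketch:

```python
import numpy as np

def build_frame(parsed_packets):
    """Superimpose the points of many packets into one frame (single lidar)."""
    return np.vstack(parsed_packets)       # (N, 4): tens of thousands of points

def build_multi_lidar_frame(frames_at_t):
    """Gather the frames of several lidars sharing timestamp t into one large frame."""
    return np.vstack(frames_at_t)
```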

1.4 Extrinsic parameter transformation

The coordinate system obtained by parsing the point cloud data is the LiDAR coordinate system, but in actual autonomous driving applications the LiDAR coordinate system still has to be converted into the vehicle coordinate system. This process of establishing the connection is called the extrinsic parameter transformation of the point cloud.

Since the LiDAR and the car body are rigidly connected, their relative attitude and displacement are fixed while the vehicle moves. It is only necessary to establish the positional relationship between the two coordinate systems; through rotation and translation, the two three-dimensional coordinate systems can be merged into one (also called the global coordinate system or the world coordinate system).
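Since the extrinsics are a fixed rotation R and translation t, the transformation is a single rigid-body (homogeneous) transform applied to every point. A minimal sketch:

```python
import numpy as np

def lidar_to_vehicle(points, R, t):
    """
    Map (N, 3) lidar-frame points into the vehicle frame via the fixed
    extrinsics: p_vehicle = R @ p_lidar + t, expressed as one 4x4 transform.
    """
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    homo = np.hstack([points, np.ones((len(points), 1))])   # homogeneous coordinates
    return (homo @ T.T)[:, :3]
```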

1.5 Filter processing

When the LiDAR acquires point cloud data, factors such as the device itself, the surface of the measured object, and the scanning environment inevitably introduce noise points (outliers), which must be removed directly or smoothed away. These noise points would introduce errors into subsequent point cloud processing (such as segmentation, feature extraction, and registration). Therefore, in practice, perception engineers filter the point cloud.

  • Noise refers to point cloud data that is useless for model processing.

  • Outliers refer to point cloud data far away from the subjective observation area.

In interviews, the author learned that in the autonomous driving industry "noise" generally includes outliers; therefore, in subsequent articles, the author refers to both collectively as noise.

The following table lists common filtering algorithms in the autonomous driving industry, as sorted out by the author.

Table: Common filtering algorithms in the autonomous driving industry

714f77ff8308bac901dc96fbc5e3aab4.png
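As one example of such filtering, here is a minimal sketch of statistical outlier removal (one common choice; the table above lists others), which drops points whose mean distance to their k nearest neighbors is abnormally large; k and the ratio are illustrative assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def statistical_outlier_removal(points, k=20, std_ratio=2.0):
    """Remove points whose mean neighbor distance is far above the global average."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)   # nearest neighbor 0 is the point itself
    mean_d = dists[:, 1:].mean(axis=1)       # mean distance to the k real neighbors
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]
```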

02

Perception function level processing

After the entire point cloud preprocessing work is completed, the perception algorithm personnel will process the point cloud data at the perception and positioning levels respectively.

At the perception level, point cloud data is mainly used for 3D target detection: the autonomous driving system must identify and detect obstacles in the vehicle's perception area so that it can take measures such as obstacle avoidance.

After preprocessing, perception-level processing has two branches: one applies traditional 3D object detection methods, including point cloud segmentation and cluster analysis of targets; the other applies deep learning models directly to complete 3D target detection.

The following will disassemble the point cloud 3D target detection process based on two different methods.

2.1 Perception data processing based on traditional methods

(1) Ground point cloud segmentation

In target detection, a large portion of the point cloud data belongs to the ground and presents a certain texture, which affects the subsequent processing of the target objects' point clouds.

On the one hand, if the ground points are not segmented out and removed, these invalid points interfere with the point clouds of objects on the ground, reducing the accuracy and robustness of the obstacle segmentation algorithm; on the other hand, the sheer volume of point cloud data increases the model's computational load.

Therefore, before performing follow-up work, the perception algorithm personnel need to filter the ground point cloud first.

Since convolutional neural network models generally extract features for each local area with a sliding window and then perform classification and regression, deep learning methods often do not need to pre-segment the ground point cloud. In autonomous driving, considering factors such as hardware load, development cycle, and model maturity, perception engineers generally use traditional algorithms to segment ground point clouds.

The author sorts out several commonly used ground point segmentation methods:

  • Planar grid method

Main idea: the planar grid method establishes a planar grid according to a set size (multi-layer grids or 3D voxels can also be used), projects the original point cloud into the grid cells, and then extracts features from the point set in each cell, such as average height, maximum height, height difference, and density.

Technical highlights: Vector features are not considered, making subsequent planning control easier to implement.

Existing problems: when the LiDAR has few beams, for example a 16-line LiDAR collecting road data, relatively few laser points hit the ground 20 meters ahead of the vehicle, and a low obstacle there is generally hit by only one beam. If height features are used for ground filtering in the grid, low obstacles are easily filtered out as ground points.

  • Point cloud normal vector method

Main idea: this method segments the ground by setting an angle threshold on point normal vectors. The normal vector of a ground point is generally vertical, so the model only needs to compute the angle between each point's normal and the ground normal and compare it with the set threshold. The method requires the support of neighboring points; the neighborhood size is generally given by a radius or a number of neighbors. Too large a neighborhood flattens the details of the 3D structure and makes the normals too coarse, while too small a neighborhood contains too few points and is strongly disturbed by noise.

Technical highlights: this method is good at extracting the point sets where normals change abruptly on both sides of the road, forming the road edge, and can thus divide road areas, non-road areas, and obstacles together with rasterization.

Existing problems:

1) By the assumption of the normal vector method, the point cloud must first be corrected. Without correction, there can be extreme cases where no ground point is segmented in a frame (the LiDAR tilt angle is too large).

2) The normal vector method cannot effectively distinguish points generated by platform-type obstacles (such as rectangular flower beds on the side of the road).

  • Model fitting method: plane fitting (RANSAC)

Main idea: RANSAC plane fitting establishes a plane equation from three randomly selected points, substitutes the remaining points into the equation in turn, and classifies them according to a set distance threshold: points within the threshold are inliers, points outside are outliers. The plane with the most inliers after all iterations is taken as the ground equation; its inliers form the ground point set, and the remaining points form the obstacle point cloud (a minimal sketch follows this list).

Technical highlights: When there is a large amount of abnormal data in the data, this method can also estimate the model parameters with high precision - it can more easily estimate the ground point cloud set from large-scale point cloud data.

Existing problems:

1) Considering drainage, traffic roads are usually raised in the middle and lower on both sides, similar to an arch bridge. Although the curvature is small, the random sample consensus algorithm may produce a plane tilted to one side as the ground equation.

2) On uphill and downhill sections, since the ground is not an absolute plane, the ground equation computed by this method will treat the ground points ahead as obstacle points.

3) Since RANSAC takes three random points in the point cloud space to build a plane, a large wall in the scene may be taken as the ground equation.

  • Surfel (surface element) method

Main idea: surfel-based segmentation can be local or surface-type, and region growing is often used for ground segmentation. Its core is to compare the angles between point normals, merge adjacent points that satisfy a smoothness constraint, and output them as clusters of point sets, each cluster considered to belong to the same plane.

Technical highlights: This method can better deal with the curvature of the ground, and can achieve better segmentation results for relatively gentle curved surfaces or planes.

Existing problems:

1) Actual roads contain many noise points; segmenting the ground directly by region growing treats many sporadic ground points as obstacle point clouds.

2) The region growing algorithm is time-consuming; for perception modules with high real-time requirements it needs further optimization, e.g. restricting region growth to plane edges, or partitioning the space and segmenting within small regions.
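Below is a minimal Python sketch of the RANSAC ground fitting described above; parameters such as the iteration count and distance threshold are illustrative assumptions.

```python
import numpy as np

def ransac_ground(points, n_iter=100, dist_thresh=0.2, seed=0):
    """Fit the dominant plane with RANSAC; inliers are returned as the ground set."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        if np.linalg.norm(normal) < 1e-9:          # degenerate (collinear) sample
            continue
        normal = normal / np.linalg.norm(normal)
        inliers = np.abs((points - p1) @ normal) < dist_thresh
        if inliers.sum() > best.sum():             # keep the plane with the most inliers
            best = inliers
    return points[best], points[~best]             # ground set, obstacle set
```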

(2) Point cloud segmentation of the target object

After removing the ground points, perception engineers need to segment the target point clouds effectively so that each target object can be processed separately; this is point cloud segmentation. Target point cloud segmentation divides the point cloud according to spatial, geometric, and texture features.

The author sorts out several commonly used point cloud segmentation methods:

  • Edge-based method

Main idea: the edge-based method describes objects by their shape, especially their edges. By locating the points where the point cloud changes rapidly at object edges, it finds and segments the points close to the edge region.

Technical highlights: This method adopts the algorithm optimization mechanism of reconfigurable multi-ring network, which improves the efficiency of algorithm operation.

Existing problems:

1) This method is more suitable for simple scenarios (e.g. low noise, uniform density) and is not suitable for datasets with massive 3D point clouds.

2) For target objects with discontinuous edges, the data cannot be used directly for recognition and detection unless point cloud filling is applied.

  • Region-growing-based method

Main idea: region-growing methods use neighborhood information to group nearby points with similar attributes, obtaining segmented regions and distinguishing the differences between regions. They fall into two categories: seeded-region methods and unseeded-region methods. The seeded method starts segmentation from several selected seed points and gradually grows a point cloud region by adding the seeds' neighbors; the unseeded method first treats all points as one region and then subdivides it into smaller regions.

Technical highlights: Compared with the edge method, the segmentation accuracy of this method will be higher.

Existing problems: the method depends on the choice of initial seed points or the subdivision position; a poor choice leads to over-segmentation or under-segmentation.

  • Attribute-based method

Main idea: the attribute-based method first computes attributes of the target point cloud, such as distance, density, and point distribution in the horizontal or vertical direction, and uses them to define a neighborhood between measurement points; the differences in the normal vectors' slopes in each direction and in the point-neighborhood data are then used as clustering attributes.

Technical highlights: Compared with the previous two methods, the attribute-based method can eliminate the influence of outliers and noise.

Existing problems: this method relies on the definition of neighborhoods between points and on the point density of the data; handling multi-dimensional attributes of a large number of input points leads to excessive computational demands.

(3) Target cluster analysis

After target segmentation, perception engineers need to cluster the segmented point clouds into several wholes, i.e. groups of points with a high degree of similarity, to reduce the computational load of subsequent models. This process is called point cloud clustering.

Common point cloud clustering methods are as follows:

  • K-means

Main idea: the K-means clustering algorithm divides the whole point cloud dataset into k clusters with certain common characteristics. First, k points are randomly selected as initial cluster centers. Then the distance from each point to the k centers is computed, and each point is assigned to the cluster whose center is nearest. The centroid of each resulting cluster is then computed and taken as the new cluster center. The model repeats these steps until the cluster centers no longer change.

Technical highlights: high accuracy and determinacy, can handle a large amount of data, and fast calculation speed.

Existing problems: This method needs to pre-set the K value and the initial cluster center, and the real-time performance is poor.

  • DBSCAN

Main idea: DBSCAN introduces the concept of density, requiring that the number of objects contained in a given region of the clustering space be no less than a given threshold. The method can find clusters of arbitrary shape in a noisy spatial database, connect adjacent regions of sufficient density, and handle abnormal data effectively; it is mainly used for clustering spatial data.

Technical highlights:

1) Can cluster point clouds of any shape

2) Can effectively remove noise points

Existing problems:

1) Large consumption of memory resources

2) High requirements on the processor

3) It is necessary to pre-set the radius of the clustering area and the trigger threshold

  • Euclidean clustering

Main idea: Euclidean clustering is a method based on Euclidean distance. In LiDAR point cloud data, the distance between two points within the cluster of one object is less than a certain value, while the distance between the clusters of different objects is greater. The Euclidean clustering algorithm uses this principle, merging points whose Euclidean distance is below a set threshold into one class until clustering is complete (see the sketch after this list).

Technical highlights: This method is fast and has good versatility.

Existing problems: a fixed distance threshold must be preset, so nearby objects cluster well while distant objects suffer under-segmentation or truncation.
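Below is a minimal sketch of Euclidean clustering as described above, implemented as connected components over a KD-tree radius search; the radius and minimum cluster size are illustrative assumptions.

```python
import numpy as np
from collections import deque
from scipy.spatial import cKDTree

def euclidean_clustering(points, radius=0.5, min_size=10):
    """Label points: clusters are numbered 0, 1, ...; tiny clusters get -2 (noise)."""
    tree = cKDTree(points)
    labels = np.full(len(points), -1)            # -1 = not yet visited
    current = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        queue, members = deque([seed]), []
        labels[seed] = current
        while queue:                             # breadth-first flood fill
            i = queue.popleft()
            members.append(i)
            for j in tree.query_ball_point(points[i], radius):
                if labels[j] == -1:
                    labels[j] = current
                    queue.append(j)
        if len(members) < min_size:
            labels[np.array(members)] = -2       # too small: treat as noise
        else:
            current += 1
    return labels
```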

(4) Matching and tracking

After the previous steps, perception engineers can basically tell from the processed data what targets the point clouds represent. The next task is to match and track the targets, i.e. to predict in which area a target will appear at the next moment. In obstacle detection, matching accuracy is the basis for subsequent multi-sensor fusion.

Generally speaking, the matching and tracking flow first computes the correlation matrix between the target predictions and the measured point cloud data, then uses the Hungarian algorithm (whose core is finding augmenting paths to achieve maximum matching) to settle the matching relationships, and finally divides the data into matched and unmatched targets, saving each separately in preparation for tracking.
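A minimal sketch of this association step, using centroid distance as the correlation matrix and SciPy's implementation of the Hungarian algorithm; the gating distance is an illustrative assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(predictions, detections, max_dist=2.0):
    """
    predictions : (M, 3) predicted track centroids
    detections  : (N, 3) centroids of newly clustered targets
    Returns matched (track, detection) pairs and the set of unmatched detections.
    """
    cost = np.linalg.norm(predictions[:, None, :] - detections[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)          # Hungarian algorithm
    matched = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]
    unmatched = set(range(len(detections))) - {c for _, c in matched}
    return matched, unmatched
```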

2.2 Perception data processing based on deep learning

In autonomous driving, with the ever-growing volume of point cloud data, traditional target detection algorithms can no longer meet practical needs. From the interviews, the author learned that current 3D point cloud target detection mainly uses deep learning models.

A perception algorithm engineer of an OEM said: "In object detection at the perception level, the point cloud is directly put into the deep learning model after the preprocessing is completed, or it is down-sampled first and then put into the deep learning model."

Commonly used deep learning-based target detection methods:

  • PointNet

Main idea: PointNet first computes a feature for each point in the point cloud, then combines those features through an operation independent of point order to obtain a feature of the whole point cloud, which can be used directly for recognition tasks.

Technical highlights:

1) Point cloud data is fed directly into the network rather than first being converted into a regular structure;

2) Utilization of rotation invariance and permutation invariance.

√Rotation invariance: All points do the same transformation (rotation and translation), which does not affect the expression of the shape.

√Permutation invariance: arbitrarily exchanging the positions of points does not affect the expression of the shape.

Existing problems: local features cannot be obtained, which makes it difficult for the PointNet method to analyze complex scenes.

ae3e7a5cc59a5d3f0c905606dd7b8059.png

 Figure: Network structure of PointNet
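The permutation invariance at the heart of PointNet can be demonstrated in a few lines: a shared per-point transform followed by a symmetric max-pool yields the same global feature no matter how the points are ordered. In this sketch, a single random linear layer stands in for the real multi-layer MLP:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(1024, 3))        # one point cloud
W = rng.normal(size=(3, 64))               # shared per-point weights (stand-in for the MLP)

def global_feature(pts):
    per_point = np.maximum(pts @ W, 0.0)   # shared transform + ReLU applied to every point
    return per_point.max(axis=0)           # symmetric max-pool over the point axis

shuffled = points[rng.permutation(len(points))]
assert np.allclose(global_feature(points), global_feature(shuffled))  # order-independent
```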

  • PointNet++

Main idea: PointNet++ is derived from PointNet, mainly borrowing CNN's idea of layered, growing receptive fields. A CNN scans the pixels of an image with convolution kernels layer by layer, taking inner products, so that deeper feature maps have larger receptive fields and each pixel carries more information. PointNet++ imitates this structure: it first samples the point cloud and draws local ranges, treats the points inside as local features, and extracts them with PointNet.

Technical highlights:

1) There is no information loss caused by quantization, and there is no need to adjust quantization hyperparameters.

2) Ignore the blank area, avoiding invalid calculations.

Existing problems:

1) Unable to take advantage of mature spatial convolution-based 2D object detection algorithms.

2) Although invalid calculations are avoided, the processing efficiency of GPU for point clouds is much lower than that for grid data.

cf9074a692b6a7e08eee63f425fd3d17.png

 Figure: Comparison of the segmentation effect of PointNet (left part) and PointNet++ (middle part)

6f2b1c70afda3fc784d1976c2cf2dbba.png

 Figure: Network structure of PointNet++

  • VoxelNet

Main idea: VoxelNet converts the 3D point cloud into a voxel structure and then processes this structure in bird's-eye view. The voxel structure divides 3D space with cubes of equal size, each cube called a voxel.

VoxelNet has two main stages: the first, VFE (Voxel Feature Encoding), extracts voxel features; the second is a target detection stage similar to YOLO.

Technical highlights:

1) Task detection can be performed directly on sparse point cloud data, and the information bottleneck caused by artificial feature engineering can be avoided.

2) It can make more effective use of the parallel computing advantages of the GPU.

Existing problems: VoxelNet's data representation (a new data structure built to fit the model's operations) is relatively inefficient, and the 3D convolutions in the middle layers require too much computation, so it runs at only about 2 FPS (frames per second), far below real-time requirements.

c164e275ee400c45e6b4d60efba81d63.png

Figure: Network structure of VoxelNet

  • SECOND

Main idea: SECOND is a point cloud detection method optimized from VoxelNet. The overall network structure and implementation are mostly similar to the original VoxelNet, but the middle 3D convolution layers are replaced with sparse convolution, improving training efficiency and inference speed. SECOND also proposes a new loss function and a point cloud data augmentation strategy. Its network consists of three parts: the VFE feature extraction stage, sparse convolution layers, and an RPN network.

Technical Highlights: The inference speed of the model is improved by using sparse convolution.

Problems: Although SECOND has improved speed compared to VoxelNet, it still retains 3D convolution.

8499c9807291e3cbea591e68d95b8f66.png

Figure: Network structure of SECOND

  • PointPillar

Main idea: PointPillar stacks the points falling into each grid cell directly into what it calls a pillar, then learns features with a PointNet-like method, and finally maps the learned feature vectors back to the grid coordinates to obtain image-like data.

Technical highlights:

1) By learning features instead of relying on a fixed encoder, PointPillars can exploit the full information of point cloud representations.

2) By operating on columns instead of voxels, there is no need to manually adjust binning in the vertical direction.

3) Only 2D convolution is used in the network, and 3D convolution is not used. The demand for calculation is small and the operation is efficient.

4) Different point cloud configurations can be used without manual adjustments.

Existing problems: the learning of point features is limited to the grid, and the information of adjacent areas cannot be effectively extracted.

426d785571dafd38f4c816f69b67c0ec.png

Figure: Network structure of PointPillar

  • PCT

The main idea: PCT mainly uses the inherent order invariance of Transformer, avoids defining the order of point cloud data, and performs feature learning through the attention mechanism. The overall network structure is divided into three parts: input embedding, attention layer, and point cloud classification and segmentation.

Technical highlights:

1) PCT has inherent permutation invariance and is more suitable for point cloud learning.

2) Compared with the mainstream PointNet network, the segmentation edge of PCT is clearer.

8a77bbdfc3b596ad83f852058b10bf6d.png 

Figure: Comparison of the segmentation effect of PointNet (left part) and PCT (middle part)

Existing problems: PCT is an effective global feature extraction network, but it ignores the equally important local neighborhood information in point cloud deep learning.

733e573d5f232d98114ca9e6c5745345.png

Figure: Network structure of PCT

Although deep learning has been widely used in the autonomous driving industry, deep learning will also encounter some challenges in point cloud data processing.

On the one hand, point clouds are by nature sparse and unstructured positions of points in the scene, so both their density and their count vary with the objects in the scene. On the other hand, since autonomous vehicles must react very quickly while driving, object detection has to run in real time, which means the detection network must finish its computation within the interval between scans.

Therefore, although deep learning is widely available and easy to apply, its capabilities cannot yet be fully exploited on point clouds.

Xu Jian said: "At present, AI algorithms such as deep learning cannot achieve 100% accurate recognition and detection, which can easily lead to missed targets. AI algorithms are a very important means of 3D point cloud perception, but AI alone cannot be relied on. Applying AI algorithms together with traditional algorithms can remedy the incompleteness of the data sample space and thereby avoid missed detections."

03

Positioning function level processing

3.1 Feature Extraction

When an autonomous vehicle drives on the road, it does not actually know where it is, so the first step of using point cloud data for positioning is to let the vehicle know "where am I".

To this end, perception engineers first extract target features of the surrounding scene and use these features, together with the obtained relative distance information, to build a small map and determine the vehicle's relative initial position.

Point cloud feature extraction usually runs in real time, the volume of point cloud data is very large, and the hardware of existing mass-produced vehicles is limited. Therefore, to reduce the computational load, feature extraction generally prefers the more salient features, such as the outline information of objects.

A perception algorithm engineer at an OEM said: "LiDAR is not like vision, which carries deep semantic information. To reduce computation, LiDAR extracts only the target's 'line, surface, and corner' features. For example, a utility pole is a line feature, the road surface is a surface feature, and a building corner is a corner feature."

3.2 Map matching 

After extracting the features of the surrounding objects, the perception algorithm personnel need to perform point cloud map matching based on these features to obtain the relative pose between each point cloud. Point cloud map matching can generally be divided into inter-frame matching and high-precision map matching.

  • Inter-frame matching, also called sub-map matching, matches point clouds with the same features across consecutive frames and finally yields a small local map.

  • High-precision map matching refers to matching the optimized point cloud with the high-precision map.

In the autonomous driving industry, solution providers and OEMs apply both schemes, but the commonly used one is still inter-frame matching.

A perception algorithm engineer at an OEM said: "Only L4 driverless projects may use map matching based on high-precision map positioning, while OEMs mainly do frame-to-frame matching, because the positioning solutions of mass-produced vehicles cannot all rely on high-precision maps. Moreover, high-precision map matching requires a lot of computation, so downsampling must be done before it is applied."
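As an illustration of inter-frame matching, here is a minimal sketch using ICP from the Open3D library, assuming Open3D is available; the voxel size and correspondence distance are illustrative assumptions.

```python
import numpy as np
import open3d as o3d

def frame_to_frame_pose(prev_xyz, curr_xyz, voxel=0.2):
    """
    Estimate the relative pose of the current frame with respect to the
    previous one: downsample both (N, 3) clouds, then run point-to-point ICP.
    """
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(curr_xyz)).voxel_down_sample(voxel)
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(prev_xyz)).voxel_down_sample(voxel)
    reg = o3d.pipelines.registration.registration_icp(
        src, tgt, max_correspondence_distance=1.0,
        init=np.eye(4),
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return reg.transformation      # 4x4 relative pose between the two frames
```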

3.3 Pose Optimization

As mentioned above, matching yields the relative poses between point clouds, and the accuracy of these relative poses affects the accuracy of map construction, so the relative poses of the point clouds need to be optimized.

Generally speaking, inaccurate relative poses are caused by uncontrollable factors such as occlusion of the point cloud by objects or the limits of the LiDAR's field of view. Pose optimization obtains the optimal relative pose through rigid-body transformations (rotation or translation) of a point cloud coordinate system.

references

[1] How much do you know about LiDAR point cloud data?

https://blog.csdn.net/OpenDataLab/article/details/124962277

[2] Study Notes: Introduction to Point Cloud Library PCL (Point Cloud Library)

https://blog.csdn.net/jeffliu123/article/details/126137566

[3] Point cloud concept and point cloud processing

https://blog.csdn.net/hongju_tang/article/details/85008888

[4] This article summarizes the basic performance indicators of lidar

http://www.360doc.com/content/20/0813/07/71135856_965448273.shtml

[5] Laser point cloud preprocessing

https://zhuanlan.zhihu.com/p/455810371

[6] LiDAR motion compensation method

https://zhuanlan.zhihu.com/p/470795318

[7] LiDAR Calibration

https://blog.csdn.net/qq_39032096/category_11512741.html

[8] Overview of target detection based on 3D point cloud

https://zhuanlan.zhihu.com/p/479539638

[9] Lidar target detection in autonomous driving (below)

https://mp.weixin.qq.com/s/3H6qCDO-2mOP3HjGKnmLzw

[10] Overview of 3D point cloud segmentation (middle)

https://mp.weixin.qq.com/s?__biz=MzI0MDYxMDk0Ng%3D%3D&chksm=e9196bc2de6ee2d4220fbf4fca46ea676eadac19a4e17b5b3d4dd9c0aa772c0b57db57f5a044&idx=1&mid=2247487535&scene=21&sn=1c6687b0a7c6df60dc0bd902676c0ed0#wechat_redirect

[11] Research Progress on Point Cloud Clustering Algorithms Applied to Unmanned Vehicles

https://mp.weixin.qq.com/s?__biz=MzI0MjcxNDQwNQ==&mid=2247488019&idx=1&sn=be3089b2510cfb12b2a955990f9c7e3b&chksm=e9794059de0ec94fb9559cfd011c173424586de115a943376c094e6572af5dcf74121239a009&scene=27

[12] Obstacle detection and tracking based on 3D lidar

https://mp.weixin.qq.com/s/ULDIkGSUfVp3OwWgxR2A9g

[13] Autonomous Driving - Filtering Algorithm

https://blog.csdn.net/limeigui/article/details/127656052

