SLAM paper translation (4) - LinK3D: Linear Keypoints Representation for 3D LiDAR Point Cloud

Table of contents

1 Introduction

2 Related Work

3 The Proposed Method

A. Keypoint extraction

B. Descriptor generation

C. Matching two descriptors

4 Experiments


Abstract − Feature extraction and matching are fundamental parts of many computer vision tasks, such as 2D or 3D object detection, recognition, and registration. It is well known that feature extraction and matching have achieved great success in the 2D domain. Unfortunately, in the 3D domain, current methods cannot support the widespread application of 3D LiDAR sensors in vision tasks due to their poor descriptive capability and low efficiency. To address this limitation, we propose a novel 3D feature representation method: Linear Keypoints Representation (LinK3D) for 3D LiDAR point clouds. The novelty of LinK3D is that it fully considers the characteristics of LiDAR point clouds (such as sparsity and scene complexity), and represents the current keypoint by its robust neighboring keypoints, thus providing a strong constraint in the description of the current keypoint. We evaluate the proposed LinK3D on two public datasets (i.e., KITTI and Steven VLP16), and the experimental results show that our method far outperforms the state of the art in matching performance. More importantly, LinK3D exhibits excellent real-time performance, matching the typical 10 Hz frame rate of LiDAR. LinK3D takes an average of only 32 milliseconds to extract features from a point cloud collected by a 64-beam LiDAR, and only about 8 milliseconds to match two LiDAR scans when executed on a laptop with an Intel Core i7 @ 2.2 GHz processor. Furthermore, our method can be widely applied to a variety of 3D vision applications. In this paper, we apply LinK3D to 3D registration, LiDAR odometry, and place recognition tasks, and compare the results with those of competing state-of-the-art methods.

Keywords - 3D LiDAR point cloud, feature extraction, matching, efficiency, extension to 3D vision tasks.

1 Introduction

        Feature extraction and matching are the basis of most computer vision tasks, such as perception, recognition, and reconstruction tasks. In the field of 2D vision, various 2D feature extraction methods (such as Harris, SIFT, SURF, and ORB) have been studied and widely used. However, there are still some unsolved problems in 3D feature representation and matching in the field of 3D vision. Current methods may not be suitable for high-frequency (typically ≥10 Hz) 3D LiDAR and large-scale complex scenes, especially in terms of efficiency and reliability. The irregularity, sparsity, and disorder of 3D LiDAR point clouds make it infeasible to directly apply 2D methods to 3D. At present, 3D LiDAR has been widely used in some mobile devices (such as robots, self-driving cars, etc.), so it is necessary to design an efficient and powerful feature representation method for 3D LiDAR point clouds.

        Existing 3D feature point representation methods [15]–[25] can be mainly divided into two categories according to their extraction strategies, namely hand-crafted features and learning-based features. Hand-crafted features [15], [17]–[19] mainly describe features in the form of histograms, using local or global statistics for the feature representation. Since large-scale LiDAR scenes usually contain many similar local structures (such as a large number of local planes), these local statistical features are prone to mismatches. Global features [20]–[22] are less likely to generate accurate point-to-point matches inside point clouds. Learning-based methods [23]–[25] have made great progress; however, these methods are usually time-consuming (e.g., 3DFeatNet [24] takes about 0.928 seconds to extract 3D features, and D3Feat [26] takes about 2.3 seconds). In fact, most current 3D feature representation methods are more suitable for small-scale scenes and small object surfaces (such as the Stanford Bunny point cloud). Clearly, there are some differences between such small-scale scenes and the large-scale ones captured by LiDAR (e.g., the KITTI 00 urban scene). Specifically, the main differences are as follows:

  • The surfaces of small objects are usually smooth and continuous, and their local surfaces are distinctive. However, 3D LiDAR point clouds contain many discontinuous and similar local surfaces (e.g., many similar building facades, trees, poles, etc.). If point clouds with such similar local surfaces are used to generate local 3D features, the resulting 3D features are also obviously similar, which easily leads to mismatching.
  • 3D LiDAR point clouds are usually sparse, and the points are not evenly distributed in space. If there are not enough points in a fixed-size region to generate a local statistical description, an effective and accurate 3D feature description may not be obtained.
  • Occlusions are often present in LiDAR scans, which can lead to inconsistent descriptions of the same local surface in the current and subsequent LiDAR scans. Therefore, it is necessary to design feature representation methods specifically for 3D LiDAR point clouds.

Based on these differences, this paper proposes a novel 3D feature for LiDAR point clouds, which can be used for accurate point-to-point matching. Our method first extracts robust aggregation keypoints, and then feeds them into a descriptor generation algorithm that generates descriptors in a novel keypoint representation. As shown in Figure 1, the descriptor of the current keypoint is represented by its neighboring keypoints. After obtaining the descriptors, the proposed matching algorithm can quickly match the descriptors of two LiDAR scans. Furthermore, the LinK3D proposed in this paper has been applied to various real-time 3D vision tasks. To summarize, our main contributions are as follows:

  • A novel and complete feature extraction method for 3D LiDAR point clouds is proposed, including keypoint extraction, feature description, and matching. Compared with state-of-the-art 3D features, our method achieves significant improvements in matching performance and running time, and is more reliable on sparse LiDAR point clouds.
  • The proposed method has the potential to be applied to various 3D vision tasks. In this paper, LinK3D has been applied to 3D registration, LiDAR odometry, and place recognition in 3D LiDAR SLAM. Compared with state-of-the-art methods, our method achieves competitive results in these tasks.
  • The proposed method shows impressive efficiency, making it applicable in real time. Our method takes an average of 32 ms to extract LinK3D features, 8 ms to match the descriptors of two LiDAR scans, and 13 ms to retrieve a revisited place when applied to the place recognition task.

Figure 1. The core idea of our LinK3D and the matching result of two LiDAR scans based on LinK3D. Green lines indicate valid matches. The descriptor of the current keypoint is represented by its neighboring keypoints. Each dimension of the descriptor corresponds to a sector area. The first dimension corresponds to the sector area in which the keypoint nearest to the current keypoint is located, and the other dimensions are arranged in counterclockwise order. If there are keypoints in a sector area, the nearest keypoint in that sector area is searched for and used to represent the corresponding dimension of the descriptor.

2 Related Work

3 The Proposed Method

The pipeline of our method mainly includes two parts: feature extraction (i.e., keypoint extraction and descriptor generation) and feature matching. As shown in Fig. 2, we first extract the edge points of the LiDAR scan and then input them into an edge keypoint aggregation algorithm, from which robust aggregation keypoints are extracted for subsequent descriptor generation. The descriptor generation algorithm then builds a distance table and a direction table, which are used to quickly generate the descriptors.

  Fig. 2. The workflow of the proposed LinK3D, including keypoint extraction and descriptor generation. The input point cloud is first processed by keypoint extraction, and then described by LinK3D to obtain efficient keypoint descriptors.

A. Keypoint extraction

        Edge point extraction: For keypoint extraction, we use a strategy similar to that of 2D image keypoints, i.e., corner or edge points. Here, we likewise extract representative edge points from the 3D LiDAR scan as keypoints. The points of a LiDAR scan can be roughly divided into two categories: edge points and planar points. The main difference between edge points and planar points is the smoothness of the local surface on which they lie.

        Given a 3D LiDAR point cloud $P$, let $p_i$ be a point in $P$. Let $S$ be the set of consecutive points located on the same scan line as $p_i$ and distributed evenly on both sides of it, and let $|S|$ denote the cardinality of $S$. The smoothness term at the current point is defined as follows:

$$\rho_i = \frac{1}{|S| \cdot \lVert \boldsymbol{p}_i \rVert} \left\lVert \sum_{j \in S,\, j \neq i} \left( \boldsymbol{p}_i - \boldsymbol{p}_j \right) \right\rVert$$

where $\boldsymbol{p}_i$ and $\boldsymbol{p}_j$ are the coordinates of points $p_i$ and $p_j$, respectively. Edge points (shown in Fig. 3a) are extracted as the points whose smoothness $\rho_i$ is larger than a threshold $Th_{\rho}$.
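To make the smoothness term above concrete, here is a minimal NumPy sketch, assuming the points of a single scan line are given in capture order; half_window (half of $|S|$) and the boundary handling are illustrative assumptions, not values from the paper.

```python
import numpy as np

def smoothness(scan_line: np.ndarray, i: int, half_window: int = 5) -> float:
    """Smoothness rho_i of point i on one scan line, per the term above.

    scan_line: (N, 3) array of consecutive points from a single laser ring.
    Assumes half_window <= i <= N - 1 - half_window (boundary points skipped).
    """
    p_i = scan_line[i]
    # S: consecutive neighbors distributed evenly on both sides of p_i.
    S = np.vstack((scan_line[i - half_window:i],
                   scan_line[i + 1:i + 1 + half_window]))
    diff_sum = (p_i - S).sum(axis=0)          # sum over j in S of (p_i - p_j)
    return float(np.linalg.norm(diff_sum) / (len(S) * np.linalg.norm(p_i)))
```

Points whose smoothness exceeds the threshold $Th_{\rho}$ are kept as edge points.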

        Edge keypoint aggregation: When extracting edge points, there are usually many points that exceed the threshold but are unstable. Specifically, these unstable points appear in the current scan but may not appear in the next scan. As indicated by the red dashed box in Fig. 3a, unstable points are usually scattered. If we use these points for matching, mismatches may result. Therefore, it is necessary to filter out these points and find the truly valid edge keypoints. Valid edge keypoints are usually distributed in the form of vertical clusters, as shown by the blue dashed box in Fig. 3a.

        In this paper, a novel keypoint aggregation algorithm is designed to find valid edge keypoints. Angle information is used to guide the clustering, so that potential points are clustered within a small group rather than over the entire region, as shown in Algorithm 1. The motivation is that points with roughly the same horizontal angle are more likely to belong to the same cluster. Specifically, we first divide the horizontal plane centered at the origin of the LiDAR coordinate system into equal sector areas, and then only cluster the points within each sector. Notably, under the settings of our experiments, our algorithm runs about 25 times faster than classical K-Means [47].

The extracted edge keypoints are shown in Fig. 3b. It can be observed that our algorithm filters out invalid edge points and finds the truly valid edge keypoints. Furthermore, the centroid of each cluster is computed and named an aggregation keypoint, which represents its cluster and is used for subsequent descriptor generation.
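Since Algorithm 1 is not reproduced in this translation, the following is a simplified sketch of the sector-guided aggregation under stated assumptions: each sector is treated as holding at most one vertical cluster, and n_sectors, min_cluster_size, and the 0.2 m horizontal-spread gate are illustrative values, not the paper's.

```python
import numpy as np

def aggregate_edge_keypoints(edge_pts: np.ndarray, n_sectors: int = 120,
                             min_cluster_size: int = 3) -> np.ndarray:
    """Cluster edge points by horizontal angle and return the cluster
    centroids (the aggregation keypoints)."""
    # Horizontal angle of each point around the LiDAR origin, in [0, 2*pi).
    angles = np.arctan2(edge_pts[:, 1], edge_pts[:, 0]) % (2 * np.pi)
    sector_ids = (angles // (2 * np.pi / n_sectors)).astype(int)

    centroids = []
    for s in range(n_sectors):
        cluster = edge_pts[sector_ids == s]
        # Accept only clusters that look like vertical edges: enough points,
        # tightly grouped in the horizontal (x, y) plane.
        if (len(cluster) >= min_cluster_size
                and cluster[:, :2].std(axis=0).max() < 0.2):
            centroids.append(cluster.mean(axis=0))
    return np.array(centroids)
```

Restricting the clustering to one sector at a time avoids the global distance computations of K-Means, which is what yields the reported speedup.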

B. Descriptor generation

        In descriptor generation, all aggregation keypoints are first projected onto the horizontal plane, which eliminates the influence of the uneven vertical distribution of the clustered edge keypoints. For fast matching, our LinK3D descriptor is represented as a multi-dimensional vector, in which each dimension stores either 0 or the distance between the current keypoint and one of its neighboring keypoints. As shown in Figure 4, we divide the horizontal plane into 180 sector areas centered on the current keypoint k0, and each dimension of the descriptor corresponds to one sector area. Inspired by the 2D descriptor SIFT [10], which searches for a main orientation to ensure pose invariance, the main direction of LinK3D is also determined by searching: it is the direction vector from the current keypoint k0 to its nearest keypoint k1, and k1 lies in the first sector area, with the other sector areas arranged in counterclockwise order. Then, the keypoint closest to k0 is searched in each sector area. If such a keypoint exists, the distance between the current keypoint and that nearest keypoint is used as the value of the corresponding dimension of the descriptor; otherwise, the value is set to 0.

 Figure 4. The horizontal plane centered at the current keypoint k0 is divided into 180 sector areas. We first search for the keypoint k1 closest to keypoint k0; the main direction is then the vector from k0 to k1. Note that the main direction bisects the first sector area. The other sector areas are arranged in counterclockwise order.

In this process, the direction from the current keypoint $k_0$ to another keypoint $k_m$ is expressed as $\vec{v}_m$, and the counterclockwise angle between the main direction $\vec{v}_1$ and $\vec{v}_m$ determines which sector area $k_m$ belongs to. This angle is calculated by the following formula:

$$\theta_m = \begin{cases} \theta'_m, & \text{if } \vec{v}_1 \times \vec{v}_m \geq 0 \\ 2\pi - \theta'_m, & \text{otherwise} \end{cases}$$

where $\theta'_m$ is defined as:

$$\theta'_m = \arccos \frac{\vec{v}_1 \cdot \vec{v}_m}{\lVert \vec{v}_1 \rVert \, \lVert \vec{v}_m \rVert}$$

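A small sketch of this sector assignment (the 2D cross and dot products realize the formula above; the half-sector offset reflects the note in Figure 4 that the main direction bisects the first sector area):

```python
import numpy as np

def sector_index(k0: np.ndarray, k1: np.ndarray, km: np.ndarray,
                 n_sectors: int = 180) -> int:
    """Index of the sector area that keypoint km falls into, with the main
    direction v1 = k1 - k0 bisecting sector 0 (all on the horizontal plane)."""
    v1, vm = (k1 - k0)[:2], (km - k0)[:2]
    cos_t = v1 @ vm / (np.linalg.norm(v1) * np.linalg.norm(vm))
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))  # angle in [0, pi]
    cross = v1[0] * vm[1] - v1[1] * vm[0]         # sign of the 2D cross product
    if cross < 0:                                 # clockwise side: fold to (pi, 2*pi)
        theta = 2 * np.pi - theta
    width = 2 * np.pi / n_sectors                 # 180 sectors of 2 degrees each
    return int(((theta + width / 2) % (2 * np.pi)) // width)
```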
 There are two main problems in the above algorithm.

  • One problem is that the algorithm is very sensitive to the nearest keypoint, which determines the main direction: matching will fail when an outlier keypoint interferes with this choice.
  • Another problem is that we need to frequently calculate the relative distance and direction between two points, so there will be a lot of repeated calculations.

To address the first problem, we search for a certain number of nearest keypoints (the specific number is evaluated in the experimental section). Suppose we search for the 3 nearest keypoints and compute the corresponding 3 keypoint descriptors, as shown in Figure 5. Des1 corresponds to the nearest keypoint and Des3 corresponds to the third nearest keypoint. We define priorities based on these three distance values. Des1 has the highest priority because it has the closest distance and Des3 has the lowest priority because it has the farthest distance. The value of each dimension in the final descriptor corresponds to the non-zero value with the highest priority among them.

 Figure 5. The value of each dimension in the final descriptor corresponds to the non-zero value with the highest priority in Des1, Des2, and Des3. 

As shown in the red dashed box in Figure 5, if Des1 has a non-zero value in a dimension, that value is used in the corresponding dimension of the final descriptor, since Des1 has the highest priority. The other two cases are shown in the purple and black dashed boxes in Figure 5. This greatly improves the robustness of our LinK3D to outliers. To solve the second problem, we build a distance table and a direction table; the distance and direction between any two keypoints can then be obtained by a direct table lookup, avoiding repeated computation. The specific procedure is shown in Algorithm 2, which describes the process of extracting one descriptor.
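Both remedies admit a short illustration. The sketch below is not the paper's Algorithm 2; it shows only a precomputed distance table (the direction table is analogous) and the priority merge of Figure 5.

```python
import numpy as np

def distance_table(keypoints: np.ndarray) -> np.ndarray:
    """Pairwise horizontal distances between all aggregation keypoints,
    computed once so descriptor generation never repeats a distance."""
    diff = keypoints[:, None, :2] - keypoints[None, :, :2]
    return np.linalg.norm(diff, axis=-1)

def merge_by_priority(des1: np.ndarray, des2: np.ndarray,
                      des3: np.ndarray) -> np.ndarray:
    """For each dimension, keep the non-zero value of highest priority
    (Des1 first, then Des2, then Des3), as in Figure 5."""
    des = np.stack((des1, des2, des3))           # row 0 has highest priority
    first_nonzero = (des != 0).argmax(axis=0)    # first non-zero row per column
    return des[first_nonzero, np.arange(des.shape[1])]
```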

C. Matching two descriptors

        In this section, the matching algorithm is introduced. To quickly measure the similarity of two descriptors, we adopt a calculation similar to the Hamming distance to define the similarity score of two LinK3D descriptors; both operate dimension by dimension. However, unlike the XOR operation of the Hamming distance, we only count dimensions that are non-zero in both descriptors. Specifically, the absolute value of the difference of each pair of corresponding non-zero dimensions is calculated, and if this value is less than 0.2, the similarity score is increased by 1. A match is considered valid if its similarity score exceeds the threshold Th_score. The specific matching algorithm is shown in Algorithm 3. After matching the descriptors, matches for edge keypoints are searched. For each pair of matched aggregation keypoints, the edge keypoint with the highest smoothness value ρ on each scan line of the corresponding cluster is selected, and edge keypoints lying on the same scan line are matched.
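The scoring rule above translates directly into code. In this sketch, the 0.2 per-dimension threshold comes from the text, while th_score = 3 and the greedy best-score pairing are assumptions; the paper's Algorithm 3 may differ in such details.

```python
import numpy as np

def similarity_score(des_a: np.ndarray, des_b: np.ndarray) -> int:
    """Hamming-like similarity: compare only dimensions that are non-zero in
    BOTH descriptors; a dimension scores 1 if the values differ by < 0.2."""
    both = (des_a != 0) & (des_b != 0)
    return int((np.abs(des_a[both] - des_b[both]) < 0.2).sum())

def match_descriptors(descs_a, descs_b, th_score: int = 3):
    """Greedy matching of two descriptor sets: pair each descriptor of scan A
    with its best-scoring counterpart in scan B if the score exceeds th_score."""
    matches = []
    for i, da in enumerate(descs_a):
        scores = [similarity_score(da, db) for db in descs_b]
        j = int(np.argmax(scores))
        if scores[j] > th_score:
            matches.append((i, j))
    return matches
```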

4 Experiments
