DepthFilter: The Principle of the Depth Filter

Introduction

In the SVO algorithm, the depth filter is used to estimate the 3D positions of feature points. This article explains the principle of the depth filter and the related concepts.

1. Epipolar search and block matching

The figure below is a schematic diagram of epipolar search.
[Figure: epipolar search schematic]

1.1 Epipolar search

Suppose a feature point p1 in the reference frame (left image) is known. According to camera projection geometry, the 3D point corresponding to p1 lies somewhere on the ray from the optical center O1 through p1. Assume the depth of this 3D point lies in the range (d_min, +∞).

Similarly, the 3D point has a projection p2 in the adjacent current frame, where the line from the 3D point to the current frame's optical center O2 crosses the image plane. Because the position of the 3D point is unknown, projecting all its possible positions within the range (d_min, +∞) onto the current frame traces out an epipolar line l2. Since p2 must lie on this epipolar line, we only need to search along it to find the matching point p2 of p1 in the current frame. This process is epipolar search.
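To make the geometry concrete, here is a minimal sketch of computing the epipolar segment, assuming a pinhole intrinsic matrix K, a relative pose (R_cur_ref, t_cur_ref) from the reference to the current frame, and the unit bearing vector f_ref of p1; since the text uses the range (d_min, +∞), a finite d_max stands in for the far bound:

```python
import numpy as np

def project(K, p_cam):
    # Pinhole projection of a 3D point (camera coordinates) to pixels.
    uv = K @ (p_cam / p_cam[2])
    return uv[:2]

def epipolar_segment(K, R_cur_ref, t_cur_ref, f_ref, d_min, d_max):
    """Project the two depth hypotheses d_min and d_max of the ray
    through p1 into the current frame; the segment between the two
    resulting pixels is a bounded piece of the epipolar line l2."""
    endpoints = []
    for d in (d_min, d_max):
        p_cur = R_cur_ref @ (d * f_ref) + t_cur_ref  # hypothesis in cur frame
        endpoints.append(project(K, p_cur))
    return endpoints
```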

1.2 Block matching

Following the idea of the direct method, we compare the gray value of each pixel on the epipolar line l2 with that of p1: the smaller the gray-value difference, the more likely that pixel is the matching point. But if there are many points similar to p1 on the epipolar line, comparing individual pixels can easily produce mismatches, so the concept of block matching is introduced: comparing pixel blocks in the neighborhood of the feature point improves discrimination and lets us find the corresponding matching point p2.

1.3 Block matching method

Now take a small pixel block around p1, assuming the block size is w×w, and take many small blocks along the epipolar line as well. Denote the block around p1 as A ∈ R^{w×w}, and the n small blocks on the epipolar line as B_i, i = 1, …, n.

(1) SAD (Sum of Absolute Differences). As the name implies, take the sum of the absolute values of the differences between the two blocks:

S_SAD(A, B_i) = Σ_{m,n} |A(m,n) − B_i(m,n)|

(2) SSD (Sum of Squared Differences), the sum of the squared differences between the gray values of the two pixel blocks:

S_SSD(A, B_i) = Σ_{m,n} (A(m,n) − B_i(m,n))^2

(3) NCC (Normalized Cross Correlation). This measure computes the correlation of the two blocks; unlike SAD and SSD, a value closer to 1 means a better match. The numerator is the sum of the products of the corresponding elements of A and B_i; the denominator is the square root of the product of the sum of squares of the elements of A and the sum of squares of the elements of B_i:

S_NCC(A, B_i) = Σ_{m,n} A(m,n) B_i(m,n) / sqrt( Σ_{m,n} A(m,n)^2 · Σ_{m,n} B_i(m,n)^2 )

In addition, a zero-mean variant of each measure can be used to cope with brightness changes between adjacent frames: first subtract its mean from each of the two blocks, then apply the measures above (giving ZSAD, ZSSD, ZNCC).
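As a concrete illustration, here is a minimal NumPy sketch of these similarity scores (the function names and the zero-denominator guard are my own illustrative choices, not code from SVO):

```python
import numpy as np

def sad(A, B):
    # Sum of Absolute Differences: smaller means more similar.
    return np.abs(A - B).sum()

def ssd(A, B):
    # Sum of Squared Differences: smaller means more similar.
    return ((A - B) ** 2).sum()

def ncc(A, B):
    # Normalized Cross Correlation: closer to 1 means more similar.
    den = np.sqrt((A ** 2).sum() * (B ** 2).sum())
    return (A * B).sum() / den if den > 0 else 0.0

def zncc(A, B):
    # Zero-mean NCC: subtracting each block's mean makes the score
    # robust to additive brightness changes between frames.
    return ncc(A - A.mean(), B - B.mean())
```

Scanning the blocks B_i along the epipolar line with one of these scores and keeping the best one yields the candidate match p2.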

2. Depth filter with Gaussian distribution

Now we compute the similarity measure between A and each B_i on the epipolar line. For concreteness, suppose we use NCC; we then obtain an NCC score distribution along the epipolar line. The shape of this distribution depends heavily on the image content. When the search distance is long, we usually get a non-convex function: the distribution has many peaks, although the true corresponding point is unique. In this situation we prefer to describe the depth by a probability distribution rather than by a single value. Our question thus becomes: how does the estimated depth distribution change as we keep performing epipolar searches on new images? This is the so-called depth filter.
[Figure: NCC score distribution along the epipolar line, with multiple peaks]

2.1 Depth filter based on Gaussian distribution

The core idea is to assume that the depth value of a feature point and the observed depth values (obtained by triangulation across frames) obey Gaussian distributions, and to keep updating the probability distribution of the feature's depth with each new depth observation. When the uncertainty drops below a threshold, the estimate is considered converged, and the depth at that moment is taken as the feature's depth. Note, however, that the depth estimate relies on the translation between the two frames; since the translation recovered by monocular matching carries no absolute scale, the estimated depth is also scale-free.

Main process:

  1. Assume that the depth of all pixels satisfies an initial Gaussian distribution;
  2. When new data is generated, the position of the projection point is determined by epipolar search and block matching;
  3. Calculate the depth and uncertainty after triangulation according to the geometric relationship;
  4. Fuse the current observation into the previous estimate; stop if converged, otherwise return to step 2.

Assume that the depth d of a feature point obeys a Gaussian distribution with mean μ and variance σ^2: P(d) ~ N(μ, σ^2)

When new data arrives (a new match is found), an observed depth can be obtained from the geometry. Assume the observation also obeys a Gaussian distribution: P(d_obs) ~ N(μ_obs, σ_obs^2)

Then the current observation can be used to update the prior distribution of the feature point's depth. By the properties of the Gaussian distribution (the product of two Gaussian densities is again Gaussian), the update is:

μ_fuse = (σ_obs^2 · μ + σ^2 · μ_obs) / (σ^2 + σ_obs^2)
σ_fuse^2 = (σ^2 · σ_obs^2) / (σ^2 + σ_obs^2)

When the uncertainty of the fused distribution falls below a certain threshold, the estimate is considered converged and is taken as the depth of the corresponding feature point.
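A minimal sketch of this fusion rule (the initial distribution, the observations, and the convergence threshold below are made-up numbers for illustration):

```python
def fuse_gaussian(mu, sigma2, mu_obs, sigma2_obs):
    """Fuse the prior depth N(mu, sigma2) with an observation
    N(mu_obs, sigma2_obs): the product of the two Gaussians."""
    mu_fuse = (sigma2_obs * mu + sigma2 * mu_obs) / (sigma2 + sigma2_obs)
    sigma2_fuse = (sigma2 * sigma2_obs) / (sigma2 + sigma2_obs)
    return mu_fuse, sigma2_fuse

mu, sigma2 = 2.0, 1.0  # initial guess: large uncertainty
for mu_obs, sigma2_obs in [(1.6, 0.25), (1.5, 0.20), (1.55, 0.15)]:
    mu, sigma2 = fuse_gaussian(mu, sigma2, mu_obs, sigma2_obs)
    if sigma2 < 0.01:  # converged: accept mu as the feature depth
        break
print(mu, sigma2)  # the variance shrinks with every fused observation
```

Note that the fused variance is always smaller than either input variance, which is why repeated observations drive the estimate toward convergence.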

2.2 He Jiajia's explanation of the depth filter

The most basic depth estimation is triangulation, which is the foundation of multi-view geometry. From the matched points of two frames we can compute the depth of a point; with multiple images, we can compute multiple depth values for the same point. It is as if we had made multiple measurements of the same state variable, so Bayesian estimation can be used to fuse the measurements and reduce the uncertainty of the estimate. As shown below:
[Figure: depth uncertainty before fusion (light green) and after fusing a triangulated measurement (dark green)]

At the beginning, the uncertainty of depth estimation is large (light green part). After obtaining a depth estimate by triangulation, this uncertainty can be greatly reduced (dark green part).
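As an illustration, here is a minimal two-view triangulation in NumPy, in the spirit of the linear least-squares solve used for this step (the function name and argument conventions are my own assumptions):

```python
import numpy as np

def triangulate_depth(R_cur_ref, t_cur_ref, f_ref, f_cur):
    """Solve d_ref * (R_cur_ref @ f_ref) - d_cur * f_cur = -t_cur_ref
    for the two depths in least squares, where f_ref and f_cur are
    the unit bearing vectors of the matched points in each frame."""
    A = np.column_stack((R_cur_ref @ f_ref, -f_cur))  # 3x2 system
    depths, *_ = np.linalg.lstsq(A, -t_cur_ref, rcond=None)
    return depths[0]  # depth of the point along f_ref (reference frame)
```

Each new image matched this way contributes one depth measurement, which is then fused as in section 2.1.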

3. The use of depth filters in SVO

The following content is referenced from: https://zhuanlan.zhihu.com/p/85190014

Here we first briefly introduce how svo computes depth by triangulation, mainly the epipolar search that determines the matching point. In the reference frame Ir, the image position of a feature is known. Assuming its depth value lies in [d_min, d_max], the two endpoint depths can be projected into the current frame Ik, which bounds the epipolar line segment on which the feature must appear.

After determining the epipolar segment on which the feature may appear, feature search and matching can be performed. If the epipolar segment is very short, less than two pixels, the Feature Alignment optical-flow method mentioned above for pose computation can be applied directly to locate the feature position accurately.

If the epipolar line segment is long, two steps are taken. First, sample at intervals along the segment and match each sampled feature block against the feature block in the reference frame, scoring each sample with the Zero-mean Sum of Squared Differences (ZMSSD); the sample with the best score matches the reference block best. Second, apply Feature Alignment near the best-scoring point to obtain the feature position with sub-pixel accuracy. Once the pixel position is determined, the depth can be computed by triangulation.
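A minimal sketch of this epipolar scan; generating the samples along the segment and extracting a w×w patch are assumed to be provided (the names samples and extract_patch are illustrative, not SVO's API):

```python
import numpy as np

def zmssd(A, B):
    # Zero-mean Sum of Squared Differences: smaller means more similar.
    return (((A - A.mean()) - (B - B.mean())) ** 2).sum()

def scan_epipolar(ref_patch, cur_img, samples, extract_patch):
    """samples: pixel positions sampled along the epipolar segment.
    extract_patch(img, uv): returns the patch around uv (assumed given).
    Returns the sample whose patch best matches ref_patch."""
    best_uv, best_score = None, np.inf
    for uv in samples:
        score = zmssd(ref_patch, extract_patch(cur_img, uv))
        if score < best_score:
            best_uv, best_score = uv, score
    return best_uv, best_score
```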

Finally, after obtaining a new depth estimate, the Bayesian probability model can be used to update the depth value.

In the process of depth estimation, besides the depth value itself, the uncertainty of that depth value must also be computed. It is used in many places: determining the starting position and length of the epipolar segment in the epipolar search; setting the update weight when fusing a new depth with the Bayesian probability model (much like the role the covariance matrix plays in a Kalman filter); and judging whether a depth point has converged, in which case it is inserted into the map; and so on.
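As an illustration of the uncertainty computation: one common construction, used in SVO's depth filter (computeTau), perturbs the matched observation by the angle subtended by one pixel and measures how much the triangulated depth changes. The sketch below is my reading of that geometric construction, not a verbatim copy:

```python
import numpy as np

def compute_tau(t_ref_cur, f_ref, z, px_error_angle):
    """Depth uncertainty from a one-pixel matching error.
    t_ref_cur: translation of the current frame seen from the reference
    f_ref: unit bearing vector of the feature in the reference frame
    z: triangulated depth along f_ref
    px_error_angle: angle subtended by one pixel of noise,
                    roughly 2 * atan(0.5 / focal_length)"""
    t = t_ref_cur
    a = f_ref * z - t                                  # current camera -> 3D point
    t_norm = np.linalg.norm(t)
    a_norm = np.linalg.norm(a)
    alpha = np.arccos(f_ref.dot(t) / t_norm)           # angle at the reference camera
    beta = np.arccos(a.dot(-t) / (t_norm * a_norm))    # angle at the current camera
    beta_plus = beta + px_error_angle                  # perturb by one pixel
    gamma = np.pi - alpha - beta_plus                  # triangle angles sum to pi
    z_plus = t_norm * np.sin(beta_plus) / np.sin(gamma)  # law of sines
    return z_plus - z                                  # depth uncertainty tau
```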

4. Inverse Depth

When searching along the epipolar line, there may be multiple extrema of the matching score; to determine the true depth value, the depth is therefore described by a probability distribution over depth. In addition, parameterizing that distribution by inverse depth has several advantages:

(1) The inverse-depth error damps the error contributed by points that are far away in the real world. For example, two points at depths 100 m and 50 m may appear only a few pixels apart in the image; measured in depth the difference is 50, but measured in inverse depth it is only 1/50 − 1/100 = 0.01.
(2) With the inverse depth, a feature can be expressed homogeneously as the inverse-depth factor times its normalized coordinates, which reduces the number of optimization variables.
(3) The inverse-depth parameterization can represent very distant points, and its error distribution is closer to a Gaussian.
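A tiny numeric illustration of points (1) and (3): model depth as z = f·b/d for a disparity-like quantity d, and inject Gaussian pixel noise on d (all constants below are made up for the demo). The depth samples come out heavily skewed, while the inverse-depth samples stay nearly Gaussian:

```python
import numpy as np

f_b = 400.0                                  # illustrative focal_length * baseline
d_true = 4.0                                 # true disparity (px), i.e. z = 100 m
noise = np.random.normal(0.0, 0.5, 100000)   # one-pixel-scale noise on disparity
z = f_b / (d_true + noise)                   # depth samples: skewed, heavy tail
inv_z = (d_true + noise) / f_b               # inverse-depth samples: ~Gaussian

print("depth:     mean %.1f  std %.1f" % (z.mean(), z.std()))
print("inv depth: mean %.5f  std %.5f" % (inv_z.mean(), inv_z.std()))
```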

Source: blog.csdn.net/guanjing_dream/article/details/129346040