A simple C++ reproduction of an optimization-based visual odometry

This article records my understanding of visual odometry based on optimization, and uses C++ to implement a visual odometry algorithm that takes two images plus their corresponding depth map as input and outputs a pose transformation.

Visual odometry is one of the most important parts of visual SLAM. It sits at the front end of the whole pipeline, and its core task is to provide a good pose estimate for the rest of the SLAM process. There are many ways to implement visual odometry; this article mainly discusses the principle and code implementation of a visual odometry based on optimization.

(If there are mistakes or inaccuracies, please point them out!)

The main process of the visual odometry based on the optimization method is as follows:

  1. Read two image frames from the camera, extract feature points, and match them.
  2. Reproject the matched feature points to construct the reprojection error, turning the whole problem into a least-squares problem.
  3. Solve the least-squares problem with an optimization method.

The pose we care about most is used during reprojection, so the smaller the reprojection error, the better the pose used for reprojection. Next, let's walk through the whole process together with the code, adding theory and formulas where necessary.

The first step is to extract and match feature points

  Mat img_1 = imread(argv[1], CV_LOAD_IMAGE_COLOR);
  Mat img_2 = imread(argv[2], CV_LOAD_IMAGE_COLOR);
  assert(img_1.data && img_2.data && "Can not load images!");

  vector<KeyPoint> keypoints_1, keypoints_2;
  vector<DMatch> matches;
  find_feature_matches(img_1, img_2, keypoints_1, keypoints_2, matches);
  // After extraction and matching, the feature points of image 1 are stored in keypoints_1
  // (and those of image 2 in keypoints_2); matches stores the matched feature-point pairs of the two images

The extraction and matching of feature points is straightforward: it mainly calls functions that come with OpenCV, so I won't go into detail here. The result is three containers, keypoints_1, keypoints_2, and matches, which store the pixel coordinates of the feature points and the matches between them. (Be careful about which coordinates each container holds. Visual SLAM involves conversions among several coordinate systems, such as pixel coordinates, camera coordinates, world coordinates, and normalized-plane coordinates, and it is easy to make mistakes if you do not keep them straight.)
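For reference, a minimal sketch of what find_feature_matches could look like, using ORB features and brute-force Hamming matching with a simple distance filter (the threshold values are common heuristics, not necessarily what the original project uses):

// assumes #include <opencv2/opencv.hpp> plus using namespace std and cv, like the rest of the code
void find_feature_matches(const Mat &img_1, const Mat &img_2,
                          vector<KeyPoint> &keypoints_1, vector<KeyPoint> &keypoints_2,
                          vector<DMatch> &matches) {
  // detect ORB keypoints and compute their descriptors in both images
  Mat descriptors_1, descriptors_2;
  Ptr<FeatureDetector> detector = ORB::create();
  Ptr<DescriptorExtractor> descriptor = ORB::create();
  detector->detect(img_1, keypoints_1);
  detector->detect(img_2, keypoints_2);
  descriptor->compute(img_1, keypoints_1, descriptors_1);
  descriptor->compute(img_2, keypoints_2, descriptors_2);

  // brute-force matching with Hamming distance (ORB descriptors are binary)
  vector<DMatch> all_matches;
  Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("BruteForce-Hamming");
  matcher->match(descriptors_1, descriptors_2, all_matches);

  // keep only matches whose distance is small enough
  // (a common heuristic: at most 2x the minimum distance, with a floor of 30)
  double min_dist = 10000;
  for (const DMatch &m : all_matches) min_dist = min(min_dist, (double)m.distance);
  for (const DMatch &m : all_matches)
    if (m.distance <= max(2 * min_dist, 30.0)) matches.push_back(m);
}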

The second step is to perform coordinate transformation to prepare for subsequent reprojection

This step uses the imaging model of the binocular (stereo) camera. A stereo camera can recover the depth of each pixel in the photo from the focal length, baseline, and disparity, and thus convert a 2D pixel into a 3D world-coordinate point.

[Figure: the binocular (stereo) camera model, excerpted from the Visual SLAM 14 Lectures]

The depth $z$ follows from the focal length $f$, the baseline $b$, and the disparity $d$ as $z = \dfrac{f b}{d}$.

 The specific code is as follows:

  // Step 2: use the depth map to recover the world coordinates of the matched feature points
  Mat d1 = imread(argv[3], CV_LOAD_IMAGE_UNCHANGED);       // the depth map is a 16-bit unsigned, single-channel image
  Mat K = (Mat_<double>(3, 3) << 520.9, 0, 325.1, 0, 521.0, 249.7, 0, 0, 1); // camera intrinsics
  vector<Point3f> pts_3d; // container for the world coordinates of the feature points
  vector<Point2f> pts_2d; // container for the pixel coordinates of the feature points
  // iterate over the matched feature points
  for (DMatch m : matches) {
    ushort d = d1.ptr<unsigned short>(int(keypoints_1[m.queryIdx].pt.y))[int(keypoints_1[m.queryIdx].pt.x)];
    // The line above reads the depth of one point.
    // keypoints_1[m.queryIdx].pt.y and .pt.x are used as indices, hence the cast to int;
    // the code can be read as ushort d = d1.ptr<unsigned short>(y')[x'];
    // i.e. fetch the depth value at row y, column x.
    if (d == 0)   // bad depth
      continue;
    // Convert the raw depth value to meters. The factor 5000 is the depth scale used by the
    // TUM RGB-D dataset: a raw value of 5000 corresponds to a depth of 1 meter.
    float dd = d / 5000.0;
    // Convert pixel coordinates to normalized camera-plane coordinates; to get world coordinates
    // we first go from pixel coordinates to the camera's normalized plane
    Point2d p1 = pixel2cam(keypoints_1[m.queryIdx].pt, K);
    // Multiplying by the depth turns the normalized camera coordinates into a 3D point
    // (this is where the stereo camera model comes in).
    // Note that these world coordinates come from the feature points of image 1.
    pts_3d.push_back(Point3f(p1.x * dd, p1.y * dd, dd));
    // The matched feature point of image 2 is pushed directly into pts_2d; it will serve as the
    // observation when building the reprojection error later. These are pixel coordinates.
    pts_2d.push_back(keypoints_2[m.trainIdx].pt);
    // pts_3d holds the world coordinates obtained from the feature points of image 1,
    // which will be reprojected later (the estimate).
    // pts_2d holds the pixel coordinates of the feature points of image 2,
    // which serve as the observations in the reprojection (the "true" values).
    // Since both are filled while iterating over matches, the points in pts_3d and pts_2d
    // are matched feature points in one-to-one correspondence.
  }

After the conversion, we obtain pts_3d and pts_2d. Pay attention to what these two containers hold: pts_3d holds the world coordinates obtained from the feature points of image 1, which will be reprojected later as the estimate; pts_2d holds the pixel coordinates of the feature points of image 2, which will serve as the observations, i.e. the "true" values, in the reprojection. Since both are filled while iterating over matches, the points in pts_3d and pts_2d are matched feature points in one-to-one correspondence.
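The helper pixel2cam used above is just the standard conversion from pixel coordinates to the normalized camera plane (z = 1), using the intrinsics in K; a minimal sketch:

// Convert a pixel coordinate to normalized camera-plane coordinates:
// x = (u - cx) / fx,  y = (v - cy) / fy
Point2d pixel2cam(const Point2d &p, const Mat &K) {
  return Point2d(
    (p.x - K.at<double>(0, 2)) / K.at<double>(0, 0),
    (p.y - K.at<double>(1, 2)) / K.at<double>(1, 1)
  );
}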

The third step is to construct the reprojection error, set up the least-squares problem, and solve it with the Gauss-Newton method

Here is the full code first; afterwards we will break it apart and explain it piece by piece.

void bundleAdjustmentGaussNewton(
  const VecVector3d &points_3d,
  const VecVector2d &points_2d,
  const Mat &K,
  Sophus::SE3d &pose) {
  // points_3d: the feature points of image 1, in world coordinates
  // points_2d: the matched feature points of image 2, in pixel coordinates
  typedef Eigen::Matrix<double, 6, 1> Vector6d;
  const int iterations = 10; // number of iterations
  double cost = 0, lastCost = 0; // cost values
  // camera intrinsics
  double fx = K.at<double>(0, 0);
  double fy = K.at<double>(1, 1);
  double cx = K.at<double>(0, 2);
  double cy = K.at<double>(1, 2);

  for (int iter = 0; iter < iterations; iter++) {
    // initialize the (approximate) Hessian H = J^T J
    Eigen::Matrix<double, 6, 6> H = Eigen::Matrix<double, 6, 6>::Zero();
    // initialize the right-hand side b = -J^T e
    Vector6d b = Vector6d::Zero();

    cost = 0;
    // compute cost
    for (int i = 0; i < points_3d.size(); i++) {
      // the point is in world coordinates, so transform it into the camera frame
      Eigen::Vector3d pc = pose * points_3d[i]; // world-to-camera transform
      // pc is the 3D point in camera coordinates
      double inv_z = 1.0 / pc[2];    // 1/Z, used when projecting camera coordinates to pixels
      double inv_z2 = inv_z * inv_z; // 1/Z^2, only needed in the Jacobian
      Eigen::Vector2d proj(fx * pc[0] / pc[2] + cx, fy * pc[1] / pc[2] + cy);
      // project the camera-frame point to pixel coordinates in one step,
      // i.e. apply the extrinsics (above) and then the intrinsics

      Eigen::Vector2d e = points_2d[i] - proj; // the reprojection error of the least-squares problem

      cost += e.squaredNorm(); // the cost is the squared norm of the error
      Eigen::Matrix<double, 2, 6> J;
      // Below is the Jacobian of the error with respect to the pose, derived analytically;
      // every quantity in it is known, so we just plug in the values.
      J << -fx * inv_z,
        0,
        fx * pc[0] * inv_z2,
        fx * pc[0] * pc[1] * inv_z2,
        -fx - fx * pc[0] * pc[0] * inv_z2,
        fx * pc[1] * inv_z,
        0,
        -fy * inv_z,
        fy * pc[1] * inv_z2,
        fy + fy * pc[1] * pc[1] * inv_z2,
        -fy * pc[0] * pc[1] * inv_z2,
        -fy * pc[0] * inv_z;
      // accumulate the Hessian
      H += J.transpose() * J;
      // accumulate the right-hand side
      b += -J.transpose() * e;
    }

    Vector6d dx;
    // solve for dx, the increment of the quantity being estimated -- here, the pose
    dx = H.ldlt().solve(b); // solved from the Hessian H and the vector b computed above

    if (isnan(dx[0])) {
      cout << "result is nan!" << endl;
      break;
    }

    if (iter > 0 && cost >= lastCost) {
      // cost increased, the update is not good
      cout << "cost: " << cost << ", last cost: " << lastCost << endl;
      break;
    }

    // update the estimate:
    // use dx to update the pose, which is exactly the goal of the visual odometry --
    // providing a decent pose for the subsequent stages
    pose = Sophus::SE3d::exp(dx) * pose;
    lastCost = cost;

    cout << "iteration " << iter << " cost=" << std::setprecision(12) << cost << endl;
    if (dx.norm() < 1e-6) {
      // converged
      break;
    }
  }

  cout << "pose by g-n: \n" << pose.matrix() << endl;
}
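One detail worth noting: the function takes VecVector3d/VecVector2d (Eigen vectors) rather than the cv::Point containers filled in step two, so a little glue code is needed before the call. A sketch, assuming the usual Eigen-aligned typedefs (the typedef names come from the signature above; the rest is illustrative):

// Assumed typedefs matching the signature above (Eigen types in std::vector need the aligned allocator)
typedef vector<Eigen::Vector2d, Eigen::aligned_allocator<Eigen::Vector2d>> VecVector2d;
typedef vector<Eigen::Vector3d, Eigen::aligned_allocator<Eigen::Vector3d>> VecVector3d;

// Convert the cv::Point3f / cv::Point2f containers from step two into Eigen vectors
VecVector3d pts_3d_eigen;
VecVector2d pts_2d_eigen;
for (size_t i = 0; i < pts_3d.size(); ++i) {
  pts_3d_eigen.push_back(Eigen::Vector3d(pts_3d[i].x, pts_3d[i].y, pts_3d[i].z));
  pts_2d_eigen.push_back(Eigen::Vector2d(pts_2d[i].x, pts_2d[i].y));
}

// start from the identity pose and refine it with Gauss-Newton
Sophus::SE3d pose_gn;
bundleAdjustmentGaussNewton(pts_3d_eigen, pts_2d_eigen, K, pose_gn);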

First, let's clarify the concept of reprojection error with a simple example. A kettle sits in front of you, and you take two pictures of it from different positions while the kettle does not move. Its position in world coordinates is therefore the same in both photos. We use the first photo to obtain the world coordinates of the kettle, then project the kettle onto the second photo through an estimated pose transformation. Since the pose transformation is only an estimate, it is not necessarily accurate, so there will be a gap between the projected position and the kettle's actual position in the second photo. This gap is the reprojection error. The process is described in more detail below.

As shown in the figure, image 1 and image 2 are two shots of the object P. During this process P has not moved, but the pose of the camera has changed, and we construct the reprojection error to find this pose transformation. First, by matching the feature points of image 1 and image 2, we obtain a pair of matched feature points P1 and P2. Then, from the pixel coordinates of P1 plus the depth from the camera (yes, we use a stereo camera; this is exactly the process in the second step above), we recover the world coordinates P(X, Y, Z) of P. With the world coordinates of P in hand, we can start projecting.

The first step of the projection is to transform the point P from the world coordinate system into the camera coordinate system at the moment image 2 was taken. This is where the pose is used, based on the following formula.
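In standard notation, with $T$ the estimated pose (rotation $R$ and translation $t$), the world point $P_w$ is mapped into the camera frame of image 2 as

$$P_c = T P_w = R P_w + t .$$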

From the formula above, it can be seen that the pose is needed to go from the world coordinate system to the camera coordinate system. After transforming the world point P into the camera coordinate system of image 2 through an estimated pose, it still has to be converted to pixel coordinates; only then is a world point truly projected onto the photo. This can be done in one step, applying the extrinsics R, t and the intrinsics K together, or the point can first be transformed into the camera coordinate system and then converted from camera coordinates to pixel coordinates with the following formula.
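Writing $P_c = (X', Y', Z')$ for the point in the camera frame, the pinhole model with intrinsics $K$ gives the pixel coordinates

$$u = f_x \frac{X'}{Z'} + c_x, \qquad v = f_y \frac{Y'}{Z'} + c_y,$$

which is exactly what the variable proj computes in the code.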

Through these two steps of conversion, we obtain the coordinates of P' in the sketch (green), which is the projection of P onto image 2. The blue segment connecting P' and P2 is the reprojection error: P2 is the observation, i.e. the true value, while P' is the projected point, i.e. the estimate, and the difference between the two is the reprojection error. The corresponding code for this part is as follows:

      // the point is in world coordinates, so transform it into the camera frame
      Eigen::Vector3d pc = pose * points_3d[i]; // world-to-camera transform
      // pc is the 3D point in camera coordinates
      double inv_z = 1.0 / pc[2];    // 1/Z, used when projecting camera coordinates to pixels
      double inv_z2 = inv_z * inv_z; // 1/Z^2, only needed in the Jacobian
      Eigen::Vector2d proj(fx * pc[0] / pc[2] + cx, fy * pc[1] / pc[2] + cy);
      // project the camera-frame point to pixel coordinates in one step,
      // i.e. apply the extrinsics (above) and then the intrinsics

      Eigen::Vector2d e = points_2d[i] - proj; // the reprojection error of the least-squares problem

      cost += e.squaredNorm(); // the cost is the squared norm of the error

At this point the construction of the reprojection error is complete, and we have obtained the least-squares formulation. Next, the least-squares problem can be solved with any of several methods; here the Gauss-Newton method is used.

The essence of least squares is an error function that depends on some quantity to be estimated. Here the error is the reprojection error, and the quantity to be estimated is the pose transformation T, which can be described by the following formula:
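In the usual formulation, with $u_i$ the observed pixel of the $i$-th matched point (from image 2), $P_i$ its world coordinates (from image 1), $s_i$ its depth in the camera frame, and $K$ the intrinsics, the best pose is the minimizer of the summed squared reprojection error:

$$T^\ast = \arg\min_{T} \frac{1}{2} \sum_{i=1}^{n} \left\| u_i - \frac{1}{s_i} K T P_i \right\|_2^2 .$$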

The idea of Gauss-Newton for solving least squares is to find an increment of the estimate that moves the estimate in the direction of decreasing overall error, which can be expressed by the following formula:
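Linearizing the error $f(x)$ around the current estimate leads to the Gauss-Newton normal equation for the increment $\Delta x$:

$$J(x)^{T} J(x)\, \Delta x = -J(x)^{T} f(x), \qquad \text{i.e.} \quad H \Delta x = b,$$

with $H = J^T J$ and $b = -J^T f(x)$, which are exactly the H and b accumulated in the code.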

In this formula, the unknown we need to solve for is $\Delta x$; $J(x)$ is the Jacobian matrix, which stores the partial derivatives of the error function with respect to the estimated quantity, and $f(x)$ is the current error. So in each Gauss-Newton iteration we compute the Jacobian and $f(x)$, use these two quantities to solve for $\Delta x$, and then use $\Delta x$ to update the pose. The expression of the Jacobian matrix here is as follows:
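Written out, with $P' = (X', Y', Z')$ the point in the camera frame, the Jacobian of the reprojection error with respect to the pose perturbation $\delta\xi$ is

$$\frac{\partial e}{\partial \delta\xi} = -\begin{bmatrix} \dfrac{f_x}{Z'} & 0 & -\dfrac{f_x X'}{Z'^2} & -\dfrac{f_x X' Y'}{Z'^2} & f_x + \dfrac{f_x X'^2}{Z'^2} & -\dfrac{f_x Y'}{Z'} \\ 0 & \dfrac{f_y}{Z'} & -\dfrac{f_y Y'}{Z'^2} & -f_y - \dfrac{f_y Y'^2}{Z'^2} & \dfrac{f_y X' Y'}{Z'^2} & \dfrac{f_y X'}{Z'} \end{bmatrix},$$

which matches the entries of J in the code term by term.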

 The code for this part is as follows:

      // Below is the Jacobian of the error with respect to the pose, derived analytically;
      // every quantity in it is known, so we just plug in the values.
      J << -fx * inv_z,
        0,
        fx * pc[0] * inv_z2,
        fx * pc[0] * pc[1] * inv_z2,
        -fx - fx * pc[0] * pc[0] * inv_z2,
        fx * pc[1] * inv_z,
        0,
        -fy * inv_z,
        fy * pc[1] * inv_z2,
        fy + fy * pc[1] * pc[1] * inv_z2,
        -fy * pc[0] * pc[1] * inv_z2,
        -fy * pc[0] * inv_z;
      // accumulate the Hessian
      H += J.transpose() * J;
      // accumulate the right-hand side
      b += -J.transpose() * e;
    }

    Vector6d dx;
    // solve for dx, the increment of the quantity being estimated -- here, the pose
    dx = H.ldlt().solve(b); // solved from the Hessian H and the vector b computed above

We can see that this is where H and b are obtained, i.e. the H matrix built from the Jacobian and the vector b built from the Jacobian and the error f(x). The LDLT decomposition is then used to solve directly for dx, and finally dx is used to update the pose, giving us the pose we need. The code is as follows:

    // left-multiply the exponential map of the se(3) increment dx to update the pose
    pose = Sophus::SE3d::exp(dx) * pose;
    lastCost = cost;

So far, we have completed a visual odometry based on the optimization method: the input is two frames of images plus depth, and the output is a pose.

In the future, I plan to run this program with my own stereo camera, plot the pose trajectory, and compare it with the ground-truth trajectory to see how well this method works; I will continue to update the article once that is done.

Origin blog.csdn.net/weixin_43890835/article/details/131403547