YOLOv5 monocular ranging + speed measurement + target tracking

To add ranging and speed measurement functions to YOLOv5, you need to understand the principles of the following two parts:

Monocular ranging algorithm

  • Monocular ranging uses a single camera to estimate the distance of objects in a scene. Common monocular ranging algorithms include disparity-based methods (such as stereo matching) and deep learning-based methods (such as neural networks).
  • Deep learning-based methods usually use convolutional neural networks (CNNs) to learn the mapping from images to depth maps; a minimal sketch of this approach follows this list.
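
As a concrete illustration of the deep-learning route, below is a minimal sketch using the publicly available MiDaS model from torch.hub. The choice of MiDaS, the `MiDaS_small` variant, and the file name `frame.jpg` are assumptions for illustration; the post does not name a specific network:

```python
import cv2
import torch

# Minimal sketch: deep-learning monocular depth estimation with MiDaS.
# MiDaS is an assumption here -- any image-to-depth network would do.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

img = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical frame
batch = transforms.small_transform(img)  # resize + normalize for the model

with torch.no_grad():
    prediction = midas(batch)  # relative inverse depth, shape (1, H', W')
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False).squeeze()

# `depth` is a per-pixel *relative* depth map; metric distance needs calibration
print(depth.shape)
```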

Monocular ranging code

Monocular ranging involves converting pixel coordinates to world coordinates. A minimal, completed version of the conversion (the back-projection body below assumes the points lie on a known plane Z_w = height) looks like this:

```python
import numpy as np
import cv2


def convert_2D_to_3D(point2D, R, t, IntrinsicMatrix, K, P, f, principal_point, height):
    """
    Example: convert pixel coordinates to world coordinates.

    Args:
        point2D: pixel coordinate points, e.g. [(u1, v1), (u2, v2)]
        R: rotation matrix
        t: translation vector
        IntrinsicMatrix: intrinsic matrix
        K: radial distortion coefficients (k1, k2, k3)
        P: tangential distortion coefficients (p1, p2)
        f: focal length (also encoded in IntrinsicMatrix)
        principal_point: principal point (also encoded in IntrinsicMatrix)
        height: Z_w, the height of the world plane the points lie on

    Returns: world coordinate points, (point3D_no_correct, point3D_yes_correct)
    """
    point3D_no_correct = []
    point3D_yes_correct = []

    # point2D layout: [(u1, v1),
    #                  (u2, v2)]
    point2D = np.array(point2D, dtype='float32')

    # Undistorted pixel coordinates (OpenCV coefficient order: k1, k2, p1, p2, k3)
    dist_coeffs = np.array([K[0], K[1], P[0], P[1], K[2]], dtype='float32')
    point2D_corrected = cv2.undistortPoints(
        point2D.reshape(-1, 1, 2), IntrinsicMatrix, dist_coeffs,
        P=IntrinsicMatrix).reshape(-1, 2)

    # Back-project each pixel ray and intersect it with the plane Z_w = height,
    # from s * [u, v, 1]^T = IntrinsicMatrix @ (R @ X_w + t)
    inv_K = np.linalg.inv(IntrinsicMatrix)
    inv_R = np.linalg.inv(R)
    t = np.asarray(t, dtype='float64').reshape(3)
    right = inv_R @ t
    for pts, out in ((point2D, point3D_no_correct),
                     (point2D_corrected, point3D_yes_correct)):
        for (u, v) in pts:
            uv1 = np.array([u, v, 1.0])
            left = inv_R @ inv_K @ uv1
            s = (height + right[2]) / left[2]  # scale factor fixing Z_w = height
            out.append(inv_R @ (s * (inv_K @ uv1) - t))

    return point3D_no_correct, point3D_yes_correct
```
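
A quick usage sketch with made-up calibration values (real values would come from camera calibration, e.g. cv2.calibrateCamera):

```python
import numpy as np

# Hypothetical calibration -- for illustration only
R = np.eye(3)
t = np.array([0.0, 0.0, 1.5])  # camera 1.5 units above the world plane
IntrinsicMatrix = np.array([[800.0, 0.0, 320.0],
                            [0.0, 800.0, 240.0],
                            [0.0, 0.0, 1.0]])
K = (0.0, 0.0, 0.0)  # radial distortion k1, k2, k3
P = (0.0, 0.0)       # tangential distortion p1, p2

no_corr, yes_corr = convert_2D_to_3D(
    [(320, 240)], R, t, IntrinsicMatrix, K, P,
    f=800.0, principal_point=(320, 240), height=0.0)
print(yes_corr[0])  # world point on the Z_w = 0 plane
```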

One way to add monocular ranging functionality to YOLOv5 is to collect a training set with object annotations and depth information, then train a deep learning model such as a convolutional neural network to map input images to depth maps. Once trained, the model can be used to estimate the distance of detected objects in an image.
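
One way to wire this into detection is to run YOLOv5 and sample the depth map inside each detected box. This is a sketch under assumptions: the placeholder depth map, the file name `frame.jpg`, and the box-median heuristic are illustrative, not the post's method:

```python
import cv2
import torch

# Sketch: combine YOLOv5 detections with a depth map. `depth` is a dummy
# stand-in here; in practice it would come from a depth network such as
# the MiDaS sketch above.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
img = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical frame
depth = torch.zeros(img.shape[:2])  # placeholder H x W depth map

results = model(img)
for *xyxy, conf, cls in results.xyxy[0].tolist():
    x1, y1, x2, y2 = map(int, xyxy)
    # Median depth inside the box is more robust to background pixels
    box_depth = depth[y1:y2, x1:x2].median().item()
    print(model.names[int(cls)], round(box_depth, 2))
```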

Frame Difference Algorithm

  • The frame-difference algorithm estimates an object's speed from the change between adjacent frames in a video sequence. It rests on a simple assumption: the larger the change in an object's position between adjacent frames, the faster the object is moving.
  • Concretely, it computes the position difference of an object between two adjacent frames and then derives the object's speed from the time interval.

Assuming the object's positions in frame t and frame t−1 are p_t and p_{t−1} respectively, the distance between them can be computed with the Euclidean distance or another similarity measure:

d = ||p_t − p_{t−1}||

where ||·|| denotes the Euclidean norm. The average velocity v of the object over the time interval Δt is then:

v = d / Δt

Here, Δt is the time interval between frame t−1 and frame t. In practice, the velocity can be smoothed as needed, for example with a moving average or a Kalman filter.
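
A tiny numeric illustration of these formulas, using made-up box-center positions and a 30 fps video:

```python
import numpy as np

# Hypothetical object centers in two consecutive frames (pixels)
p_prev = np.array([320.0, 240.0])  # p_{t-1}
p_curr = np.array([332.0, 245.0])  # p_t
fps = 30.0
dt = 1.0 / fps  # time interval between frames

d = np.linalg.norm(p_curr - p_prev)  # Euclidean distance, d = ||p_t - p_{t-1}||
v = d / dt                           # average speed, v = d / dt
print(f"d = {d:.2f} px, v = {v:.1f} px/s")

# Optional smoothing: moving average over the last few speeds
speeds = [v, 385.2, 391.7]  # hypothetical recent history
print("smoothed:", np.mean(speeds[-3:]))
```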

Speed measurement code

The following is a simple frame-difference code example for computing an object's speed in a video sequence:

```python
import cv2
import numpy as np

# Open the video file
cap = cv2.VideoCapture('video.mp4')

# Initialize parameters
prev_frame = None
fps = cap.get(cv2.CAP_PROP_FPS)  # video frame rate
speeds = []  # collected speed values

while cap.isOpened():
    ret, frame = cap.read()

    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    if prev_frame is not None:
        # Compute dense optical flow between the previous and current frames
        flow = cv2.calcOpticalFlowFarneback(prev_frame, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

        # Extract the x and y components of the motion vectors
        vx = flow[..., 0]
        vy = flow[..., 1]

        # Euclidean length of the per-pixel displacement (pixels per frame)
        distance = np.sqrt(np.square(vx) + np.square(vy))

        # Average displacement times frame rate = speed in pixels per second
        speed = np.mean(distance) * fps

        speeds.append(speed)

        # Optional: visualize the flow (hue = direction, value = magnitude)
        flow_vis = np.zeros_like(frame)
        angle = np.arctan2(vy, vx)  # direction in [-pi, pi]
        flow_vis[..., 0] = (angle + np.pi) * (180 / np.pi / 2)  # hue in [0, 180]
        flow_vis[..., 1] = 255  # full saturation
        flow_vis[..., 2] = cv2.normalize(distance, None, 0, 255, cv2.NORM_MINMAX)
        flow_vis = cv2.cvtColor(flow_vis, cv2.COLOR_HSV2BGR)

        cv2.imshow('Flow Visualization', flow_vis)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    prev_frame = gray

cap.release()
cv2.destroyAllWindows()

# Print the speed results
print("Speeds:", speeds)
```

The code uses the `calcOpticalFlowFarneback` function from the OpenCV library to compute optical-flow vectors between adjacent frames and measures the per-pixel displacement with the Euclidean distance. The speed is then derived from the video's frame rate and stored in a list, which you can further process or visualize according to your needs. Please note that this is just a simple example and may need to be adjusted and improved in actual applications.

Summary

The specific steps to implement the above functions are as follows:

Monocular ranging:

  • Collect a training data set that includes object annotations and corresponding depth information.
  • Build a deep learning model, for example a convolutional neural network (ResNet, UNet, etc.), that maps images to depth maps.
  • Train and optimize the model on the collected data set.
  • When adding the monocular ranging function to YOLOv5, load the trained model and use it to estimate distance whenever an object is detected.

Frame-difference algorithm:

  • Run object detection and tracking on the video sequence to obtain the position of each object in consecutive frames (a minimal tracking sketch follows this list).
  • Compute the difference in object position between adjacent frames, using the Euclidean distance or another similarity measure.
  • Divide the difference by the time interval to get the average speed of the object.
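
As a minimal sketch of the detection-and-tracking step, the following nearest-centroid matcher (an assumption for illustration; real systems typically use IoU matching, SORT/DeepSORT, or a Kalman filter) links detections across two frames:

```python
import numpy as np

def match_centroids(prev_centers, curr_centers, max_dist=50.0):
    """Greedily match each previous center to the nearest current center."""
    matches, used = [], set()
    for i, p in enumerate(prev_centers):
        dists = [np.linalg.norm(np.subtract(c, p)) for c in curr_centers]
        j = int(np.argmin(dists))
        if dists[j] < max_dist and j not in used:
            matches.append((i, j, dists[j]))
            used.add(j)
    return matches

# Hypothetical detection centers in frames t-1 and t
prev_centers = [(100, 200), (400, 300)]
curr_centers = [(108, 204), (395, 310)]
fps = 30.0
for i, j, d in match_centroids(prev_centers, curr_centers):
    print(f"object {i}: moved {d:.1f} px -> speed {d * fps:.1f} px/s")
```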

Source: blog.csdn.net/ALiLiLiYa/article/details/135034830