[Point Cloud Processing Tutorial] 02 Estimating Point Clouds from Depth Images in Python

1. Description

        This is the second article of the "Point Cloud Processing" tutorial. The "Point Cloud Processing" tutorial is beginner-friendly: it briefly introduces the point cloud processing pipeline, from data preparation to data segmentation and classification. In this article we will learn how to compute a point cloud from a depth image without using the Open3D library (Open3D is only used here for visualization). We will also show how to optimize the code for better performance.

2. Depth image

        A depth image (also known as a depth map) is an image in which each pixel stores its distance value relative to the sensor coordinate system. Depth images can be captured by structured light or time-of-flight sensors. To compute depth, a structured light sensor such as the Microsoft Kinect V1 compares the distortion between the projected and the received light pattern. A time-of-flight sensor such as the Microsoft Kinect V2 instead emits light rays and measures the time between their emission and reception.

        In addition to the depth image, some sensors also provide a corresponding RGB image to form an RGB-D image. The latter makes it possible to compute colored point clouds. This tutorial will use a Microsoft Kinect V1 RGB-D image as an example.

        Let's start by importing the libraries:

import imageio.v3 as iio
import numpy as np
import matplotlib.pyplot as plt
import open3d as o3d

        Now, we can import the depth image and print its resolution and type:

# Read depth image:
depth_image = iio.imread('data/depth.png')

# print properties:
print(f"Image resolution: {depth_image.shape}")
print(f"Data type: {depth_image.dtype}")
print(f"Min value: {np.min(depth_image)}")
print(f"Max value: {np.max(depth_image)}")

Image resolution: (480, 640)
Data type: int32
Min value: 0
Max value: 2980

        The depth image is a 480×640 matrix in which each pixel is a 32-bit (or 16-bit) integer representing a distance in millimeters; this is why the depth image appears black when opened directly (see image below). The minimum value of 0 corresponds to noise (no distance measured), while the maximum value of 2980 is the distance, in millimeters, of the furthest measured pixel.

        Depth image generated by Microsoft Kinect V1.
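
As a quick sanity check, we can count how many pixels carry no measurement at all (a minimal sketch; as described above, zero-valued pixels correspond to noise):

# Count pixels with no depth measurement:
n_invalid = np.count_nonzero(depth_image == 0)
print(f"Invalid pixels: {n_invalid} ({100 * n_invalid / depth_image.size:.1f}% of the image)")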

For better visualization, we compute its grayscale image:

depth_grayscale = np.array(256 * depth_image / 0x0fff, dtype=np.uint8)
iio.imwrite('output/grayscale.png', depth_grayscale)

        Computing a grayscale image means scaling the depth values to [0, 255]. Now the image is much clearer:

Calculated grayscale image. Black pixels represent noise.

Note that Matplotlib does the same when visualizing depth images:

# Display depth and grayscale image:
fig, axs = plt.subplots(1, 2)
axs[0].imshow(depth_image, cmap="gray")
axs[0].set_title('Depth image')
axs[1].imshow(depth_grayscale, cmap="gray")
axs[1].set_title('Depth grayscale image')
plt.show()

Matplotlib automatically scales the pixels of the depth image.

3. Point cloud

        Now that we have imported and displayed the depth image, how do we estimate the point cloud from it? The first step is to calibrate the depth camera to estimate the camera matrix, which is then used to compute the point cloud. The obtained point cloud is also called a 2.5D point cloud because it is estimated from a 2D projection (depth image) instead of a 3D sensor such as a laser sensor.

3.1 Depth camera calibration

        Calibrating a camera means estimating its lens and sensor parameters by finding the distortion coefficients and the camera matrix (also known as the intrinsic parameters). In general, there are three ways to obtain them: use the standard parameters provided by the manufacturer, use the results of a published calibration study, or calibrate the Kinect manually. Manual calibration consists of applying a calibration algorithm such as the checkerboard algorithm [1], which is implemented in both the Robot Operating System (ROS) and OpenCV. The calibration matrix M is a 3×3 matrix:

M = | fx   0   cx |
    |  0  fy   cy |
    |  0   0    1 |

        where fx and fy are the focal lengths and (cx, cy) is the optical center. In this tutorial we will use the results obtained for the NYU Depth V2 dataset:

# Depth camera parameters:
FX_DEPTH = 5.8262448167737955e+02
FY_DEPTH = 5.8269103270988637e+02
CX_DEPTH = 3.1304475870804731e+02
CY_DEPTH = 2.3844389626620386e+02
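
For reference, these parameters can be assembled into the 3×3 camera matrix M shown above (a small sketch; the rest of the code uses the scalar parameters directly):

# Depth camera intrinsic matrix:
M_DEPTH = np.array([[FX_DEPTH, 0, CX_DEPTH],
                    [0, FY_DEPTH, CY_DEPTH],
                    [0, 0, 1]])
print(M_DEPTH)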

        If you want to calibrate the camera yourself, you can refer to this OpenCV tutorial.

3.2 Point cloud computation

        Computing the point cloud means transforming each depth pixel from the 2D coordinate system of the depth image to the 3D coordinate system of the depth camera (x, y, and z). The 3D coordinates are computed with the following formula [2], where depth(i, j) is the depth value at row i and column j:

z = depth(i, j)
x = (j - cx) * z / fx
y = (i - cy) * z / fy

This formula is applied to each pixel:

# compute point cloud:
pcd = []
height, width = depth_image.shape
for i in range(height):
    for j in range(width):
        z = depth_image[i][j]
        x = (j - CX_DEPTH) * z / FX_DEPTH
        y = (i - CY_DEPTH) * z / FY_DEPTH
        pcd.append([x, y, z])

Let's display it using the Open3D library:

pcd_o3d = o3d.geometry.PointCloud()  # create point cloud object
pcd_o3d.points = o3d.utility.Vector3dVector(pcd)  # set pcd as the point cloud points
# Visualize:
o3d.visualization.draw_geometries([pcd_o3d])

Point cloud computed from depth image.

4. Color point cloud

        What if we want to compute a colored point cloud from an RGB-D image? Color information can enhance the performance of many tasks, such as point cloud registration. In this case, it is best if the input sensor also provides an RGB image. A colored point cloud can be defined as a set of points (x, y, z, r, g, b), where x, y, and z are the 3D coordinates and r, g, and b represent the color of the point in the RGB system.

        We start by importing the corresponding RGB image of the previous depth image:

# Read the rgb image:
rgb_image = iio.imread('data/rgb.jpg')

# Display depth and RGB images:
fig, axs = plt.subplots(1, 2)
axs[0].imshow(depth_image, cmap="gray")
axs[0].set_title('Depth image')
axs[1].imshow(rgb_image)
axs[1].set_title('RGB image')
plt.show()

        Depth image and its corresponding RGB image

To find the color of a given point p(x, y, z) defined in the depth sensor's 3D coordinate system:

1. We convert it to the RGB camera coordinate system [2]:

[x_RGB, y_RGB, z_RGB] = inverse(R) * ([x, y, z] - T)

        where R and T are the extrinsic parameters between the two cameras: the rotation matrix and the translation vector, respectively.

        Again, we use the parameters obtained for the NYU Depth V2 dataset:

# Rotation matrix:
R = -np.array([[9.9997798940829263e-01, 5.0518419386157446e-03, 4.3011152014118693e-03],
               [-5.0359919480810989e-03, 9.9998051861143999e-01, -3.6879781309514218e-03],
               [-4.3196624923060242e-03, 3.6662365748484798e-03, 9.9998394948385538e-01]])
# Translation vector:
T = np.array([2.5031875059141302e-02, -2.9342312935846411e-04, 6.6238747008330102e-04])

        Points in the RGB camera coordinate system are calculated as follows:

"""
  Convert the point from depth sensor 3D coordinate system
  to rgb camera coordinate system:
"""
[x_RGB, y_RGB, z_RGB] = np.linalg.inv(R).dot([x, y, z]) - np.linalg.inv(R).dot(T)

2. Using the intrinsic parameters of the RGB camera, we project the point onto the RGB image coordinate system [2] to obtain the indices of the pixel from which to take the color.

Note that the focal length and optical center used in this projection are those of the RGB camera. Again, we use the parameters obtained for the NYU Depth V2 dataset:

# RGB camera intrinsic Parameters:
FX_RGB = 5.1885790117450188e+02
FY_RGB = 5.1946961112127485e+02
CX_RGB = 3.2558244941119034e+02
CY_RGB = 2.5373616633400465e+02

        The indices of the corresponding pixel are calculated as follows:

"""
  Convert from rgb camera coordinate system
  to rgb image coordinate system:
"""
j_rgb = int((x_RGB * FX_RGB) / z_RGB + CX_RGB + width / 2)
i_rgb = int((y_RGB * FY_RGB) / z_RGB + CY_RGB)

        Let's put everything together and display the point cloud:

colors = []
pcd = []
for i in range(height):
    for j in range(width):
        """
            Convert the pixel from depth coordinate system
            to depth sensor 3D coordinate system
        """
        z = depth_image[i][j]
        x = (j - CX_DEPTH) * z / FX_DEPTH
        y = (i - CY_DEPTH) * z / FY_DEPTH

        """
            Convert the point from depth sensor 3D coordinate system
            to rgb camera coordinate system:
        """
        [x_RGB, y_RGB, z_RGB] = np.linalg.inv(R).dot([x, y, z]) - np.linalg.inv(R).dot(T)

        """
            Convert from rgb camera coordinate system
            to rgb image coordinate system:
        """
        j_rgb = int((x_RGB * FX_RGB) / z_RGB + CX_RGB + width / 2)
        i_rgb = int((y_RGB * FY_RGB) / z_RGB + CY_RGB)

        # Add point to point cloud:
        pcd.append([x, y, z])

        # Add the color of the pixel if it exists:
        if 0 <= j_rgb < width and 0 <= i_rgb < height:
            colors.append(rgb_image[i_rgb][j_rgb] / 255)
        else:
            colors.append([0., 0., 0.])
            
# Convert to open3d.geometry.PointCloud:
pcd_o3d = o3d.geometry.PointCloud()  # create a point cloud object
pcd_o3d.points = o3d.utility.Vector3dVector(pcd)
pcd_o3d.colors = o3d.utility.Vector3dVector(colors)
# Visualize:
o3d.visualization.draw_geometries([pcd_o3d])

Colored point cloud computed from an RGB-D image.

5. Code optimization

In this section, we describe how to optimize the code to be more efficient and suitable for real-time applications.

5.1 Point cloud

Computing a point cloud using nested loops is very time consuming. For a depth image with a resolution of 480×640, it takes about 2.154 seconds to compute the point cloud on a machine with 8 GB of RAM and an i7-4500 CPU.
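
These timings depend on the machine; if you want to reproduce them yourself, a minimal sketch using Python's time.perf_counter (the timing method used for the numbers above is not specified in the article) looks like this:

import time

start = time.perf_counter()
pcd = []
for i in range(height):
    for j in range(width):
        z = depth_image[i][j]
        x = (j - CX_DEPTH) * z / FX_DEPTH
        y = (i - CY_DEPTH) * z / FY_DEPTH
        pcd.append([x, y, z])
print(f"Nested loops: {time.perf_counter() - start:.3f} s")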

To reduce the computation time, the nested loops can be replaced with vectorized operations, which brings the computation time down to around 0.024 seconds:

# get depth image resolution:
height, width = depth_image.shape
length = height * width
# compute indices:
jj = np.tile(range(width), height)
ii = np.repeat(range(height), width)
# reshape the depth image into a vector of z values:
z = depth_image.reshape(length)
# compute pcd:
pcd = np.dstack([(jj - CX_DEPTH) * z / FX_DEPTH,
                 (ii - CY_DEPTH) * z / FY_DEPTH,
                 z]).reshape((length, 3))
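
To check that the vectorized version produces the same points as the nested loops, a quick comparison can be run (a sketch; pcd_loop is a hypothetical name for the list built by the loop-based code in section 3.2):

# Compare the vectorized result with the loop-based result:
print(np.allclose(np.asarray(pcd_loop), pcd))  # expected output: True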

        We can reduce the computation time further, to about 0.015 seconds, by computing the constant factors once at the beginning:

# compute indices:
jj = np.tile(range(width), height)
ii = np.repeat(range(height), width)
# Compute constants:
xx = (jj - CX_DEPTH) / FX_DEPTH
yy = (ii - CY_DEPTH) / FY_DEPTH
# transform depth image to vector of z:
length = height * width
z = depth_image.reshape(height * width)
# compute point cloud
pcd = np.dstack((xx * z, yy * z, z)).reshape((length, 3))

5.2 Color point cloud

        As for the colored point cloud, the previous example takes about 36.263 seconds to execute on the same machine. By applying vectorization, the running time is reduced to 0.722 seconds:

# compute indices:
jj = np.tile(range(width), height)
ii = np.repeat(range(height), width)

# Compute constants:
xx = (jj - CX_DEPTH) / FX_DEPTH
yy = (ii - CY_DEPTH) / FY_DEPTH

# transform depth image to vector of z:
length = height * width
z = depth_image.reshape(length)

# compute point cloud
pcd = np.dstack((xx * z, yy * z, z)).reshape((length, 3))
cam_RGB = np.apply_along_axis(np.linalg.inv(R).dot, 1, pcd) - np.linalg.inv(R).dot(T)
xx_rgb = ((cam_RGB[:, 0] * FX_RGB) / cam_RGB[:, 2] + CX_RGB + width / 2).astype(int).clip(0, width - 1)
yy_rgb = ((cam_RGB[:, 1] * FY_RGB) / cam_RGB[:, 2] + CY_RGB).astype(int).clip(0, height - 1)
colors = rgb_image[yy_rgb, xx_rgb]/255
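
To make the optimized pipeline easier to reuse, the steps above can be wrapped into a single helper function (a sketch; the function name and signature are my own, and it relies on the camera parameters R, T and the intrinsics defined earlier):

def compute_colored_pcd(depth_image, rgb_image):
    # Vectorized colored point cloud from a registered RGB-D pair.
    height, width = depth_image.shape
    length = height * width
    jj = np.tile(range(width), height)
    ii = np.repeat(range(height), width)
    z = depth_image.reshape(length)
    # Depth pixels -> depth camera 3D coordinates:
    pcd = np.dstack(((jj - CX_DEPTH) * z / FX_DEPTH,
                     (ii - CY_DEPTH) * z / FY_DEPTH,
                     z)).reshape((length, 3))
    # Depth camera coordinates -> RGB camera coordinates:
    cam_RGB = np.apply_along_axis(np.linalg.inv(R).dot, 1, pcd) - np.linalg.inv(R).dot(T)
    # RGB camera coordinates -> RGB image indices:
    xx_rgb = ((cam_RGB[:, 0] * FX_RGB) / cam_RGB[:, 2] + CX_RGB + width / 2).astype(int).clip(0, width - 1)
    yy_rgb = ((cam_RGB[:, 1] * FY_RGB) / cam_RGB[:, 2] + CY_RGB).astype(int).clip(0, height - 1)
    colors = rgb_image[yy_rgb, xx_rgb] / 255
    return pcd, colors

pcd, colors = compute_colored_pcd(depth_image, rgb_image)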

6. Conclusion

        In this tutorial, we learned how to compute point clouds from RGB-D data. In the next tutorial, we will take a simple ground detection example and analyze the point cloud in more detail.

Thank you, and I hope you enjoyed reading this article. You can find the examples in my GitHub repository.

References

[1] Zhang, S., & Huang, P. S. (2006). Novel method for structured light system calibration. Optical Engineering, 45(8), 083601.

[2] Zhou, X. (2012). Research on Microsoft Kinect calibration. Department of Computer Science, George Mason University, Fairfax.
