1 Overview
Machine vision uses machines in place of human eyes and brains for measurement and judgment. The basic workflow of a machine vision system is to acquire an image of the target, perform analysis operations on it such as recognition, feature extraction, classification, and mathematical computation, and then control the corresponding system or make decisions according to the results of the analysis.
Many machine vision applications require vision-based measurement, that is, obtaining the physical position of a target in real space from its image; examples include grasping manipulators, walking robots, and SLAM.
To obtain the physical-space position of a target from its pixel position in the image, we first need a mapping between image pixel coordinates and physical-space coordinates; that is, we must abstract the optical imaging process into a mathematical formula. The formula that expresses how a spatial position maps to a pixel position in the image is the so-called machine vision imaging model. This article discusses the mechanics of this model.
2 Pinhole imaging
Machine vision imaging adopts the pinhole imaging model, as shown in the figure below.
Figure 1:
Simplified further, it becomes the figure below.
Figure 2:
In the figure, $X$ is a point in space, $x$ is its imaging point in the image, and $C$ is the optical center of the lens (camera centre). As can be seen from the figure, the three points $C$, $x$, and $X$ are collinear.
The distance from the optical center $C$ to the image plane is the focal length $f$.
The following coordinate systems and their interrelationships are derived based on this pinhole imaging model.
3 Coordinate systems
When it comes to machine vision measurement models, it is necessary to first understand several coordinate systems involved in the entire model.
3.1 Pixel coordinate system uov
That is, the coordinate system in which each pixel of the image sits, shown as uov in the figure below.
Figure 3:
This is a two-dimensional coordinate system: the abscissa runs along the image width, the ordinate along the image height, the origin is at the upper-left corner, and the axes are in units of pixels, corresponding to the pixels of the image.
3.2 Image coordinate system xoy
That is, the coordinate system of the image sensor (such as CMOS or CCD), shown as xoy in the figure below.
Figure 4:
This is also a two-dimensional coordinate system: the abscissa runs along the sensor width, the ordinate along the sensor height, the origin is at the center of the sensor, and the axes are in units of mm (chosen according to actual needs: m, mm, ...). The coordinate systems below use the same unit, which will not be repeated.
Combining it with the pixel coordinate system, we get the figure below.
Figure 5:
From this figure, we can get the mapping relationship between the pixel coordinate system uov and the image coordinate system xoy, namely:
$$u = x/dx + u_0,\qquad v = -y/dy + v_0$$
where:
$u_0$, $v_0$ - the pixel coordinates of the image center (usually half the horizontal and vertical resolution of the image, though not exactly half if the lens and the sensor are misaligned), in pixels;
$dx$, $dy$ - the horizontal and vertical dimensions of a sensor cell (i.e., the pixel size), in mm/pixel; pixels are usually square, so $dx = dy$.
Writing the above formula in homogeneous-matrix form gives the conversion relationship between the pixel coordinate system and the image coordinate system:
$$\left[\begin{matrix} u\\v\\1 \end{matrix}\right]= \left[\begin{matrix} 1/dx & 0 & u_0 \\ 0 & -1/dy & v_0 \\ 0 & 0 & 1 \end{matrix}\right] \left[\begin{matrix} x\\y\\1 \end{matrix}\right]$$
Note that a negative sign appears in the y direction of the formula above: the pixel coordinate system uov is left-handed, while the three-dimensional coordinate systems discussed later are right-handed, so the image coordinate system xoy is defined directly as right-handed, and the negative sign flips the y-axis direction.
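The conversion above can be sketched numerically. Here is a minimal NumPy example with hypothetical values: 0.005 mm square pixels and a 1920×1080 image, so $u_0 = 960$, $v_0 = 540$:

```python
import numpy as np

dx = dy = 0.005          # hypothetical pixel size in mm/pixel
u0, v0 = 960.0, 540.0    # assumed principal point (half the resolution)

# Homogeneous conversion from image coordinates (x, y) in mm to pixel
# coordinates (u, v); note the minus sign on the y row.
K_px = np.array([[1/dx,  0.0,  u0],
                 [0.0,  -1/dy, v0],
                 [0.0,   0.0,  1.0]])

x, y = 1.0, 0.5                       # an image-plane point in mm
u, v, _ = K_px @ np.array([x, y, 1.0])
print(u, v)                           # -> 1160.0 440.0
```

A point 1 mm right of and 0.5 mm above the sensor center lands right and above the principal point in pixel coordinates, as expected.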
3.3 Camera coordinate system $O_C X_C Y_C Z_C$
Set up a three-dimensional coordinate system on the camera lens, as shown in the figure below: the origin is at the optical center, the X and Y axes are parallel to the x and y axes of the image coordinate system, and the Z axis points toward the image side. (Note: this is the camera coordinate system definition I am used to; another, more common definition has Z pointing toward object space and X, Y opposite to those in my figure. Its advantage is that the intrinsic matrix does not carry the two negative signs mine does.)
Figure 6:
According to the pinhole imaging model above, we can obtain the projection relationship in the $Y_C O Z_C$ plane, as shown in the figure below (the $X_C O Z_C$ plane is analogous).
Figure 7:
In the figure above, by similar triangles, $\frac{f}{Z_C}=\frac{y}{Y_C}$ and $\frac{f}{Z_C}=\frac{x}{X_C}$, so we can write the conversion relationship between the camera coordinate system and the image coordinate system, directly in homogeneous coordinate form:
$$\left[\begin{matrix} x\\y\\1 \end{matrix}\right]= \left[\begin{matrix} f/Z_C&0&0&0\\0&f/Z_C&0&0\\0&0&1/Z_C&0 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]=\frac{1}{Z_C}\left[\begin{matrix} f&0&0&0\\0&f&0&0\\0&0&1&0 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]$$
where $f$ is the focal length of the lens; some formulas in the literature split the focal length into $f_x$ and $f_y$ for the X and Y directions.
Substituting this into the conversion between the pixel and image coordinate systems gives
$$\left[\begin{matrix} u\\v\\1 \end{matrix}\right] = \left[\begin{matrix} 1/dx&0&u_0\\0&-1/dy&v_0\\0&0&1 \end{matrix}\right] \left[\begin{matrix} x\\y\\1 \end{matrix}\right]=\frac{1}{Z_C} \left[\begin{matrix} 1/dx&0&u_0\\0&-1/dy&v_0\\0&0&1 \end{matrix}\right] \left[\begin{matrix} f&0&0&0\\0&f&0&0\\0&0&1&0 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]$$
We can get the conversion relationship between the camera coordinate system and the pixel coordinate system as follows
$$\left[\begin{matrix} u\\v\\1 \end{matrix}\right]=\frac{1}{Z_C} \left[\begin{matrix} f/dx&0&u_0&0\\0&-f/dy&v_0&0\\0&0&1&0 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]$$
We use $M_1$ to denote the matrix in this formula. Meanwhile, as can be seen from Figure 6 above, the image is actually observed from the back of the image sensor in the figure, so a negative sign also needs to be added in the X direction, giving
$$M_1=\left[\begin{matrix} -f/dx&0&u_0&0\\0&-f/dy&v_0&0\\0&0&1&0 \end{matrix}\right]$$
The parameters in this matrix depend only on the lens focal length $f$, the pixel size $dx$, $dy$, and the center pixel $u_0$, $v_0$. These are internal parameters of the camera and lens: once the camera and lens are fixed, the matrix is fixed, so it is called the intrinsic matrix.
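As a sketch, the intrinsic matrix $M_1$ above can be built and applied to a point given in camera coordinates; the focal length and pixel parameters below are hypothetical (f = 8 mm, 0.005 mm pixels, principal point at (960, 540)):

```python
import numpy as np

f = 8.0                  # hypothetical focal length in mm
dx = dy = 0.005          # hypothetical pixel size in mm/pixel
u0, v0 = 960.0, 540.0    # assumed principal point in pixels

# The intrinsic matrix M1 with the article's sign convention.
M1 = np.array([[-f/dx,  0.0,   u0, 0.0],
               [0.0,   -f/dy,  v0, 0.0],
               [0.0,    0.0,  1.0, 0.0]])

# A point in camera coordinates (homogeneous), in mm: Z_C = 1000 mm.
Xc = np.array([100.0, -50.0, 1000.0, 1.0])
uvw = M1 @ Xc
u, v = uvw[:2] / uvw[2]   # divide by Z_C to get pixel coordinates
print(u, v)               # -> 800.0 620.0
```

Note that the third component of the result is exactly $Z_C$, which is why dividing by it recovers the pixel coordinates.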
3.4 World coordinate system $O_W X_W Y_W Z_W$
The world coordinate system is the absolute coordinate system of the system, and it is also a three-dimensional coordinate system. The origin and coordinate axis directions are selected according to our needs.
Figure 8:
As a rigid body, the camera has a pose in the world coordinate system, i.e., a position and an attitude. The position is the translation of the camera (the origin of the camera coordinate system) relative to the origin of the world coordinate system, expressed by a 3×1 translation vector $T_C$; the attitude is the rotation of the camera (the camera coordinate system) relative to the world coordinate system, expressed by a 3×3 rotation matrix $R_C$.
Then we can get the relationship between the camera coordinate system and the world coordinate system
$$\left[\begin{matrix} X_W\\Y_W\\Z_W\\1 \end{matrix}\right]= \left[\begin{matrix} R_C&T_C\\0_{1\times3}&1 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]$$
Conversely,
$$\left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]= \left[\begin{matrix} R_C&T_C\\0_{1\times3}&1 \end{matrix}\right]^{-1} \left[\begin{matrix} X_W\\Y_W\\Z_W\\1 \end{matrix}\right]=M_2\left[\begin{matrix} X_W\\Y_W\\Z_W\\1 \end{matrix}\right]$$
This is the transformation relationship between the world coordinate system and the camera coordinate system, where the matrix $M_2$ depends on the pose of the camera and is called the extrinsic matrix.
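The world-to-camera transform can be sketched as follows; the pose used here (a 90° rotation about the world Z axis and a translation to (10, 0, 0)) is purely hypothetical:

```python
import numpy as np

# Hypothetical camera pose in the world frame.
theta = np.pi / 2
R_C = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
T_C = np.array([10.0, 0.0, 0.0])

# Camera-to-world transform [R_C, T_C; 0, 1], then invert to get M2.
cam_to_world = np.eye(4)
cam_to_world[:3, :3] = R_C
cam_to_world[:3, 3] = T_C
M2 = np.linalg.inv(cam_to_world)      # world -> camera

Xw = np.array([10.0, 5.0, 2.0, 1.0])  # a world point (homogeneous)
Xc = M2 @ Xw
print(Xc[:3])                          # approximately [5, 0, 2]
```

Since the inverse of a rigid transform is $[R_C^T, -R_C^T T_C; 0, 1]$, in practice one can also build $M_2$ directly instead of calling a general matrix inverse.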
Substituting the previous conversion formula between the pixel coordinate system and the camera coordinate system, we get
$$\left[\begin{matrix} u\\v\\1 \end{matrix}\right] =\frac{1}{Z_C} \left[\begin{matrix} -f/dx&0&u_0&0\\0&-f/dy&v_0&0\\0&0&1&0 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]=\frac{1}{Z_C} \left[\begin{matrix} -f/dx&0&u_0&0\\0&-f/dy&v_0&0\\0&0&1&0 \end{matrix}\right]M_2 \left[\begin{matrix} X_W\\Y_W\\Z_W\\1 \end{matrix}\right]$$
4 Machine Vision Projection Matrix
So far, we have obtained the mapping relationship between the pixel coordinate system and the world coordinate system , that is, the machine vision projection matrix
$$\left[\begin{matrix} u\\v\\1 \end{matrix}\right]=\frac{1}{Z_C} M_1 M_2 \left[\begin{matrix} X_W\\Y_W\\Z_W\\1 \end{matrix}\right]$$
Where:
$Z_C$ - the Z coordinate of the space point in the camera coordinate system;
$M_1$ - the intrinsic matrix, a 3×4 matrix, $M_1=\left[\begin{matrix} -f/dx&0&u_0&0\\0&-f/dy&v_0&0\\0&0&1&0 \end{matrix}\right]$;
$M_2$ - the extrinsic matrix, a 4×4 matrix, $M_2=\left[\begin{matrix} r_{11}&r_{12}&r_{13}&t_x\\ r_{21}&r_{22}&r_{23}&t_y\\ r_{31}&r_{32}&r_{33}&t_z\\ 0&0&0&1 \end{matrix}\right]$.
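Putting the two matrices together, here is a minimal sketch of the full projection, with hypothetical intrinsics and a pure-translation pose. As a sanity check, a world point lying on the camera's optical axis should project to the image center $(u_0, v_0)$:

```python
import numpy as np

# Hypothetical intrinsics: f = 8 mm, 0.005 mm pixels, center (960, 540).
f, dx, dy = 8.0, 0.005, 0.005
u0, v0 = 960.0, 540.0
M1 = np.array([[-f/dx,  0.0,   u0, 0.0],
               [0.0,   -f/dy,  v0, 0.0],
               [0.0,    0.0,  1.0, 0.0]])

# Hypothetical pose: no rotation, camera origin at (100, -50, 0) mm.
cam_to_world = np.eye(4)
cam_to_world[:3, 3] = [100.0, -50.0, 0.0]
M2 = np.linalg.inv(cam_to_world)      # world -> camera

P = M1 @ M2                           # the 3x4 projection matrix

# A world point directly on the camera's optical axis, 1 m away.
Xw = np.array([100.0, -50.0, 1000.0, 1.0])
uvw = P @ Xw
u, v = uvw[:2] / uvw[2]               # uvw[2] is Z_C
print(u, v)                           # -> 960.0 540.0 (the image center)
```

Once $P = M_1 M_2$ is formed, projecting any world point is a single matrix-vector product followed by the division by $Z_C$.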
Now that the model is established, how do we obtain the parameters in it? That leads to the next topic: "Machine Vision - Camera Calibration".