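PBRT_V2 Summary Notes <16> Coordinate Spaces Transform

The previous post, PBRT_V2 Summary Notes <15> Transform and Coordinate Spaces, covered the various spaces in PBRT. This post records how the transforms between some of those spaces are carried out, focusing on perspective projection.

1. CameraToScreen (camera space to screen space. Note that screen space here is not the display's screen space; think of it as similar to OpenGL's clip space, except that in screen space Z has already been mapped to [0, 1].)

Note:

(Z is mapped to [0, 1]. For X and Y: X: [-ar, ar], Y: [-1, 1].)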

Points p in camera space are projected onto the viewing plane. A bit of algebra
shows that the projected x' and y' coordinates on the viewing plane can be computed
by dividing x and y by the point's z coordinate value. The projected z depth
is remapped so that z values at the hither plane are 0 and z values at the yon plane
are 1.

The angular field of view (fov) specified by the user is accounted for by scaling the
(x, y) values on the projection plane so that points inside the field of view project
to coordinates between [−1, 1] on the view plane. For square images, both x and y
lie between [−1, 1] in screen space. Otherwise, the direction in which the image is
narrower maps to [−1, 1] and the wider direction maps to a proportionally larger
range of screen space values. Recall that the tangent is equal to the ratio of the
opposite side of a right triangle to the adjacent side. Here the adjacent side has
length 1, so the opposite side has the length tan(fov/2). Scaling by the reciprocal
of this length maps the field of view to range from [−1, 1].

Details:

PerspectiveCamera::PerspectiveCamera(const AnimatedTransform &cam2world,
	const float screenWindow[4], float sopen, float sclose,
	float lensr, float focald, float fov, Film *f)
	: ProjectiveCamera(cam2world, Perspective(fov, 1e-2f, 1000.f),
	screenWindow, sopen, sclose, lensr, focald, f) {

	// Compute differential changes in origin for perspective camera rays
	dxCamera = RasterToCamera(Point(1, 0, 0)) - RasterToCamera(Point(0, 0, 0));
	dyCamera = RasterToCamera(Point(0, 1, 0)) - RasterToCamera(Point(0, 0, 0));
}


Transform Perspective(float fov, float n, float f) {
	// Perform projective divide
	Matrix4x4 persp = Matrix4x4(1, 0, 0, 0,
		0, 1, 0, 0,
		0, 0, f / (f - n), -f*n / (f - n),
		0, 0, 1, 0);

	// Scale to canonical viewing volume
	float invTanAng = 1.f / tanf(Radians(fov) / 2.f);
	return Scale(invTanAng, invTanAng, 1) * Transform(persp);
}

Reading the code, the PerspectiveCamera constructor obtains CameraToScreen directly from Perspective(fov, 1e-2f, 1000.f), so the Perspective function is the only part that needs a close look.
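To make the effect of Perspective concrete, here is a minimal sketch that applies the same matrix to a single camera-space point, including the divide by w. The names P3 and cameraToScreen are hypothetical and are not part of pbrt; only the arithmetic follows the Perspective code above.

```cpp
#include <cmath>

// Minimal stand-in for a 3D point; this is not pbrt's Point class.
struct P3 { float x, y, z; };

// Applies the matrix built by Perspective(fov, n, f) to a camera-space
// point, followed by the homogeneous divide by w (w equals camera-space z).
// fovDegrees, n and f mirror the parameters of pbrt's Perspective().
P3 cameraToScreen(P3 p, float fovDegrees, float n, float f) {
    const float kPi = 3.14159265358979f;
    float invTanAng = 1.0f / std::tan(fovDegrees * kPi / 180.0f / 2.0f);
    // Rows 1 and 2 scale x and y, row 3 remaps depth, row 4 copies z into w.
    float x = invTanAng * p.x;
    float y = invTanAng * p.y;
    float z = f / (f - n) * p.z - f * n / (f - n);
    float w = p.z;
    return { x / w, y / w, z / w };
}
```

With fov = 90 degrees, a point on the edge of the field of view such as (2, -2, 2) lands at x = 1, y = -1, and points at z = n and z = f get depths 0 and 1 respectively.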

a. The matrix produced by the Perspective function above is:

1/tan(fov/2)  0             0         0
0             1/tan(fov/2)  0         0
0             0             f/(f-n)   -f*n/(f-n)
0             0             1         0

Rows 1 and 2 transform the X and Y coordinates; see http://ogldev.atspace.co.uk/www/tutorial12/tutorial12.html

Let ar denote the aspect ratio (W/H) of the image plane. When the image is wider than it is tall and the image plane has height 2, rows 1 and 2 above yield X': [-ar, ar], Y': [-1, 1].

b. Extension:

1/tan(fov/2) * 1/ar  0                    0         0
0                    1/tan(fov/2) * 1/ar  0         0
0                    0                    f/(f-n)   -f*n/(f-n)
0                    0                    1         0

(Following http://ogldev.atspace.co.uk/www/tutorial12/tutorial12.html , extended here:

tan(fov/2) = (1/ar) / d

d = (1/ar) * 1/tan(fov/2)

)

Again let ar denote the aspect ratio (W/H) of the image plane. When the image is narrower than it is tall and the image plane has width 2, rows 1 and 2 above yield X': [-1, 1], Y': [-1/ar, 1/ar].

c. One thing to note: when a point is transformed by the matrix, the result has to be divided by its own W (similar to OpenGL's perspective divide).

inline void Transform::operator()(const Point &pt,
	Point *ptrans) const {
	float x = pt.x, y = pt.y, z = pt.z;
	ptrans->x = m.m[0][0] * x + m.m[0][1] * y + m.m[0][2] * z + m.m[0][3];
	ptrans->y = m.m[1][0] * x + m.m[1][1] * y + m.m[1][2] * z + m.m[1][3];
	ptrans->z = m.m[2][0] * x + m.m[2][1] * y + m.m[2][2] * z + m.m[2][3];
	float w = m.m[3][0] * x + m.m[3][1] * y + m.m[3][2] * z + m.m[3][3];
	if (w != 1.) *ptrans /= w;
}
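Written out, applying the Perspective matrix and then dividing by w gives the closed form below. This is only a restatement of the two code excerpts above, where (x, y, z) is the camera-space point and n, f are the near and far distances passed to Perspective:

```latex
\begin{aligned}
w  &= z \\
x' &= \frac{x}{z\,\tan(\mathrm{fov}/2)} \\
y' &= \frac{y}{z\,\tan(\mathrm{fov}/2)} \\
z' &= \frac{1}{z}\left(\frac{f}{f-n}\,z - \frac{f\,n}{f-n}\right)
    = \frac{f\,(z-n)}{z\,(f-n)}
\end{aligned}
```

Substituting z = n gives z' = 0, and z = f gives z' = 1, which matches the [0, 1] depth range described above.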

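To summarize, CameraToScreen transforms camera space into screen space.

Screen space:

X: [-ar, ar]

Y: [-1, 1]

Z: [0, 1]

2. ScreenToRaster (screen space to raster space. Again, screen space here is not the display's screen space; think of it as similar to OpenGL's clip space, except that Z has already been mapped to [0, 1]. Raster space is what is normally meant by screen space, that is, the image itself.)

(A point in screen space is first translated so that the upper-left corner of the screen becomes the origin. X and Y are then scaled into [0, 1], which is in effect a transform into NDC space. Finally, [0, 1] is scaled to the raster width and height. Note that Y is flipped.)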

The only nontrivial transformation to compute in the constructor is the screen-to-raster
projection. In the following code, note the composition of transformations where (reading
from bottom to top), we start with a point in screen space, translate so that the
upper-left corner of the screen is at the origin, and then scale by the reciprocal of the
screen width and height, giving us a point with x and y coordinates between zero and
one (these are NDC coordinates). Finally, we scale by the raster resolution, so that we
end up covering the entire raster range from (0, 0) up to the overall raster resolution.
An important detail here is that the y coordinate is inverted by this transformation; this
is necessary because increasing y values move up the image in screen coordinates, but
down in raster coordinates.

Details:

ProjectiveCamera::ProjectiveCamera(const AnimatedTransform &cam2world,
	const Transform &proj, const float screenWindow[4], float sopen,
	float sclose, float lensr, float focald, Film *f)
	: Camera(cam2world, sopen, sclose, f) {
	// Initialize depth of field parameters
	lensRadius = lensr;
	focalDistance = focald;

	// Compute projective camera transformations
	CameraToScreen = proj;

	// Compute projective camera screen transformations
	ScreenToRaster = Scale(float(film->xResolution),
		float(film->yResolution), 1.f) *
		Scale(1.f / (screenWindow[1] - screenWindow[0]),
			1.f / (screenWindow[2] - screenWindow[3]), 1.f) *
		Translate(Vector(-screenWindow[0], -screenWindow[3], 0.f));

	RasterToScreen = Inverse(ScreenToRaster);
	RasterToCamera = Inverse(CameraToScreen) * RasterToScreen;
}

The key code is:

ScreenToRaster = Scale(float(film->xResolution), float(film->yResolution), 1.f) *
Scale(1.f / (screenWindow[1] - screenWindow[0]), 1.f / (screenWindow[2] - screenWindow[3]), 1.f) *
Translate(Vector(-screenWindow[0], -screenWindow[3], 0.f));

Translate(Vector(-screenWindow[0], -screenWindow[3], 0.f)) is applied first,

then Scale(1.f / (screenWindow[1] - screenWindow[0]), 1.f / (screenWindow[2] - screenWindow[3]), 1.f).

Together these two steps transform screen space into NDC space.

Finally, Scale(float(film->xResolution), float(film->yResolution), 1.f) is applied,

which transforms NDC space into raster space.

b. What the screenWindow parameter is

The CreatePerspectiveCamera function contains:
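The three steps can be traced for a single point with a small sketch. P2 and screenToRaster are hypothetical names, not pbrt API; win is assumed to hold (xmin, xmax, ymin, ymax) in the same order as pbrt's screenWindow array.

```cpp
// Minimal 2D point; this is not a pbrt type.
struct P2 { float x, y; };

// Maps a screen-space (x, y) to raster space using the same three steps as
// the ScreenToRaster composition: translate, scale to NDC, scale to pixels.
P2 screenToRaster(P2 s, const float win[4], int xRes, int yRes) {
    // Step 1: translate so the upper-left corner (xmin, ymax) is the origin.
    float tx = s.x - win[0];
    float ty = s.y - win[3];
    // Step 2: scale to NDC. The y factor is negative, which flips the axis.
    float ndcX = tx / (win[1] - win[0]);
    float ndcY = ty / (win[2] - win[3]);
    // Step 3: scale NDC to the raster resolution.
    return { ndcX * xRes, ndcY * yRes };
}
```

For a window of (-2, 2, -1, 1) and an 800 x 400 film, the upper-left corner (-2, 1) maps to raster (0, 0), the lower-right corner (2, -1) maps to (800, 400), and the screen-space origin maps to the image center (400, 200).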

float frame = params.FindOneFloat("frameaspectratio",
		float(film->xResolution) / float(film->yResolution));

	float screen[4];
	if (frame > 1.f) {
		screen[0] = -frame;
		screen[1] = frame;
		screen[2] = -1.f;
		screen[3] = 1.f;
	}
	else {
		screen[0] = -1.f;
		screen[1] = 1.f;
		screen[2] = -1.f / frame;
		screen[3] = 1.f / frame;
	}

Here frame is the aspect ratio ar. The code branches on whether ar is greater than 1, and the resulting screenWindow differs between the two cases.

When ar > 1, screenWindow = (-ar, ar, -1, 1)

When ar < 1, screenWindow = (-1, 1, -1/ar, 1/ar)
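The branch can be isolated into a small function for experimenting with different resolutions. This is a sketch only: defaultScreenWindow is a hypothetical name, and it ignores the "frameaspectratio" parameter override that the real code reads from the parameter set.

```cpp
#include <array>

// Returns (xmin, xmax, ymin, ymax) following the same branch as the
// CreatePerspectiveCamera excerpt above, with the aspect ratio taken
// directly from the film resolution.
std::array<float, 4> defaultScreenWindow(int xRes, int yRes) {
    float frame = float(xRes) / float(yRes);
    if (frame > 1.0f) {
        // Wider than tall: y spans [-1, 1], x spans [-frame, frame].
        return { -frame, frame, -1.0f, 1.0f };
    }
    // Taller than wide (or square): x spans [-1, 1], y spans [-1/frame, 1/frame].
    return { -1.0f, 1.0f, -1.0f / frame, 1.0f / frame };
}
```

For example, an 800 x 400 film gives (-2, 2, -1, 1), while a 400 x 800 film gives (-1, 1, -2, 2).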

c. ScreenToNDC:

From the CameraToScreen section above, the matrix (call it A) is:

1/tan(fov/2)  0             0         0
0             1/tan(fov/2)  0         0
0             0             f/(f-n)   -f*n/(f-n)
0             0             1         0

Case 1: assume ar > 1, so screenWindow = (-ar, ar, -1, 1).

Translate(Vector(-screenWindow[0], -screenWindow[3], 0.f)) = T(ar, -1, 0)

The combined matrix T * A is shown below. Because the last row of A is (0 0 1 0), w equals z, so the translation lands in the third column and becomes a plain offset after the divide by w:

1/tan(fov/2)  0             ar        0
0             1/tan(fov/2)  -1        0
0             0             f/(f-n)   -f*n/(f-n)
0             0             1         0

Scale(1.f / (screenWindow[1] - screenWindow[0]), 1.f / (screenWindow[2] - screenWindow[3]), 1.f) = S(1/(2ar), -1/2, 1)

The combined matrix S1 * T * A (S1 being this first scale) is:

(1/(2ar)) * 1/tan(fov/2)  0                      1/2       0
0                         (-1/2) * 1/tan(fov/2)  1/2       0
0                         0                      f/(f-n)   -f*n/(f-n)
0                         0                      1         0

X coordinate: X in screen space is [-ar, ar]; after the translation it is [0, 2ar]; after S1 it is [0, 1].

Y coordinate: Y in screen space is [-1, 1]; after the translation it is [-2, 0]; after S1 it is [1, 0].

Case 2: assume ar < 1, so screenWindow = (-1, 1, -1/ar, 1/ar).

Translate(Vector(-screenWindow[0], -screenWindow[3], 0.f)) = T(1, -1/ar, 0)

The combined matrix T * A is (the translation sits in the third column because w = z):

1/tan(fov/2)  0             1         0
0             1/tan(fov/2)  -1/ar     0
0             0             f/(f-n)   -f*n/(f-n)
0             0             1         0

Scale(1.f / (screenWindow[1] - screenWindow[0]), 1.f / (screenWindow[2] - screenWindow[3]), 1.f) = S(1/2, -ar/2, 1)

The combined matrix S1 * T * A is:

(1/2) * 1/tan(fov/2)  0                       1/2       0
0                     (-ar/2) * 1/tan(fov/2)  1/2       0
0                     0                       f/(f-n)   -f*n/(f-n)
0                     0                       1         0

Taking the screen-space extents from section 1 (X in [-ar, ar], Y in [-1, 1]):

X coordinate: X in screen space is [-ar, ar]; after the translation it is [-ar + 1, ar + 1]; after S1 it is [(-ar + 1)/2, (ar + 1)/2].

Y coordinate: Y in screen space is [-1, 1]; after the translation it is [-1 - 1/ar, 1 - 1/ar]; after S1 it is [(-1 - 1/ar) * (-ar/2), (1 - 1/ar) * (-ar/2)] = [(1 + ar)/2, (1 - ar)/2].

Neither the transformed X nor the transformed Y covers [0, 1].

When ar < 1, the image rendered by PBRT has black borders, and I suspect this may be a bug in PBRT.

[Figure: PBRT render at ar < 1 showing black borders]

d. Further check: suppose the ScreenToRaster code is changed to

float ar = ((float)f->xResolution / (float)f->yResolution);
	// Compute projective camera screen transformations
	ScreenToRaster = Scale(float(film->xResolution),
		float(film->yResolution), 1.f) *
		Scale(1.f / (screenWindow[1] - screenWindow[0]),
			1.f / (screenWindow[2] - screenWindow[3]), 1.f) *
		Translate(Vector(-screenWindow[0], -screenWindow[3], 0.f))*
		Scale(1/ar, 1/ar, 1);

With Scale(1/ar, 1/ar, 1) added, PBRT renders the following (for why Scale(1/ar, 1/ar, 1) is the factor to use, see the extension in the CameraToScreen section above):

[Figure: PBRT render at ar < 1 with the added scale]

So PBRT's handling of ar < 1 appears to be miscalculated.

Redoing the computation:

Case 2: assume ar < 1, so screenWindow = (-1, 1, -1/ar, 1/ar).

Translate(Vector(-screenWindow[0], -screenWindow[3], 0.f)) = T(1, -1/ar, 0)

The combined matrix T * Scale(1/ar, 1/ar, 1) * A is (the translation sits in the third column because w = z):

1/tan(fov/2) * 1/ar  0                    1         0
0                    1/tan(fov/2) * 1/ar  -1/ar     0
0                    0                    f/(f-n)   -f*n/(f-n)
0                    0                    1         0

Scale(1.f / (screenWindow[1] - screenWindow[0]), 1.f / (screenWindow[2] - screenWindow[3]), 1.f) = S(1/2, -ar/2, 1)

The combined matrix S1 * T * Scale(1/ar, 1/ar, 1) * A is:

(1/2) * 1/tan(fov/2) * 1/ar  0                              1/2       0
0                            (-ar/2) * 1/tan(fov/2) * 1/ar  1/2       0
0                            0                              f/(f-n)   -f*n/(f-n)
0                            0                              1         0

X coordinate: X in screen space is [-ar, ar]; after Scale(1/ar, 1/ar, 1) it is [-1, 1]; after the translation it is [0, 2]; after S1 it is [0, 1].

Y coordinate: Y in screen space is [-1, 1]; after Scale(1/ar, 1/ar, 1) it is [-1/ar, 1/ar]; after the translation it is [-2/ar, 0]; after S1 it is [1, 0].
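The interval arithmetic for case 2 can be checked numerically with a small sketch. Interval, mapInterval, xToNDC and yToNDC are hypothetical helpers, not pbrt API, and the input intervals follow this post's assumption that screen-space X spans [-ar, ar] and Y spans [-1, 1]. Passing preScale = 1 reproduces the unmodified chain, and preScale = 1/ar reproduces the chain with the extra Scale(1/ar, 1/ar, 1).

```cpp
#include <utility>

// Interval endpoints are kept in mapped order, so a flipped axis shows up
// as first > second.
using Interval = std::pair<float, float>;

// Pushes both endpoints through: pre-scale, then translate, then scale.
Interval mapInterval(Interval in, float preScale, float translate, float scale) {
    float a = (in.first * preScale + translate) * scale;
    float b = (in.second * preScale + translate) * scale;
    return { a, b };
}

// X uses Translate(-win[0]) followed by Scale(1 / (win[1] - win[0])).
Interval xToNDC(Interval x, const float win[4], float preScale) {
    return mapInterval(x, preScale, -win[0], 1.0f / (win[1] - win[0]));
}

// Y uses Translate(-win[3]) followed by Scale(1 / (win[2] - win[3])).
Interval yToNDC(Interval y, const float win[4], float preScale) {
    return mapInterval(y, preScale, -win[3], 1.0f / (win[2] - win[3]));
}
```

With ar = 0.5 and win = (-1, 1, -2, 2), the unmodified chain maps X from [-0.5, 0.5] to [0.25, 0.75] and Y from [-1, 1] to [0.75, 0.25], while preScale = 2 (that is, 1/ar) maps them to [0, 1] and [1, 0].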

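e. Conclusion:

PBRT appears to have a bug when handling ar < 1. The check above shows that for ar < 1 an extra Scale(1/ar, 1/ar, 1) is needed to correct the result. The root cause lies in how screenWindow depends on the aspect ratio ar:

When ar > 1, screenWindow = (-ar, ar, -1, 1)

When ar < 1, screenWindow = (-1, 1, -1/ar, 1/ar)

However, look at the CameraToScreen matrix:

1/tan(fov/2)  0             0         0
0             1/tan(fov/2)  0         0
0             0             f/(f-n)   -f*n/(f-n)
0             0             1         0

A point transformed by this matrix ends up with X: [-ar, ar], Y: [-1, 1], Z: [0, 1].

If the screenWindow parameter cannot map those X and Y ranges to [0, 1] and [1, 0], the result goes wrong.

So the most direct fix is not to add Scale(1/ar, 1/ar, 1) by hand, but to drop the comparison of ar against 1 and uniformly use

screenWindow = (-ar, ar, -1, 1)

f. NDCToRaster

This step is simply Scale(float(film->xResolution), float(film->yResolution), 1.f).

It maps

X: [0, 1] => [0, film->xResolution]

Y: [1, 0] => [film->yResolution, 0]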
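A minimal sketch of this last step and its inverse follows. The names P2, ndcToRaster and rasterToNDC are hypothetical, and xRes and yRes stand in for film->xResolution and film->yResolution.

```cpp
// Minimal 2D point; this is not a pbrt type.
struct P2 { float x, y; };

// NDC -> raster: scale each axis by the film resolution.
P2 ndcToRaster(P2 ndc, int xRes, int yRes) {
    return { ndc.x * xRes, ndc.y * yRes };
}

// Raster -> NDC: the inverse scale, as used when building RasterToScreen.
P2 rasterToNDC(P2 raster, int xRes, int yRes) {
    return { raster.x / xRes, raster.y / yRes };
}
```

For an 800 x 400 film, NDC (0.5, 0.25) maps to raster (400, 100), and mapping that raster point back returns the original NDC coordinates.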
