Real-Time Rendering: 16.6 Compression and Precision

Triangle mesh data can be compressed in various ways, with benefits similar to those of image compression. Just as the PNG and JPEG image file formats use lossless and lossy compression for textures, a variety of algorithms and formats have been developed for the compression of triangle mesh data.

Compression minimizes the space needed for data storage, at the cost of time spent encoding and decoding. The time saved by transferring a smaller representation must outweigh the extra time spent decompressing the data. When data are transmitted over the Internet, a slow download speed means that more elaborate algorithms can be used. Mesh connectivity can be compressed and efficiently decoded using TFAN [1116], which was adopted in MPEG-4. Encoders such as Open3DGC, OpenCTM, and Draco can create model files that are one fourth the size, or smaller, of those produced using gzip compression alone [1335]. Decompression with these schemes is meant to be a one-time operation. It is relatively slow (a few million triangles per second), but it can more than pay for itself by reducing the time spent transmitting data. Maglo et al. [1099] provide a thorough review of algorithms. Here we focus on compression techniques directly involving the GPU itself.

Much of this chapter has been devoted to various ways in which a triangle mesh's storage is minimized. The major motivation for doing so is rendering efficiency. Reusing, rather than repeating, vertex data among several triangles leads to fewer cache misses. Removing triangles that have little visual impact saves both vertex processing and memory. A smaller memory size leads to lower bandwidth costs and better cache use. There are also limits to what the GPU can store in memory, so data reduction techniques allow more triangles to be displayed.

Vertex data can be compressed using fixed-rate compression, for reasons similar to those for compressing textures (Section 6.2.6). By fixed-rate compression we mean methods in which the final compressed storage size is known in advance. Having a self-contained form of compression for each vertex means that decoding can happen on the GPU. Calver [221] presents a variety of schemes that use the vertex shader for decompression. Zarge [1961] notes that data compression can also help align vertex formats to cache lines. Purnomo et al. [1448] combine simplification and vertex quantization techniques, and optimize the mesh for a given target mesh size, using an image-space metric.

One simple form of compression is found within the index buffer's format. An index buffer consists of an array of unsigned integers that give the array positions for vertices in the vertex buffer. If there are 2^16 or fewer vertices in the vertex buffer, then the index buffer can use unsigned shorts instead of unsigned longs. Some APIs support unsigned bytes for meshes with fewer than 2^8 vertices, but using these can cause costly alignment issues, so they are generally avoided. It is worth noting that OpenGL ES 2.0, unextended WebGL 1.0, and some older desktop and laptop GPUs do not support unsigned long index buffers, so unsigned shorts must be used.
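
As a minimal sketch of this choice, the following Python snippet picks the index width from the vertex count. The function name `pack_index_buffer` is hypothetical, and the 32-bit format stands in for the "unsigned long" indices mentioned above; a real renderer would hand the resulting bytes to its graphics API.

```python
import struct

def pack_index_buffer(indices, vertex_count):
    """Pack indices as 16-bit values when possible, else 32-bit (hypothetical helper)."""
    # Indices run from 0 to vertex_count - 1, so 16 bits cover up to 2^16 vertices.
    fmt = "H" if vertex_count <= 2**16 else "I"
    # "<" selects little-endian with standard sizes: H = 2 bytes, I = 4 bytes.
    return struct.pack(f"<{len(indices)}{fmt}", *indices)

small = pack_index_buffer([0, 1, 2, 2, 1, 3], vertex_count=4)   # two triangles
large = pack_index_buffer([0, 1, 70000], vertex_count=70001)    # one triangle
print(len(small), len(large))
```

Six indices of the small mesh take 12 bytes here, the same as three indices of the large mesh.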

The other compression opportunity is with the triangle mesh data itself. As a basic example, some triangle meshes store one or more colors per vertex to represent baked-in lighting, simulation results, or other information. On a typical monitor a color is represented by 8 bits each of red, green, and blue, so the data could be stored in the vertex record as three unsigned bytes instead of three floats. The GPU's vertex shader can turn this field into separate values that are then interpolated during triangle traversal. However, care should be taken on many architectures. For example, Apple recommends padding 3-byte data fields to 4 bytes on iOS to avoid extra processing [66]. See the middle illustration in Figure 16.23.
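
The byte layout can be sketched on the CPU side as follows. This is only an illustration under stated assumptions: `pack_color` and `unpack_color` are hypothetical names, the fourth byte is padding in the spirit of the 4-byte alignment advice, and on a real GPU the conversion back to floats would normally happen during vertex fetch or in the vertex shader.

```python
import struct

def pack_color(r, g, b):
    """Pack an RGB color with components in [0, 1] into 4 bytes (3 channels + 1 pad byte)."""
    def to_byte(c):
        return int(round(max(0.0, min(1.0, c)) * 255))
    # The last byte is padding so that the field occupies 4 bytes instead of 3.
    return struct.pack("4B", to_byte(r), to_byte(g), to_byte(b), 255)

def unpack_color(data):
    """Recover approximate float components, as a vertex shader would receive them."""
    r, g, b, _pad = struct.unpack("4B", data)
    return (r / 255.0, g / 255.0, b / 255.0)

packed = pack_color(1.0, 0.2, 0.0)
as_floats = struct.pack("3f", 1.0, 0.2, 0.0)
print(len(packed), len(as_floats))
```

The packed color takes 4 bytes, compared with 12 bytes for three 32-bit floats.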

Figure 16.23. Typical fixed-rate compression methods for vertex data. (Octant conversion figure from Cigolle et al. [269], courtesy of Morgan McGuire.)

Another compression method is to not store any color at all. If the color data are, say, showing temperature results, the temperature itself can be stored as a single number that is then converted to an index into a one-dimensional texture to obtain a color. Better yet, if the temperature value is not needed, then a single unsigned byte could be used to reference this color texture.

Even if the temperature itself is stored, it may be needed to only a few decimal places. A single-precision floating point number has a total precision of 24 bits, a little more than 7 decimal digits. Note that 16 bits give almost 5 decimal digits of precision. The range of temperature values is likely to be small enough that the exponent part of the floating point format is unnecessary. By using the lowest value as an offset and the highest minus the lowest as a scale, the values can be evenly spread over a limited range. For example, if values range from 28.51 to 197.12, an unsigned short value would be converted to a temperature by first dividing it by 2^16 − 1, then multiplying the result by a scale factor of (197.12 − 28.51), and finally adding the offset 28.51. By storing the scale and offset factors for the data set and passing these to the vertex shader program, the data set itself can be stored in half the space. This type of transformation is called scalar quantization [1099].
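
The temperature example above can be sketched in a few lines of Python. The helper name `make_quantizer` is hypothetical; the decode step follows the order given in the text (divide by 2^16 − 1, multiply by the scale, add the offset), and the encode step is simply its inverse with rounding.

```python
def make_quantizer(lo, hi, bits=16):
    """Return (encode, decode) functions for scalar quantization over [lo, hi]."""
    levels = (1 << bits) - 1      # 2^16 - 1 for unsigned shorts
    scale = hi - lo               # highest minus lowest value
    def encode(value):
        t = (value - lo) / scale                  # map to [0, 1]
        return round(min(max(t, 0.0), 1.0) * levels)
    def decode(q):
        # Divide by 2^bits - 1, multiply by the scale, then add the offset.
        return q / levels * scale + lo
    return encode, decode

encode, decode = make_quantizer(28.51, 197.12)
q = encode(100.0)
restored = decode(q)
print(q, restored)
```

With this range the quantization step is (197.12 − 28.51) / (2^16 − 1), roughly 0.0026 degrees, so the restored value differs from the input by at most about half of that.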

Vertex position data are usually a good candidate for such a reduction. Single meshes span a small area in space, so having a scale and offset vector (or a 4 × 4 matrix) for the whole scene can save considerable space without a significant loss in fidelity. For some scenes it may be possible to generate a scale and offset for each object, thereby increasing precision per model. However, doing so may cause cracks to appear where separate meshes touch [1381]. Vertices originally in the same world location but in separate models may be scaled and offset to slightly different locations. When all models are relatively small compared to the scene as a whole, one solution is to use the same scale for all models and align the offsets, which can give a few more bits of precision [1010].
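
The cracking problem and the shared-scale remedy can be illustrated in one dimension. The sketch below is an interpretation, not the method of the cited references: it assumes that "aligning the offsets" means snapping each object's offset to a multiple of the shared quantization step, and all names and bounds are made up for the example.

```python
import math

LEVELS = 2**16 - 1

def quantize(x, offset, step):
    # floor(v + 0.5) rounds ties the same way whatever offset is used.
    return math.floor((x - offset) / step + 0.5)

def dequantize(q, offset, step):
    return offset + q * step

def per_object(x, lo, hi):
    """Each object uses its own offset and scale, derived from its bounds."""
    step = (hi - lo) / LEVELS
    return dequantize(quantize(x, lo, step), lo, step)

def shared_grid(x, lo, step=2.0**-10):
    """All objects share one step; each offset is snapped to a multiple of it (assumption)."""
    offset = math.floor(lo / step) * step
    return dequantize(quantize(x, offset, step), offset, step)

shared_x = 7.3                                  # coordinate shared by two touching meshes
bounds_a, bounds_b = (0.0, 10.0), (6.1, 19.4)   # hypothetical object extents

crack = abs(per_object(shared_x, *bounds_a) - per_object(shared_x, *bounds_b))
sealed = abs(shared_grid(shared_x, bounds_a[0]) - shared_grid(shared_x, bounds_b[0]))
print(crack, sealed)
```

With per-object scales the two meshes decode the shared coordinate to slightly different values, which is the source of the crack. On the shared grid both meshes land on the same grid point, so the gap is zero.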

Sometimes even floating point storage for vertex data is not sufficient to avoid precision problems. A classic example is the space shuttle rendered over the earth. The shuttle model itself may be specified down to a millimeter scale, but the earth's surface is over 100,000 meters away, giving an 8 decimal-place difference in scale. When the shuttle's world-space position is computed relative to the earth, the vertex locations generated need higher precision. When no corrective action is taken, the shuttle will jitter around the screen when the viewer moves near it. While the shuttle example is an extreme version of this problem, massively multiplayer worlds can suffer the same effects if a single coordinate system is used throughout. Objects on the fringes will lose enough precision that the problem becomes visible: animated objects will jerk around, individual vertices will snap at different times, and shadow map texels will jump with the slightest camera move. One solution is to redo the transform pipeline so that, for each origin-centered object, the world and camera translations are first concatenated together and so mostly cancel out [1379, 1381]. Another approach is to segment the world and redefine the origin to be in the center of each segment, with the challenge then being how to travel from one segment to another. Ohlarik [1316] and Cozzi and Ring [299] discuss these problems and solutions in depth.
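
The loss of precision, and the effect of concatenating the translations first, can be reproduced along a single axis. This is a simplified sketch: rotations are ignored, the positions are made-up values, and `f32` is a hypothetical helper that rounds a Python double to the nearest 32-bit float to imitate GPU arithmetic.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

object_pos = 100_000.0    # object origin in world space, in meters (hypothetical)
camera_pos = 99_998.0     # camera position in world space (hypothetical)
vertex_local = 0.001      # a vertex 1 mm from the object's origin

# Naive pipeline: form the world-space position in float32, then move to view space.
world = f32(f32(object_pos) + f32(vertex_local))
naive_view = f32(world - f32(camera_pos))

# Concatenate world and camera translations first (in double precision),
# so the large values cancel before any float32 rounding occurs.
relative = f32(object_pos - camera_pos)
better_view = f32(relative + f32(vertex_local))

print(naive_view, better_view)
```

Near 100,000 the spacing between adjacent 32-bit floats is 2^-7, about 7.8 mm, so the naive path rounds the 1 mm detail away, while the concatenated path keeps it to well under a micrometer.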

Other vertex data can have particular compression techniques associated with them. Texture coordinates are often limited to the range of [0.0, 1.0] and so can normally be safely reduced to unsigned shorts, with an implicit offset of 0 and scale divisor of 2^16 − 1. There are usually pairs of values, which nicely fit into two unsigned shorts [1381], or even just 3 bytes [88], depending on precision requirements.

Unlike other coordinate sets, normals are usually normalized, so the set of all normalized normals forms a sphere. For this reason researchers have studied transforms of a sphere onto a plane for the purpose of efficiently compressing normals. Cigolle et al. [269] analyze the advantages and trade-offs of various algorithms and provide code samples. They conclude that octant and spherical projections are the most practical, minimizing error while being efficient to decode and encode. Pranckevičius [1432] and Pesce [1394] discuss normal compression when generating G-buffers for deferred shading (Section 20.1).
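
As an illustration of the octant approach, here is a minimal Python sketch of the commonly used octahedral mapping: the unit sphere is projected onto an octahedron, the lower half is folded over the upper half, and the resulting two coordinates are quantized to signed integers. It uses plain rounding rather than any error-minimizing variant, and the function names are hypothetical rather than taken from the cited code samples.

```python
import math

def sign_not_zero(v):
    return 1.0 if v >= 0.0 else -1.0

def fold(u, v):
    """Reflect a point across the diagonals of the unit square (used for z < 0)."""
    return (1.0 - abs(v)) * sign_not_zero(u), (1.0 - abs(u)) * sign_not_zero(v)

def oct_encode(n, bits=8):
    """Map a unit normal to two signed integers of `bits` bits each."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)        # project onto the octahedron |x|+|y|+|z| = 1
    u, v = x / s, y / s
    if z < 0.0:
        u, v = fold(u, v)
    m = (1 << (bits - 1)) - 1           # 127 for 8 bits
    return round(u * m), round(v * m)

def oct_decode(e, bits=8):
    m = (1 << (bits - 1)) - 1
    u, v = e[0] / m, e[1] / m
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        u, v = fold(u, v)
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)

normal = (1.0 / 3.0, 2.0 / 3.0, -2.0 / 3.0)     # already unit length
decoded = oct_decode(oct_encode(normal))
cos_error = sum(a * b for a, b in zip(normal, decoded))
print(oct_encode(normal), cos_error)
```

With 8 bits per coordinate the normal fits in 2 bytes, and the decoded direction stays within roughly a degree of the original.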

Other data may have properties that can be leveraged to reduce storage. For example, the normal, tangent, and bitangent vectors are commonly used for normal mapping. When these three vectors are mutually perpendicular (not skew), and if the handedness is consistent, then only two of the vectors need be stored and the third can be derived by a cross product. More compact yet, a single 4-byte quaternion with a handedness bit saved along with a 7-bit w can represent the rotation matrix that the basis forms [494, 1114, 1154, 1381, 1639]. For more precision, the largest of the four quaternion values can be left out and the other three stored at 10 bits each. The remaining 2 bits identify which of the four values is not stored. Since the squares of the quaternion's components sum to 1, we can then derive the fourth value from the other three [498]. Doghramachi et al. [363] use a tangent/bitangent/normal scheme storing the axis and angle. It is also 4 bytes and, compared to quaternion storage, takes about half the shader instructions to decode.
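
The 10-10-10-2 layout can be sketched as follows. Several details here are assumptions made for the example rather than something the text specifies: the bit order, the quantization range of [−1/√2, 1/√2] for the three stored values, and negating the quaternion when its largest component is negative (q and −q represent the same rotation). The handedness bit of the other scheme is not handled.

```python
import math

LIMIT = 1.0 / math.sqrt(2.0)     # the three smaller components never exceed this magnitude
MAXQ = (1 << 10) - 1             # largest 10-bit value

def pack_quat(q):
    """Pack a unit quaternion into 32 bits: 2-bit index of the dropped value + 3 x 10 bits."""
    largest = max(range(4), key=lambda i: abs(q[i]))
    sign = -1.0 if q[largest] < 0.0 else 1.0    # make the dropped component non-negative
    packed = largest
    for i in range(4):
        if i == largest:
            continue
        t = (sign * q[i] + LIMIT) / (2.0 * LIMIT)       # map to [0, 1]
        packed = (packed << 10) | round(min(max(t, 0.0), 1.0) * MAXQ)
    return packed

def unpack_quat(packed):
    largest = packed >> 30
    vals = [((packed >> shift) & MAXQ) / MAXQ * 2.0 * LIMIT - LIMIT
            for shift in (20, 10, 0)]
    # The squares sum to 1, so the dropped component follows from the other three.
    vals.insert(largest, math.sqrt(max(0.0, 1.0 - sum(v * v for v in vals))))
    return tuple(vals)

q = (0.1, -0.7, 0.2, math.sqrt(0.46))    # unit quaternion; second component is largest
packed = pack_quat(q)
restored = unpack_quat(packed)
alignment = abs(sum(a * b for a, b in zip(q, restored)))
print(hex(packed), restored, alignment)
```

Because the sign of the whole quaternion may be flipped, the comparison uses the absolute value of the dot product, which should be very close to 1.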

See Figure 16.23 for a summary of some fixed-rate compression methods.
