Game Optimization 3 - Advantages of Batch Processing

Batching describes the process of combining a large number of arbitrary data blocks and processing them as a single large data block.

In some cases, the batched objects are large collections of meshes, vertices, edges, UV coordinates, and other data types used to describe 3D objects. However, the term can just as well refer to batch processing of audio files, sprites, texture files, and other large data sets.

3.1 Draw Call

A Draw Call is simply a request sent from the CPU to the GPU to draw an object. "Draw Call" is the common industry term for this process, although Unity sometimes calls it a SetPass Call because some of the underlying methods carry that name; it can be thought of as configuring options before starting the current rendering pass. The remainder of this book refers to them collectively as Draw Calls.

Before a Draw Call can be requested, some work needs to be done. First, mesh and texture data must be pushed from CPU memory (RAM) to GPU memory (VRAM). Next, the render state must be configured for the object to be drawn, and changing render state is a time-consuming process. For example, if you set the render state to use a blue texture file and then ask it to render a huge mesh, rendering will be very fast and the entire mesh will appear blue. Nine more completely different meshes can then be rendered, and they will all appear blue because the texture has not changed. However, rendering 10 meshes with 10 different textures takes longer, because the render state must be prepared with a new texture before the Draw Call for each mesh is sent. The fewer requests there are to change render state, the faster the Graphics API can process them.

Once the render state is configured, the CPU must decide which mesh to draw, which textures and shaders to use, and where to draw the object based on its position, rotation, and scale (all represented in a 4×4 matrix called the transform, which is where the Transform component gets its name). It then sends instructions to the GPU to draw the object. To keep communication between the CPU and GPU active, new instructions are pushed into a queue called the Command Buffer. This queue holds instructions created by the CPU, and the GPU fetches the next one each time it finishes executing the previous command.

The trick that lets batching improve performance here is that a new Draw Call does not necessarily require a new render state. If two objects share exactly the same render state, the GPU can start rendering the second object immediately, because the render state is still in place after the first object has finished rendering. This eliminates the time wasted synchronizing render states. It also reduces the number of instructions pushed into the Command Buffer, reducing the workload on both the CPU and the GPU.
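The effect of shared render state can be illustrated with a small sketch. This is not Unity code: the `DrawRequest` tuple, the function names, and the material names are hypothetical, and a material name simply stands in for the whole render state. The sketch counts how many times the state would have to be reconfigured for a given submission order.

```python
from typing import List, Optional, Tuple

# Hypothetical draw request: (mesh name, material name). The material name
# stands in for the whole render state (texture, shader, and so on).
DrawRequest = Tuple[str, str]


def count_state_changes(requests: List[DrawRequest]) -> int:
    """Count how often the render state must be reconfigured.

    A change is needed for the first request and whenever a request's
    material differs from the one used by the previous request.
    """
    changes = 0
    current_material: Optional[str] = None
    for _mesh, material in requests:
        if material != current_material:
            changes += 1
            current_material = material
    return changes


def sort_by_material(requests: List[DrawRequest]) -> List[DrawRequest]:
    """Group requests that share a material so they are drawn back to back."""
    return sorted(requests, key=lambda request: request[1])


interleaved = [
    ("crate_a", "wood"), ("barrel_a", "metal"),
    ("crate_b", "wood"), ("barrel_b", "metal"),
]
print(count_state_changes(interleaved))                    # 4
print(count_state_changes(sort_by_material(interleaved)))  # 2
```

The same four meshes are drawn in both cases; only the number of state changes differs, which is the saving that batching aims for.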

3.2 Materials and Shaders

So, to minimize the frequency of render state changes, reduce the number of materials used in the scene. This improves CPU and GPU performance at the same time: the CPU spends less time each frame generating instructions and transmitting them to the GPU, and the GPU does not need to stop as often to resynchronize state changes.

The remaining 8 batches are used to draw the 8 objects. For each object, the Draw Call prepares the rendering pipeline using the material's properties and asks the GPU to render the given mesh with the object's current transform. Each object is given a different texture file so that its material is unique. Each mesh therefore requires a different render state, so the 8 meshes require 8 separate Draw Calls.

3.3 Frame Debugger

To open the Frame Debugger, select Window | Frame Debugger in the main window, or click the Frame Debugger button in the Breakdown View Options of the Rendering area of the Profiler. Either action opens the Frame Debugger window. Click the Enable button in that window to observe how the scene is constructed, one Draw Call at a time.

3.4 Dynamic batch processing

Requirements for dynamically batching a given mesh:
- All mesh instances must use the same material reference.
- Only ParticleSystem and MeshRenderer components are dynamically batched. SkinnedMeshRenderer components (used for character animation) and all other renderable component types cannot be batched.
- Each mesh can have at most 300 vertices.
- The total number of vertex attributes used by the shader cannot exceed 900.
- All mesh instances must use either uniform scaling or non-uniform scaling, but not a mixture of the two.
- Mesh instances should reference the same lightmap file.
- The material's shader cannot rely on multiple passes.
- Mesh instances cannot receive real-time shadows.

3.4.1 Vertex attributes

A vertex attribute is simply a piece of per-vertex information in the mesh file, and each one is usually represented as a set of floating point numbers. Attributes include, but are not limited to, vertex position (relative to the root of the mesh), the normal vector (a vector pointing outward from the surface of the object, often used in lighting calculations), one or more sets of texture UV coordinates (used to define how one or more textures wrap around the mesh), and possibly even per-vertex color information. Only meshes whose shader uses no more than 900 vertex attributes in total can be dynamically batched.

To check a mesh's vertex count, find the mesh referenced by the MeshFilter component in the Project window and view the verts value in the Preview sub-area of the Inspector window. The more attributes a shader uses per vertex, the faster the 900-attribute budget is consumed, which reduces the number of vertices a mesh can have and still qualify for dynamic batching. For example, a simple diffuse shader may use only 3 attributes per vertex: position, normal, and one set of UV coordinates. Dynamic batching can therefore support meshes of up to 300 vertices with this shader. A more complex shader that requires 5 attributes per vertex can only be dynamically batched on meshes of no more than 180 vertices. Note also that even if the shader uses fewer than 3 attributes per vertex, dynamic batching still only supports meshes of up to 300 vertices, so only relatively simple objects are suitable for dynamic batching.

These limitations are why enabling dynamic batching in the scene saves only 3 Draw Calls, even though all objects share the same material reference. The cube mesh automatically generated by Unity contains only 8 vertices, each with position, normal, and UV data, for a total of 24 attributes, which is well below the limits of 300 vertices and 900 vertex attributes. However, the automatically generated sphere contains 515 vertices and therefore has a total of 1545 vertex attributes, which clearly exceeds both limits, so it cannot be dynamically batched. If you click a Draw Call item in the Frame Debugger, a section labeled "Why this draw call can't be batched with the previous one" is displayed, giving the reason.
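The arithmetic behind these limits can be written out as a short sketch. The function names are hypothetical and nothing here is a Unity API. Both limits are treated as inclusive, which is what the 300-vertex, 3-attribute example implies.

```python
MAX_VERTICES = 300      # per-mesh vertex limit quoted in the text
MAX_ATTRIBUTES = 900    # per-mesh vertex attribute limit quoted in the text


def max_batchable_vertices(attributes_per_vertex: int) -> int:
    """Largest vertex count that still fits both limits."""
    if attributes_per_vertex < 1:
        raise ValueError("a shader uses at least one attribute per vertex")
    return min(MAX_VERTICES, MAX_ATTRIBUTES // attributes_per_vertex)


def fits_vertex_limits(vertex_count: int, attributes_per_vertex: int) -> bool:
    """Check only the vertex and attribute limits, not the other requirements."""
    return vertex_count <= max_batchable_vertices(attributes_per_vertex)


print(max_batchable_vertices(3))   # 300 (simple diffuse shader)
print(max_batchable_vertices(5))   # 180 (more complex shader)
print(max_batchable_vertices(2))   # 300 (the vertex limit still applies)
print(fits_vertex_limits(8, 3))    # True  (cube: 24 attributes)
print(fits_vertex_limits(515, 3))  # False (sphere: 1545 attributes)
```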

3.4.2 Mesh scaling

To be included in the same dynamic batch, objects should either all use uniform scaling or all use non-uniform scaling. Uniform scaling means that the three components (x, y, z) of the scale vector are equal (although different meshes do not need to share the same value); non-uniform scaling means that at least one component differs from the others. Objects belonging to these two groups are put into two different batches.

For example, suppose there are the following 4 objects: A is scaled by (1, 1, 1), B by (2, 1, 1), C by (2, 2, 1), and D by (2, 2, 2). Objects A and D are uniformly scaled because their three components are equal. Although A and D have different scale factors, both are still uniformly scaled and are placed in the same dynamic batch. B and C are non-uniformly scaled, so they fall into the other group and are batched separately from A and D.
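The grouping rule can be sketched in a few lines. This only illustrates the classification described above, with hypothetical names, and is not how Unity implements it. Exact float comparison is enough for the example values.

```python
from typing import Dict, List, Tuple

Scale = Tuple[float, float, float]


def is_uniform(scale: Scale) -> bool:
    """A scale is uniform when its x, y, and z components are all equal."""
    x, y, z = scale
    return x == y == z


def group_by_scale_type(objects: Dict[str, Scale]) -> Dict[str, List[str]]:
    """Split objects into the uniform and non-uniform groups."""
    groups: Dict[str, List[str]] = {"uniform": [], "non_uniform": []}
    for name, scale in objects.items():
        key = "uniform" if is_uniform(scale) else "non_uniform"
        groups[key].append(name)
    return groups


objects = {
    "A": (1, 1, 1),
    "B": (2, 1, 1),
    "C": (2, 2, 1),
    "D": (2, 2, 2),
}
print(group_by_scale_type(objects))
# {'uniform': ['A', 'D'], 'non_uniform': ['B', 'C']}
```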

3.4.3 Summary of dynamic batch processing

If the only thing preventing two objects from being dynamically batched is that they use different textures, it is worth the time and effort to merge the textures (a technique known as atlasing) and regenerate the mesh UVs so that the objects can be batched. This may come at the expense of texture quality, or the texture files may become larger (a drawback to be aware of, discussed in detail in Chapter 6's in-depth look at GPU memory bandwidth), but it is worth it. The only time dynamic batching could hurt performance is in a scene with hundreds of simple objects and only a few objects in each batch. In this case, the overhead of detecting and generating so many small batches may exceed the time saved compared with simply issuing a separate Draw Call for each mesh. Even so, this rarely happens.
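Regenerating UVs for an atlas amounts to scaling and offsetting each coordinate into the region that the original texture now occupies. The sketch below is a minimal illustration under stated assumptions: the atlas is a square grid of equally sized tiles, the original UVs lie in the 0 to 1 range (no tiling), and the function and parameter names are hypothetical.

```python
from typing import List, Tuple

UV = Tuple[float, float]


def remap_uvs_to_atlas(
    uvs: List[UV], tile_column: int, tile_row: int, tiles_per_side: int
) -> List[UV]:
    """Map UVs from a standalone texture into one tile of a grid atlas."""
    tile_size = 1.0 / tiles_per_side
    offset_u = tile_column * tile_size
    offset_v = tile_row * tile_size
    return [(offset_u + u * tile_size, offset_v + v * tile_size) for u, v in uvs]


# A quad that used the full original texture, moved to the tile in
# column 1, row 0 of a 2x2 atlas.
quad_uvs = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(remap_uvs_to_atlas(quad_uvs, tile_column=1, tile_row=0, tiles_per_side=2))
# [(0.5, 0.0), (1.0, 0.0), (1.0, 0.5), (0.5, 0.5)]
```

Once every mesh samples from the same atlas, the meshes can share one material, which removes the texture difference that was blocking batching.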

When a change accidentally disqualifies an object from dynamic batching, no warning is issued; the only sign is that the number of Draw Calls increases after the modification and performance declines further. To keep the number of dynamic batches in the scene at an appropriate level, you need to check the Draw Call count regularly and watch the Frame Debugger data to make sure that the latest modifications have not accidentally disqualified objects from dynamic batching. However, as always, only worry about Draw Call performance if it proves to be a bottleneck.

3.5 Static batch processing

Static batching only processes objects marked as Static, hence the name.

3.5.1 Static tag

Static batching can only be applied to objects that have the Static flag enabled, specifically the Batching Static sub-flag (these sub-flags are called StaticEditorFlags). Click the small down-pointing triangle next to a GameObject's Static option to open the StaticEditorFlags drop-down, which controls how the object behaves in the various Static processes. An obvious side effect of this flag is that the object's transform cannot be modified. Therefore, any object that is to use static batching cannot be moved, rotated, or scaled in any way.

3.5.2 Memory requirements

When static batching is active, all visible mesh data marked as Static is copied into one larger mesh data buffer and passed to the rendering pipeline through a single Draw Call, while the original meshes are ignored.

These static batch copies consume additional memory equal to the number of meshes multiplied by the size of the original mesh. Normally, rendering one, ten, or a million copies of the same object consumes the same amount of mesh memory, because they all reference the same mesh data. Static batching gives up that sharing, since every instance is copied into the batch buffer.
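A rough sketch of this memory trade-off follows. The function names and the 50 KB mesh size are hypothetical and chosen only for illustration; the model ignores everything except mesh data.

```python
def shared_mesh_memory(mesh_bytes: int, instance_count: int) -> int:
    """Mesh memory when every instance references the same mesh data."""
    return mesh_bytes if instance_count > 0 else 0


def static_batch_copy_memory(mesh_bytes: int, instance_count: int) -> int:
    """Extra memory for the batch buffer: one copy of the mesh per instance."""
    return mesh_bytes * instance_count


mesh_bytes = 50_000  # hypothetical 50 KB mesh
for count in (1, 10, 1_000):
    shared = shared_mesh_memory(mesh_bytes, count)
    copies = static_batch_copy_memory(mesh_bytes, count)
    print(f"{count} instances: shared={shared} bytes, batch copies={copies} bytes")
```

The shared figure stays constant while the batch copy grows linearly with the instance count, which is why static batching suits scenes built from many different meshes better than scenes that repeat one mesh many times.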

3.5.3 Material reference

As mentioned before, sharing material references is a way to reduce render state changes, so the requirement that batched meshes share a material reference is an obvious one.

3.5.4 Warnings for static batch processing

Static batching has its own requirements: as the name implies, the mesh must be marked as Static (specifically, Batching Static). The reduction in Draw Calls cannot be seen directly in the Stats window; it is only visible at runtime. In addition, objects marked as Batching Static that are introduced into the scene at runtime are not automatically included in static batching.


Only a deep understanding of these batch processing systems and how they work can help us determine when and where this feature can be used.

Origin blog.csdn.net/qq_35647121/article/details/115451869