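I recently read Rufail Negatu Mekuria's doctoral thesis on 3D tele-immersive systems. The parts on differential coding and octree geometry coding in mesh compression, and on inter-frame and intra-frame coding in point cloud compression, were fairly easy to follow. For connectivity coding in mesh compression, however, I only grasped the general idea behind pattern runs and have not yet fully understood the concrete algorithm. My reading notes follow.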

Part I

Part II

Reading notes of 'Network Streaming and Compression for Mixed Reality Tele-Immersion'

The 3D immersive scene in this thesis reminds me of Ready Player One, an American movie that is very popular in China. It makes me think that the scenes in that movie may come true in the near future! Although the coarse avatar that represents a real user in the virtual world does not yet blend in with its surroundings, I still believe that as mixed reality develops, people will be able to talk and laugh in the virtual world just as if they were meeting face to face.

The thesis develops a complete 3D tele-immersive streaming platform and focuses on its main research questions. Standardized compression technologies for common media data types are becoming available on the market, but for live captured 3D point cloud and mesh data an equivalent, widely adopted codec does not exist yet. Therefore, chapters 3-6 mainly discuss the compression of 3D meshes and point clouds and present some codecs that are suitable for real-time end-to-end tele-immersive communication. The 3D tele-immersive media pipeline is shown below (Figure 1).

Figure 1 3D Tele-immersive media pipeline

To achieve real-time compression and transmission of highly realistic reconstructed 3D humans based on meshes, the thesis develops a triangle mesh compression method based on block based quantization (chapter 3) and pattern based connectivity coding (chapter 4). The methods are described in detail as follows.

Chapter 3 3D Tele-immersion with Live Captured Geometry using Block Based Mesh Compression

Chapter 3 discusses a block based mesh compression method for live captured geometry. First, the main part of a frame is reconstructed as a triangle mesh using Microsoft Kinect. Then a 3D compression strategy is proposed. Figure 2 presents an overview of the developed compression algorithm.

Figure 2 Overview scheme of 3D designed compression

The mesh can be divided into two parts: Geometry and Connectivity. They can be coded separately.

Geometry

The main idea of geometry coding is differential coding. Subsequent coordinates in the list of vertices are co-located (spatially close), which can be exploited with differential coding and local quantization. The coordinates of a vertex are coded relative to a connected vertex rather than as absolute values, which reduces the number of bits needed.
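To make the idea concrete, here is a minimal sketch of differential coding on a vertex list. It is not the thesis's exact scheme: the function names, the uniform quantization step `step`, and the choice to quantize before differencing are all my own assumptions for illustration.

```python
def delta_encode(vertices, step):
    """Quantize each vertex to an integer grid, then store successive differences.

    The first entry is the difference from the origin, i.e. the absolute
    quantized position of the first vertex.
    """
    deltas = []
    prev = (0, 0, 0)
    for v in vertices:
        q = tuple(round(c / step) for c in v)          # assumed uniform quantizer
        deltas.append(tuple(a - b for a, b in zip(q, prev)))
        prev = q
    return deltas


def delta_decode(deltas, step):
    """Rebuild the (quantized) vertex positions by accumulating the differences."""
    out = []
    cur = (0, 0, 0)
    for d in deltas:
        cur = tuple(c + x for c, x in zip(cur, d))
        out.append(tuple(c * step for c in cur))
    return out


verts = [(0.10, 0.20, 0.30), (0.11, 0.21, 0.29), (0.13, 0.20, 0.30)]
codes = delta_encode(verts, step=0.01)
decoded = delta_decode(codes, step=0.01)
```

Apart from the first entry, the differences are small integers, so they need far fewer bits than the absolute coordinates.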

Connectivity

Connectivity compression exploits specific regularities in the connectivity information. Figure 3 shows the process of connectivity coding. First, the entire connectivity information is searched for repeated regularities, which are counted and stored in the pattern run data structure (Table 1). Next, all indices are iterated over again. A difference coding scheme stores the relative position of adjacent triangles, but when an index is the start of a pattern run, the pattern run (indicated by its start field) is stored instead. This also yields a large reduction in connectivity data size.

Figure 3 Pattern Based Connectivity Coding

Table 1 Structure Pattern Run
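Since I have not fully worked out the thesis's pattern run algorithm, the following is only my own simplified sketch of the two-pass idea: find runs where the index difference between consecutive triangles repeats, then emit either a run or a plain difference. Only the `start` field is mentioned in my notes above; the `diff` and `count` fields, the `min_len` threshold, and the symbol names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Tri = Tuple[int, int, int]


@dataclass
class PatternRun:
    start: int   # index of the first triangle covered by the run
    diff: Tri    # hypothetical field: index difference repeated in the run
    count: int   # hypothetical field: number of triangles in the run


def tri_diff(a: Tri, b: Tri) -> Tri:
    return tuple(x - y for x, y in zip(a, b))


def find_runs(tris: List[Tri], min_len: int = 3) -> List[PatternRun]:
    """First pass: collect maximal runs of identical consecutive differences."""
    diffs = [tri_diff(tris[k], tris[k - 1]) for k in range(1, len(tris))]
    runs, k = [], 0
    while k < len(diffs):
        end = k
        while end + 1 < len(diffs) and diffs[end + 1] == diffs[k]:
            end += 1
        if end - k + 1 >= min_len:
            # diffs[k] belongs to triangle k + 1
            runs.append(PatternRun(start=k + 1, diff=diffs[k], count=end - k + 1))
        k = end + 1
    return runs


def encode(tris: List[Tri], min_len: int = 3):
    """Second pass: emit a run where one starts, otherwise a plain difference."""
    if not tris:
        return []
    runs = {r.start: r for r in find_runs(tris, min_len)}
    out, k = [("first", tris[0])], 1
    while k < len(tris):
        if k in runs:
            out.append(("run", runs[k].diff, runs[k].count))
            k += runs[k].count
        else:
            out.append(("diff", tri_diff(tris[k], tris[k - 1])))
            k += 1
    return out


def decode(stream) -> List[Tri]:
    tris: List[Tri] = []
    for sym in stream:
        if sym[0] == "first":
            tris.append(sym[1])
            continue
        reps = sym[2] if sym[0] == "run" else 1
        for _ in range(reps):
            tris.append(tuple(p + d for p, d in zip(tris[-1], sym[1])))
    return tris


mesh = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (10, 4, 7)]
stream = encode(mesh)
```

In this toy mesh, three consecutive triangles collapse into a single run symbol, which is where the size reduction would come from on regular meshes.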

This chapter also proposes a transmission scheme based on LT codes that is suitable for the 3D tele-immersive system. It protects each mesh frame against packet loss by allowing any number of extra packets to be generated. The experiments presented to test this scheme show that it clearly outperforms the compared methods in terms of speed.
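As a rough illustration of why an LT code can generate any number of extra packets, here is a toy encoder and peeling decoder over integer blocks. This is a generic textbook-style sketch, not the thesis's implementation: the function names are mine, and I use the ideal soliton degree distribution for simplicity, whereas practical systems usually prefer a robust variant.

```python
import random


def lt_encode(blocks, n_packets, seed=0):
    """Produce n_packets packets, each the XOR of a random subset of blocks."""
    rng = random.Random(seed)
    k = len(blocks)
    # Ideal soliton distribution: P(1) = 1/k, P(d) = 1/(d(d-1)) for d >= 2.
    weights = [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    packets = []
    for _ in range(n_packets):
        degree = rng.choices(range(1, k + 1), weights=weights)[0]
        idx = rng.sample(range(k), degree)
        value = 0
        for i in idx:
            value ^= blocks[i]
        packets.append((set(idx), value))
    return packets


def lt_decode(packets, k):
    """Peeling decoder: returns the k blocks, or None if decoding stalls."""
    pending = [[set(idx), value] for idx, value in packets]
    recovered = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for p in pending:
            # Strip already-known blocks out of this packet.
            for i in [i for i in p[0] if i in recovered]:
                p[0].discard(i)
                p[1] ^= recovered[i]
            if len(p[0]) == 1:
                # Exactly one unknown block left: the packet value is that block.
                recovered[next(iter(p[0]))] = p[1]
                p[0].clear()
                progress = True
    if len(recovered) < k:
        return None
    return [recovered[i] for i in range(k)]


# Hand-built packets for blocks [5, 9, 12]; any sufficient subset decodes.
hand_made = [({0}, 5), ({0, 1}, 5 ^ 9), ({1, 2}, 9 ^ 12)]
result = lt_decode(hand_made, 3)
```

The sender can keep calling the encoder for more packets until the receiver has enough to decode, which is the property that makes the scheme attractive for lossy networks.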

Chapter 4 3D Tele-immersion with Connectivity Driven 3D Mesh Compression with Late Differential Quantization

Chapter 4 develops a codec with a higher precision of the coordinates and attributes to enable a higher quality transmission compared to chapter 3. Figure 4 outlines the compression system. It introduces a fast connectivity traversal method combined with connectivity-driven (offline) optimized DPCM coding and layered quantization.

Figure 4 Schematics of proposed dynamic mesh codec

The differences between the scheme in this chapter and that in chapter 3 are the layered quantization and the appearance quantization. The dynamic range of the vertices quantized in 3D space is large, which makes surface details vulnerable to quantization distortion, and this distortion grows as the quantization parameters become coarser. The author addresses this problem with a DPCM scheme that uses a layered structure with a variable number of bits. If the difference between connected vertices becomes large, more bits can be used for the quantization, so large errors in geometry are avoided. The pattern based connectivity coding approach used in this chapter is similar to that in chapter 3, so it is not described again. The author deploys the same late differential quantization system for the appearance data, but assigns the layers differently. Experiments testing this scheme show that the codec is more generic and works well for different setups. This work enables higher resolution and precision than the system developed in the previous chapter.
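A rough sketch of the layered idea: each difference is stored in the smallest layer whose bit width can hold it, plus a short tag that names the layer. The three layer widths (4, 8, and 16 bits), the 2-bit tag, and the function names are my own assumptions, not values from the thesis.

```python
LAYER_BITS = (4, 8, 16)   # hypothetical layer widths, signed two's complement
TAG_BITS = 2              # hypothetical cost of signalling the layer


def pick_layer(delta):
    """Return the index of the smallest layer that can represent delta."""
    for layer, bits in enumerate(LAYER_BITS):
        if -(1 << (bits - 1)) <= delta <= (1 << (bits - 1)) - 1:
            return layer
    raise ValueError("difference does not fit the largest layer")


def encode_layered(values):
    """DPCM over already-quantized integers, visited in traversal order."""
    symbols, prev = [], 0
    for v in values:
        d = v - prev
        symbols.append((pick_layer(d), d))
        prev = v
    return symbols


def decode_layered(symbols):
    out, cur = [], 0
    for _layer, d in symbols:
        cur += d
        out.append(cur)
    return out


def bits_used(symbols):
    return sum(TAG_BITS + LAYER_BITS[layer] for layer, _ in symbols)


coords = [3, 5, 100, 101, 5000]
symbols = encode_layered(coords)
total_bits = bits_used(symbols)   # compare with 5 * 16 = 80 bits at a fixed width
```

Small differences between connected vertices stay cheap, while the occasional large jump still gets enough bits to be represented without a large error.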

Chapter 5 Highly Adaptive Geometry Driven 3D Mesh Compression

The approach for geometry compression and transmission presented in the previous chapters enables high quality real-time 3D end-to-end communication. However, because the approach is based on near-lossless techniques, the compression ratios are limited and the bit-rate requirements are still high. This chapter experiments with lossy coding of live reconstructed geometry using mesh simplification to satisfy the different demands of users. For example, when a participant is far away in the room, they can be rendered at low quality to keep transmission fast.

Figure 5 Geometry Driven Mesh Codec

Figure 5 illustrates the geometry driven mesh coding scheme designed in this chapter for the immersive virtual room, which is based on a lossy coding approach. Like the approaches in the previous chapters, it can be divided into two parts: geometry coding and connectivity coding.

Geometry

The geometry information is coded using an octree scheme (Figure 6). All abbreviations used to describe the algorithm are given in Table 2.

Figure 6 Octree subdivision in space

Symbol	Description
Ior	Vertex indices for Coriginal
Isimp	Vertex indices for Csimplified
Coriginal	Original connectivity
Csimplified	Simplified connectivity
Tun_ordered	A collection of triples of unordered indices
Tordered	A collection of triples of ordered indices
Voctree	Octree voxel grid
TVQ	Obtained from Tordered
z	3D vectors representing a shift in the octree voxel grid

Table 2 Symbols used for Geometry Driven Mesh Compression

The method traverses the octree and builds a mapping between the vertex indices and the 3D voxel indices in the octree grid. During this traversal the author optionally computes an enhancement layer that refines the position of each voxel center using a weighted average of the positions of the enclosed vertices. This layer can be used to increase the decoded mesh quality without further octree subdivision. The 3D voxel indices can then be coded using the same method as in the previous chapter.
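The mapping and the enhancement layer can be sketched as follows. This is a simplified stand-in that quantizes directly onto the leaf grid of an octree of a given depth instead of traversing a real octree, and it uses a plain (unweighted) mean where the thesis uses a weighted average. The function name and parameters are my own.

```python
from collections import defaultdict


def voxelize(vertices, depth, bbox_min, bbox_size):
    """Map each vertex to a leaf voxel of a cubic grid with 2**depth cells per axis.

    Returns (vertex_to_voxel, refinement), where refinement[voxel] is the offset
    from the voxel center to the mean of the vertices the voxel encloses.
    """
    n = 1 << depth
    cell = bbox_size / n
    vertex_to_voxel = []
    members = defaultdict(list)
    for v in vertices:
        key = tuple(
            min(n - 1, max(0, int((c - m) / cell))) for c, m in zip(v, bbox_min)
        )
        vertex_to_voxel.append(key)
        members[key].append(v)

    refinement = {}
    for key, pts in members.items():
        center = tuple(m + (k + 0.5) * cell for k, m in zip(key, bbox_min))
        mean = tuple(sum(p[a] for p in pts) / len(pts) for a in range(3))
        refinement[key] = tuple(mu - c for mu, c in zip(mean, center))
    return vertex_to_voxel, refinement


points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
v2v, refine = voxelize(points, depth=1, bbox_min=(0.0, 0.0, 0.0), bbox_size=1.0)
# The first two vertices fall into the same voxel, so the simplified mesh
# keeps one point for them; `refine` nudges it from the voxel center
# toward where the original vertices actually were.
```

Raising `depth` gives more, smaller voxels and therefore more output points, which matches the fine grained resolution control described later in this chapter.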

Connectivity

In addition, the connectivity relationship can be simplified.

Next, the set Csimplified is sorted in ascending order of i1, and all i1 are coded via a run length coding scheme (RL coding) that codes the number of repetitions and the increments in i1. Then i2 and i3 are each represented by a 3D offset in the octree voxel grid away from connected voxels. Hence the elements of the set TVQ satisfy the relationship below.

i1 was already coded by the RL coding of the first index, so only z2 and z3 need to be encoded. Structuring the connectivity information as Tordered yields a structure in which z2 and z3 can be coded very efficiently via vector quantization.
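The run length coding of the sorted first indices can be sketched like this. The pair layout (increment, repetitions) and the function names are my own reading of "codes the number of repetitions and increments in i1"; the offsets z2 and z3 and their vector quantization are not covered here.

```python
def rl_encode(sorted_i1):
    """Encode an ascending index list as (increment, repetitions) pairs."""
    runs = []
    prev = 0
    k = 0
    while k < len(sorted_i1):
        value = sorted_i1[k]
        n = 1
        while k + n < len(sorted_i1) and sorted_i1[k + n] == value:
            n += 1
        runs.append((value - prev, n))   # step from the previous distinct index
        prev = value
        k += n
    return runs


def rl_decode(runs):
    """Rebuild the index list by accumulating increments and repeating values."""
    out = []
    current = 0
    for increment, repetitions in runs:
        current += increment
        out.extend([current] * repetitions)
    return out


first_indices = [0, 0, 0, 2, 2, 3]
runs = rl_encode(first_indices)
```

Because each vertex usually starts only a few triangles and the indices rise steadily, both numbers in each pair stay small.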

Decoding

The decoder needs the inverse operations.

where i1 was already recovered from the run length encoded data. Therefore, by recovering i2 followed by i3 for each tvq, Csimplified is recovered. With both the geometry and the connectivity available, the mesh is fully decoded.

The result is a complete lossy geometry compression scheme that allows very fine grained control of the output resolution (number of output points) by changing the octree depth. In addition, the codec encodes and decodes in real time, which is useful for real-time end-to-end communication.

Finally, the evaluation shows that the geometry driven codec can be used when a low bit-rate and low quality are acceptable, with the quality increased gradually. For high quality representations, the connectivity driven coder can be used for dense geometry, or MPEG-4 TFAN when meshes are sparse.

Reading notes of Rufail's dissertation (Part I)