[AI Technical Articles, Part 3] Image Compression with Neural Networks

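Preface

  • These past two months have been unexpectedly quiet... Now and then a bug gets assigned to me, but it is usually fixed quickly. I discussed code-structure optimization with my team lead, reworked a set of business logic I had added earlier, and then slipped back into "idle" mode.
  • Most of the remaining time went into studying and getting familiar with the project source. Right now I am mainly working on the MTK Camera HAL layer, and I have to complain: MTK's code has a lot of redundancy. Take the various CamAdapter classes: the code is identical, yet it is copied several times over and only distinguished at creatInstance time... nobody bothered to factor out some shared state or the like to clean it up...
  • Builds often take a long time, sourceInsight has to resync every so often, and the projects on each baseline need updating every day or two... so I quietly used that time to keep working on the second round of the translation club's activity...
  • Worth mentioning: I ranked in the top ten contributors in the previous round, and Tencent sent me a Year of the Dog plush toy, "哈士企"...
  • The theme of the second round is "Exploring AI technology, technology leading the future", which is right up my alley, so I claimed a few extra articles...
  • I translated ten articles in total, but some of them were fairly ordinary, and I had rarely read about areas such as natural language processing before, so I could not render many of the specialized descriptions clearly. As a result, many of the pieces merely passed review and were not accepted into the column...
  • Below I continue posting the ones that the column accepted: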

Copyright

Translator: StoneDemo, a member of the Cloud+ Community (云+社区) Translation Club
Original article: Image Compression with Neural Networks
Original authors: Nick Johnston and David Minnen, Software Engineers


Image Compression with Neural Networks

Data compression is used nearly everywhere on the internet - the videos you watch online, the images you share, the music you listen to, even the blog you’re reading right now. Compression techniques make sharing the content you want quick and efficient. Without data compression, the time and bandwidth costs for getting the information you need, when you need it, would be exorbitant!

In “Full Resolution Image Compression with Recurrent Neural Networks”, we expand on our previous research on data compression using neural networks, exploring whether machine learning can provide better results for image compression like it has for image recognition and text summarization. Furthermore, we are releasing our compression model via TensorFlow so you can experiment with compressing your own images with our network.

We introduce an architecture that uses a new variant of the Gated Recurrent Unit (a type of RNN that allows units to save activations and process sequences) called Residual Gated Recurrent Unit (Residual GRU). Our Residual GRU combines existing GRUs with the residual connections introduced in “Deep Residual Learning for Image Recognition” to achieve significant image quality gains for a given compression rate. Instead of using a DCT (discrete cosine transform) to generate a new bit representation like many compression schemes in use today, we train two sets of neural networks - one to create the codes from the image (encoder) and another to create the image from the codes (decoder).
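
As a rough illustration of what "a GRU plus a residual connection" can mean, here is a minimal NumPy sketch. It is not the released model: the article does not say where the residual connection sits, so adding the layer input to the GRU output is an assumption, and the names `gru_step`, `residual_gru_step`, and `init_params` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One standard GRU update; p is a dict of square weight matrices."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)             # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)             # reset gate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_cand                  # new hidden state

def residual_gru_step(x, h, p):
    """GRU update plus a skip connection from input to output.

    ASSUMPTION: the placement of the skip connection is illustrative only.
    Returns (layer output, hidden state carried to the next step).
    """
    h_new = gru_step(x, h, p)
    return h_new + x, h_new

def init_params(size, rng):
    """Small random weights; input and state share one size so x can be added."""
    names = ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]
    return {n: 0.1 * rng.standard_normal((size, size)) for n in names}

rng = np.random.default_rng(0)
params = init_params(4, rng)
h = np.zeros(4)
for x in rng.standard_normal((3, 4)):   # a sequence of three input vectors
    out, h = residual_gru_step(x, h, params)
```

The hidden state `h` is what lets a unit "save activations" between steps, while the skip connection gives the input a direct path to the output.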

Our system works by iteratively refining a reconstruction of the original image, with both the encoder and decoder using Residual GRU layers so that additional information can pass from one iteration to the next. Each iteration adds more bits to the encoding, which allows for a higher quality reconstruction. Conceptually, the network operates as follows:

  1. The initial residual, R[0], corresponds to the original image I: R[0] = I.
  2. Set i=1 for the first iteration.
  3. Iteration[i] takes R[i-1] as input and runs the encoder and binarizer to compress the image into B[i].
  4. Iteration[i] runs the decoder on B[i] to generate a reconstructed image P[i].
  5. The residual for Iteration[i] is calculated: R[i] = I - P[i].
  6. Set i=i+1 and go to Step 3 (up to the desired number of iterations).
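
The six steps above can be sketched as a short loop. This is a toy under stated assumptions, not the trained network: `toy_encoder`, `binarize`, and `ToyDecoder` are hypothetical stand-ins for the learned encoder, binarizer, and decoder, and the decoder's running reconstruction stands in for the recurrent memory.

```python
import numpy as np

def toy_encoder(residual):
    # Stand-in for the learned encoder: passes the residual through unchanged.
    return residual

def binarize(features):
    # One bit per value, stored as +1 / -1.
    return np.where(features >= 0, 1, -1).astype(np.int8)

class ToyDecoder:
    """Stateful stand-in for the learned decoder (state = running reconstruction)."""

    def __init__(self, shape):
        self.reconstruction = np.zeros(shape)
        self.step = 0.5

    def __call__(self, bits):
        # Nudge every pixel up or down by the current step, then halve the step.
        self.reconstruction = self.reconstruction + self.step * bits
        self.step /= 2.0
        return self.reconstruction

def compress(image, num_iterations):
    decoder = ToyDecoder(image.shape)
    residual = image                               # Step 1: R[0] = I
    codes, max_errors = [], []
    for i in range(1, num_iterations + 1):         # Steps 2 and 6
        bits = binarize(toy_encoder(residual))     # Step 3: B[i]
        prediction = decoder(bits)                 # Step 4: P[i]
        residual = image - prediction              # Step 5: R[i] = I - P[i]
        codes.append(bits)
        max_errors.append(float(np.abs(residual).max()))
    return codes, max_errors

def decompress(codes, shape):
    # Replays the decoder on B[1..N]; the original image is not needed.
    decoder = ToyDecoder(shape)
    reconstruction = decoder.reconstruction
    for bits in codes:
        reconstruction = decoder(bits)
    return reconstruction

image = np.random.default_rng(0).random((8, 8))    # pixel values in [0, 1)
codes, max_errors = compress(image, 6)
print(max_errors)
```

In this toy, each iteration adds one bit per pixel, and after iteration i every pixel is within 0.5**i of the original, so decoding only a prefix of the codes still gives a coarser reconstruction.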

The residual image represents how different the current version of the compressed image is from the original. This image is then given as input to the network with the goal of removing the compression errors from the next version of the compressed image. The compressed image is now represented by the concatenation of B[1] through B[N]. For larger values of N, the decoder gets more information on how to reduce the errors and generate a higher quality reconstruction of the original image.

To understand how this works, consider the following example of the first two iterations of the image compression network, shown in the figures below. We start with an image of a lighthouse. On the first pass through the network, the original image is given as an input (R[0] = I). P[1] is the reconstructed image. The difference between the original image and encoded image is the residual, R[1], which represents the error in the compression.

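(Figure. Left: the original image, I = R[0]. Center: the reconstructed image, P[1]. Right: the residual, R[1], which represents the error introduced by compression.)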

On the second pass through the network, R[1] is given as the network’s input (see figure below). A higher quality image P[2] is then created. So how does the system recreate such a good image (P[2], center panel below) from the residual R[1]? Because the model uses recurrent nodes with memory, the network saves information from each iteration that it can use in the next one. It learned something about the original image in Iteration[1] that is used along with R[1] to generate a better P[2] from B[2]. Lastly, a new residual, R[2] (right), is generated by subtracting P[2] from the original image. This time the residual is smaller since there are fewer differences between the reconstructed image, and what we started with.

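(Figure. Second pass through the network. Left: R[1] given as input. Center: the higher-quality reconstruction, P[2]. Right: the smaller residual, R[2], generated by subtracting P[2] from the original image.)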

At each further iteration, the network gains more information about the errors introduced by compression (which is captured by the residual image). If it can use that information to predict the residuals even a little bit, the result is a better reconstruction. Our models are able to make use of the extra bits up to a point. We see diminishing returns, and at some point the representational power of the network is exhausted.

To demonstrate file size and quality differences, we can take a photo of Vash, a Japanese Chin, and generate two compressed images, one JPEG and one Residual GRU. Both images target a perceptual similarity of 0.9 MS-SSIM, a perceptual quality metric that reaches 1.0 for identical images. The image generated by our learned model results in a file 25% smaller than JPEG.

(Figure. Left: original image (1419 KB PNG), 1.0 MS-SSIM. Center: JPEG (33 KB), 0.9 MS-SSIM. Right: Residual GRU (24 KB), 0.9 MS-SSIM. The Residual GRU file is 25% smaller than the JPEG.)
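
For reference, the size comparison can be reproduced from the kilobyte figures quoted in the caption. The helper names `percent_smaller` and `compression_ratio` are hypothetical; the inputs are simply the caption's whole-kilobyte values.

```python
def percent_smaller(baseline_kb, candidate_kb):
    """How much smaller the candidate is than the baseline, in percent."""
    return 100.0 * (baseline_kb - candidate_kb) / baseline_kb

def compression_ratio(original_kb, compressed_kb):
    """Original size divided by compressed size."""
    return original_kb / compressed_kb

print(round(percent_smaller(33, 24), 1))    # 27.3
print(round(compression_ratio(1419, 33)))   # 43
print(round(compression_ratio(1419, 24)))   # 59
```

With these values the learned model's file comes out about 27% smaller than the JPEG, roughly in line with the 25% quoted in the text.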

Taking a look around his nose and mouth, we see that our method doesn’t have the magenta blocks and noise in the middle of the image as seen in JPEG. This is due to the blocking artifacts produced by JPEG, whereas our compression network works on the entire image at once. However, there’s a tradeoff – in our model the details of the whiskers and texture are lost, but the system shows great promise in reducing artifacts.
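
To see why block-wise coding produces visible edges, here is a minimal NumPy sketch. It is not a JPEG codec: it keeps only each 8x8 block's mean, which is what survives if every DCT coefficient except the DC term is discarded. That is far harsher than real JPEG quantization and is chosen only to make the effect obvious. The function names are hypothetical, and the image sides are assumed to be multiples of 8.

```python
import numpy as np

BLOCK = 8  # JPEG transforms the image in 8x8 pixel blocks

def keep_block_means(image, block=BLOCK):
    """Replace every block by its mean value (DC-only approximation)."""
    h, w = image.shape
    blocks = image.reshape(h // block, block, w // block, block)
    means = blocks.mean(axis=(1, 3), keepdims=True)
    return np.broadcast_to(means, blocks.shape).reshape(h, w)

def horizontal_jumps(image, block=BLOCK):
    """Mean absolute difference between neighbouring columns.

    Returns (across block boundaries, inside blocks).
    """
    diffs = np.abs(np.diff(image, axis=1))
    at_boundary = (np.arange(1, image.shape[1]) % block) == 0
    return diffs[:, at_boundary].mean(), diffs[:, ~at_boundary].mean()

ramp = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))   # smooth left-to-right gradient
print(horizontal_jumps(ramp))                        # same small step everywhere
print(horizontal_jumps(keep_block_means(ramp)))      # flat blocks, jumps at the seams
```

On the smooth ramp, neighbouring columns differ by the same small amount everywhere. After the block-wise approximation, the interior of each block is flat and all of the change is concentrated at the block seams, which is the kind of edge the eye picks up as a blocking artifact.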

(Figure. Left: original image. Center: JPEG. Right: Residual GRU.)

While today’s commonly used codecs perform well, our work shows that using neural networks to compress images results in a compression scheme with higher quality and smaller file sizes. To learn more about the details of our research and a comparison of other recurrent architectures, check out our paper. Our future work will focus on even better compression quality and faster models, so stay tuned!

Reposted from blog.csdn.net/qq_16775897/article/details/79327090