DnCNN: Beyond Gaussian Denoising: Residual Learning for Deep CNN Image Denoising

Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising

Abstract—Image denoising methods based on discriminative model learning have attracted much attention due to their favorable denoising performance. In this paper, we take one step forward by investigating the construction of feed-forward denoising convolutional neural networks (DnCNNs), which bring progress in very deep architectures, learning algorithms, and regularization methods into image denoising. Specifically, residual learning and batch normalization are utilized to speed up the training process and boost denoising performance. Different from existing discriminative denoising models, which usually train a specific model for additive white Gaussian noise at a certain noise level, our DnCNN model is able to handle Gaussian denoising with unknown noise level (i.e., blind Gaussian denoising). With the residual learning strategy, DnCNN implicitly removes the latent clean image in the hidden layers. This property motivates us to train a single DnCNN model to tackle several general image denoising tasks, such as Gaussian denoising, single image super-resolution and JPEG image deblocking. Our extensive experiments demonstrate that our DnCNN model not only exhibits high effectiveness on several general image denoising tasks, but can also be efficiently implemented on GPUs.

I. INTRODUCTION

Image denoising is a classic and active topic in low-level vision because it is an indispensable step in many practical applications. The goal of image denoising is to recover a clean image x from noisy observations y, which follows an image degradation model y = x + v. A common assumption is that v is additive white Gaussian noise (AWGN) with standard deviation σ. From a Bayesian perspective, image prior modeling will play a central role in image denoising when the likelihood is known. Over the past few decades, various models have been used to model image priors, including non-local self-similarity (NSS) models [1]-[5], sparse models [6]-[8], gradient models [9]-[11] and Markov Random Field (MRF) models [12]-[14]. In particular, NSS models are popular among state-of-the-art methods such as BM3D [2], LSSC [4], NCSR [7] and WNNM [15].
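As a quick illustration of this degradation model, the sketch below synthesizes a noisy observation y = x + v from a clean image x by adding AWGN with standard deviation σ. It is a minimal NumPy example; the function name and the [0, 255] intensity range are illustrative assumptions, not part of the paper.

```python
import numpy as np

def add_awgn(x, sigma=25.0, seed=None):
    """Return a noisy observation y = x + v, where v is additive white Gaussian
    noise with standard deviation sigma (clean image x as an array in [0, 255])."""
    rng = np.random.default_rng(seed)
    v = rng.normal(loc=0.0, scale=sigma, size=x.shape)  # v ~ N(0, sigma^2)
    return x.astype(np.float64) + v                     # noisy observation y
```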

Despite their high denoising quality, most denoising methods suffer from two main drawbacks. First, these methods usually involve a complex optimization problem in the testing phase, making the denoising process time-consuming [7], [16]; it is therefore difficult for most of them to achieve high performance without sacrificing computational efficiency. Second, the models are generally non-convex and involve several manually chosen parameters, which provide some leeway to boost the denoising performance...

In this paper, instead of learning a discriminative model with an explicit image prior, we treat image denoising as a plain discriminative learning problem, i.e., separating the noise from a noisy image with a feed-forward convolutional neural network (CNN). The reasons for using a CNN are threefold. First, CNNs with very deep architectures [26] can effectively increase the capacity and flexibility for exploiting image characteristics. Second, considerable advances have been made in regularization and learning methods for training CNNs, including rectified linear units (ReLU) [27], batch normalization [28] and residual learning [29]; adopting these methods can speed up the training process and improve denoising performance. Third, CNNs are well suited for parallel computation on modern powerful GPUs, which can be exploited to improve runtime performance.

We refer to the proposed denoising convolutional neural network as DnCNN. Rather than directly outputting the denoised image x̂, DnCNN is trained to predict the residual image v̂, i.e., the difference between the noisy observation and the underlying clean image. In other words, the proposed DnCNN implicitly removes the latent clean image through the operations in the hidden layers. Batch normalization is further introduced to stabilize and improve the training of DnCNN. The results show that residual learning and batch normalization benefit from each other, and their combination can effectively speed up training and boost denoising performance.

The purpose of this paper is to design a more efficient Gaussian denoiser. We observe that when v is the difference between the ground-truth high-resolution image and the bicubic upsampling of the low-resolution image, the image degradation model for Gaussian denoising can be converted into a single image super-resolution (SISR) problem; similarly, the JPEG image deblocking problem can be modeled with the same degradation model by taking v as the difference between the original image and the compressed image. In this sense, SISR and JPEG image deblocking can be seen as two special cases of a "general" image denoising problem, although in SISR and JPEG deblocking the noise v is quite different from AWGN. It is natural to ask whether a single CNN model can be trained to handle such a general image denoising problem. By analyzing the connection between DnCNN and TNRD [19], we propose to extend DnCNN to handle several general image denoising tasks, including Gaussian denoising, SISR, and JPEG image deblocking.

Extensive experiments show that our DnCNN trained at a certain noise level can produce better Gaussian denoising results than state-of-the-art methods such as BM3D [2], WNNM [15] and TNRD [19]. For Gaussian denoising with unknown noise level (i.e., blind Gaussian denoising), a single DnCNN model can still outperform BM3D [2] and TNRD [19] trained for a specific noise level. Extending DnCNN to general image denoising tasks also yields good results. Furthermore, we show the effectiveness of training a single DnCNN model for three general image denoising tasks, namely blind Gaussian denoising, SISR with multiple scaling factors, and JPEG deblocking with different quality factors.

The contributions of this paper are summarized as follows:
1) We propose an end-to-end trainable deep CNN for image denoising. In contrast to existing neural-network-based methods that directly estimate the latent clean image, the network adopts a residual learning strategy to implicitly remove the latent clean image from the noisy observation.
2) We find that residual learning and batch normalization greatly benefit CNN learning, as they not only speed up training but also improve denoising performance. For Gaussian denoising at a certain noise level, DnCNN outperforms state-of-the-art methods in both quantitative metrics and visual quality.
3) Our DnCNN can easily be extended to handle general image denoising tasks. We can train a single DnCNN model for blind Gaussian denoising and achieve better performance than competing methods trained for a specific noise level. Moreover, a single DnCNN model shows promising results on three general image denoising tasks, namely blind Gaussian denoising, SISR and JPEG deblocking.

......

III. THE PROPOSED DENOISING CNN MODEL

In this section, we present the proposed denoising CNN model, i.e., DnCNN, and extend it to handle several general image denoising tasks. In general, training a deep CNN model for a specific task involves two steps: (i) network architecture design and (ii) model learning from training data. For network architecture design, we modify the VGG network [26] to make it suitable for image denoising, and set the network depth according to the effective patch sizes used in state-of-the-art denoising methods. For model learning, we adopt the residual learning formulation and combine it with batch normalization for fast training and improved denoising performance. Finally, we discuss the connection between DnCNN and TNRD [19] and extend DnCNN to several general image denoising tasks.

A. Network Depth

Following the principle in [26], we set the convolution filter size to 3 × 3, but remove all pooling layers. Therefore, the receptive field of DnCNN with depth d should be (2d+1)×(2d+1). Increasing the size of the receptive field can exploit contextual information within a larger image area. To better trade off performance and efficiency, an important issue in architecture design is to set an appropriate depth for DnCNN.

It has been pointed out that the receptive field size of a denoising neural network is related to the effective patch size of denoising methods [30], [31]. Moreover, high noise levels usually require a larger effective patch size to capture more contextual information for restoration [41]. Therefore, fixing the noise level to σ = 25, we analyze the effective patch sizes of several leading denoising methods to guide the depth design of DnCNN. In BM3D [2], non-local similar patches are adaptively searched in a local window of size 25 × 25, so the final effective patch size is 49 × 49. Similar to BM3D, WNNM [15] uses a larger search window and performs the non-local search iteratively, resulting in a rather large effective patch size (361 × 361). MLP [31] first uses a patch of size 39 × 39 to generate the predicted patch, and then averages the output patches with a filter of size 9 × 9, giving an effective patch size of 47 × 47. The five-stage CSF [17] and TNRD [19] involve a total of 10 convolutional layers with a filter size of 7 × 7, so their effective patch size is 61 × 61.

Table 1 summarizes the effective patch size adopted by different methods at noise level σ = 25. It can be seen that EPLL [40] uses the smallest effective patch size, which is 36×36. It will be interesting to verify whether DnCNN with receptive field size similar to EPLL can compete with leading denoising methods. Therefore, for Gaussian denoising at a certain noise level, we set the receptive field size of DnCNN to be 35 × 35, corresponding to a depth of 17. For other general image denoising tasks, we adopt a larger receptive field and set the depth to 20.
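For reference, the receptive field of a stack of stride-1 convolutions without pooling grows by (kernel − 1) pixels per layer, which reproduces the numbers above. This is a small sanity-check sketch, not code from the paper.

```python
def receptive_field(depth, kernel=3):
    """Receptive field of `depth` stacked stride-1 convolutions with no pooling:
    each layer adds (kernel - 1) pixels, giving depth * (kernel - 1) + 1."""
    return depth * (kernel - 1) + 1

print(receptive_field(17))            # 35 -> 35 x 35 for Gaussian denoising (depth 17)
print(receptive_field(20))            # 41 -> 41 x 41 for the general denoising tasks (depth 20)
print(receptive_field(10, kernel=7))  # 61 -> matches the five-stage CSF/TNRD estimate
```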

B. Network Architecture

The input of DnCNN is a noisy observation y = x + v. Discriminative denoising models such as MLP [31] and CSF [17] aim to learn a mapping function F(y) = x to predict the latent clean image. For DnCNN, we adopt the residual learning formulation to train a residual mapping R(y) ≈ v, and then obtain x = y − R(y). Formally, the averaged mean squared error between the desired residual images and those estimated from the noisy input,

ℓ(Θ) = 1/(2N) Σ_{i=1}^{N} ||R(y_i; Θ) − (y_i − x_i)||_F²,

can be adopted as the loss function to learn the trainable parameters Θ of DnCNN, where {(y_i, x_i)}_{i=1}^{N} denotes N noisy-clean training image (patch) pairs.
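A minimal sketch of this loss, assuming a PyTorch setting with batched tensors (the function name is ours, not the paper's):

```python
import torch

def dncnn_loss(residual_pred, y, x):
    """l(Theta) = 1/(2N) * sum_i ||R(y_i; Theta) - (y_i - x_i)||_F^2,
    where residual_pred = R(y; Theta) and N is the number of patches in the batch."""
    n = y.shape[0]
    diff = residual_pred - (y - x)   # estimated residual minus the true residual v = y - x
    return diff.pow(2).sum() / (2 * n)
```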

1) Deep architecture: Given a DnCNN of depth D, there are three types of layers, shown with three different colors in Figure 1. (i) Conv+ReLU: the first layer uses 64 filters of size 3 × 3 × c to generate 64 feature maps, followed by rectified linear units (ReLU, max(0, ·)) for nonlinearity. Here c denotes the number of image channels, i.e., c = 1 for grayscale images and c = 3 for color images. (ii) Conv+BN+ReLU: for layers 2 to (D − 1), 64 filters of size 3 × 3 × 64 are used, and batch normalization [28] is added between convolution and ReLU. (iii) Conv: the last layer uses c filters of size 3 × 3 × 64 to reconstruct the output.
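The following is a minimal PyTorch sketch of these three layer types. It follows the description above but omits the paper's weight initialization and training details, and the bias-free convolutions before batch normalization are a common implementation assumption rather than something stated here. With depth=17 this corresponds to the 35 × 35 receptive field chosen above; depth=20 is used for the general denoising tasks.

```python
import torch
import torch.nn as nn

class DnCNN(nn.Module):
    def __init__(self, depth=17, channels=1, features=64):
        super().__init__()
        # (i) Conv+ReLU: 64 filters of size 3 x 3 x c
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        # (ii) Conv+BN+ReLU for layers 2 .. D-1: 64 filters of size 3 x 3 x 64
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1, bias=False),
                       nn.BatchNorm2d(features),
                       nn.ReLU(inplace=True)]
        # (iii) Conv: c filters of size 3 x 3 x 64 reconstruct the output
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, y):
        residual = self.body(y)   # R(y), the estimated residual image
        return y - residual       # denoised estimate x = y - R(y)
```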

In summary, our DnCNN model has two main features: the residual learning formulation is adopted to learn R(y), and batch normalization is incorporated to speed up training as well as improve denoising performance. By combining convolution with ReLU, DnCNN can gradually separate image structure from the noisy observation through the hidden layers. This mechanism is similar to the iterative noise removal strategy adopted in methods such as EPLL and WNNM, but our DnCNN is trained in an end-to-end fashion. Later we will give more discussion on the rationale of combining residual learning and batch normalization.

2) Reducing boundary artifacts: In many low-level vision applications, the output image is usually required to have the same size as the input image, which may lead to boundary artifacts. In MLP [31], the boundaries of the noisy input image are symmetrically padded in a preprocessing stage, whereas in CSF [17] and TNRD [19] the same padding strategy is applied before every stage. Different from the above methods, we directly pad zeros before convolution so that each feature map of the middle layers has the same size as the input image. We find that the simple zero-padding strategy does not result in any boundary artifacts. This good property is probably attributed to the powerful ability of DnCNN.
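As a quick check of this padding choice (a generic PyTorch example, not the paper's code), a 3 × 3 convolution with one pixel of zero padding preserves the spatial size of the input:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 64, kernel_size=3, padding=1)  # zero-pad 1 pixel on each side
x = torch.randn(1, 1, 180, 180)                    # a dummy 180 x 180 grayscale input
print(conv(x).shape)                               # torch.Size([1, 64, 180, 180])
```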

C. Integration of Residual Learning and Batch Normalization for Image Denoising

The network shown in Figure 1 can be used to train either the original mapping F(y) to predict x or the residual mapping R(y) to predict v. According to [29], when the original mapping is more like an identity mapping, the residual mapping will be much easier to optimize. Note that the noisy observation y is much more like the latent clean image x than the residual image v (especially when the noise level is low). Thus, F(y) would be closer to an identity mapping than R(y), and the residual learning formulation is more suitable for image denoising.

Under the same setting of gradient-based optimization algorithm and network architecture, the average PSNR values obtained with these two learning formulations, with and without batch normalization, are shown in Fig. 2. Note that two gradient-based optimization algorithms are used in this paper: stochastic gradient descent with momentum (i.e., SGD) and the Adam algorithm [37]. First, we can observe that the residual learning formulation converges faster and more stably than the original mapping learning. Meanwhile, without batch normalization, plain residual learning with conventional SGD cannot compete with state-of-the-art denoising methods such as TNRD (28.92 dB). We argue that the insufficient performance should be attributed to the internal covariate shift caused by the changes of network parameters during training [28]; batch normalization is therefore adopted to address it. Second, we observe that, with batch normalization, learning the residual mapping (the red line) converges faster and achieves better denoising performance than learning the original mapping (the blue line). In particular, both the SGD and Adam optimization algorithms lead to the best results for the network with residual learning and batch normalization. In other words, it is the combination of the residual learning formulation and batch normalization, rather than the optimization algorithm (SGD or Adam), that leads to the best denoising performance.
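For completeness, the two optimizer configurations compared in Fig. 2 can be set up in PyTorch as below; the learning rates, momentum, and weight decay are illustrative assumptions, not the exact settings of the paper.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, 3, padding=1)  # stand-in for the DnCNN network sketched above

# SGD with momentum vs. Adam [37]
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
```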

In fact, one can notice that in Gaussian denoising both the residual image and batch normalization are associated with the Gaussian distribution. It is very likely that residual learning and batch normalization can benefit from each other for Gaussian denoising. This point can be further confirmed by the following analysis.

On the one hand, residual learning benefits from batch normalization. This is straightforward, since batch normalization offers some merits for CNNs, such as alleviating the internal covariate shift problem. As can be seen from Figure 2, even though residual learning without batch normalization (the green line) converges fast, it is inferior to residual learning with batch normalization (the red line).

On the other hand, batch normalization benefits from residual learning. As shown in Figure 2, without residual learning, batch normalization even has a somewhat adverse effect on convergence (the blue line). With residual learning, batch normalization can be exploited to speed up training as well as boost performance (the red line). Note that each mini-batch is a small set of images (e.g., 128). Without residual learning, the input intensities and the convolutional features are correlated with their neighboring ones, and the distribution of the layer inputs also depends on the content of the images in each training mini-batch. With residual learning, DnCNN implicitly removes the latent clean image in the hidden layers. This makes the inputs of each layer Gaussian-like distributed, less correlated, and less related to image content. Thus, residual learning can also help batch normalization in reducing internal covariate shift.

In summary, the integration of residual learning and batch normalization can not only speed up and stabilize the training process, but also boost denoising performance.

