Whoa, what kind of black magic is this! Suppressing Transient Noise with Signal Processing

In speech enhancement, noise is generally divided into steady-state noise (such as white noise) and transient noise (sometimes called non-stationary noise, such as keyboard clicks). Readers with some background in speech noise reduction will know that conventional signal processing methods work well on steady-state noise; see the analysis of the WebRTC ANR process. For transient noise, however, the noise changes so fast that the noise estimation algorithm cannot track it accurately, so deep-learning methods are generally used to suppress it; see DNN single-channel speech enhancement. But is it possible to suppress transient noise with signal processing alone? The answer is yes. Without further ado, let's look at the results first and let signal processing give you a little shock.

The transient noise suppression flow chart is shown below. Compared with the traditional process, there is an additional transient spectrum estimation module in addition to the noise spectrum estimation.

picture

I. Transient Noise Estimation

We first assume that transient noise changes more rapidly than speech and other noise, and then tune the noise estimation algorithm so that the non-transient parts (speech and steady-state noise) appear "pseudo-stationary". This splits the audio into transient and non-transient components. Then, based on the PSD of the non-transient components, OMLSA computes a gain on the magnitude spectrum that keeps the transient components and suppresses the non-transient components, including speech and background noise.

Because we are estimating transient noise, we use a shorter frame than general noise estimation algorithms do: 64 samples, which is 4 ms at a 16 kHz sampling rate. We first estimate the non-transient components using an MCRA-based algorithm. The noise PSD estimate is obtained by time-recursive averaging of the squared spectral magnitude, i.e.

picture

Here, alpha_s determines the speed of noise tracking. A smaller alpha_s gives more weight to the current frame, so the estimate updates faster; values of 0.7 to 0.99 are typical. The transient noise presence probability is governed by the minimum of the smoothed periodogram over a finite causal window of length L

picture
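
The two steps above can be sketched in a few lines of NumPy. This is a minimal sketch, not the original implementation: it assumes the usual MCRA form S(k,l) = alpha_s * S(k,l-1) + (1 - alpha_s) * |Y(k,l)|^2 followed by a sliding minimum over the last L frames. The function name, the default values, and the first-frame initialization are all assumptions made for illustration.

```python
import numpy as np

def smooth_and_track_min(power, alpha_s=0.8, L=50):
    """Recursive smoothing plus causal minimum tracking (illustrative sketch).

    power : array of shape (n_frames, n_bins) holding |Y(k, l)|^2
    alpha_s : smoothing factor (assumed default; the text suggests 0.7 to 0.99)
    L : length of the causal minimum-search window, in frames (assumed default)
    """
    n_frames = power.shape[0]
    S = np.empty_like(power, dtype=float)
    S_min = np.empty_like(S)
    S[0] = power[0]      # assumption: initialize with the first frame
    S_min[0] = S[0]
    for l in range(1, n_frames):
        # A smaller alpha_s weights the current frame more heavily.
        S[l] = alpha_s * S[l - 1] + (1.0 - alpha_s) * power[l]
        # Minimum over the current frame and the L - 1 frames before it.
        start = max(0, l - L + 1)
        S_min[l] = S[start:l + 1].min(axis=0)
    return S, S_min
```

A plain sliding minimum like this costs O(L) per frame; practical MCRA implementations usually replace it with a cheaper block-wise minimum update, which is omitted here to keep the idea visible.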

Then, the existence probability of transient noise is judged by the following formula

picture

Here, delta is an empirical threshold. When the result exceeds delta, we regard the current frame as containing transient noise, which is recorded as

picture

The presence probability of transient noise can then be expressed as

picture
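
As a rough illustration of the decision and the probability update, here is a sketch that assumes the standard MCRA recipe: threshold the ratio S / S_min against delta to get a hard indicator, then smooth that indicator recursively. The smoothing constant `alpha_p`, the default values, and the function name are assumptions and do not come from the formulas above.

```python
import numpy as np

def presence_probability(S, S_min, delta=5.0, alpha_p=0.2):
    """Hard indicator and smoothed presence probability (illustrative sketch).

    S, S_min : arrays of shape (n_frames, n_bins)
    delta : empirical threshold on the ratio S / S_min (assumed default)
    alpha_p : probability smoothing factor (assumed, not from the text)
    """
    ratio = S / np.maximum(S_min, 1e-12)          # guard against division by zero
    indicator = (ratio > delta).astype(float)     # 1.0 where a transient is flagged
    p = np.empty_like(indicator)
    prev = np.zeros(indicator.shape[1])           # assumption: start from p = 0
    for l in range(indicator.shape[0]):
        prev = alpha_p * prev + (1.0 - alpha_p) * indicator[l]
        p[l] = prev
    return indicator, p
```

Because transients are short, a small `alpha_p` (fast reaction) seems like the natural choice here, but the right value is something to tune on real recordings.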

The power spectrum of the non-transient component is then

picture

where

picture
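
The update above can be sketched as follows, assuming the usual MCRA form in which the smoothing factor is pushed toward 1 whenever the presence probability is high, so that frames flagged as transient barely change the estimate. The names `alpha_d` and `update_nontransient_psd` are illustrative, and the sketch applies the probability of frame l to the update of frame l, which may differ by one frame from the indexing in the original formula.

```python
import numpy as np

def update_nontransient_psd(power, p, alpha_d=0.85):
    """Probability-controlled recursive PSD update (illustrative sketch).

    power : array (n_frames, n_bins) of |Y(k, l)|^2
    p : array (n_frames, n_bins), transient presence probability in [0, 1]
    alpha_d : base smoothing factor (assumed default)
    """
    lam = np.empty_like(power, dtype=float)
    lam[0] = power[0]                              # assumption: first-frame init
    for l in range(1, power.shape[0]):
        # p = 1 gives alpha_tilde = 1, which freezes the estimate;
        # p = 0 falls back to plain smoothing with alpha_d.
        alpha_tilde = alpha_d + (1.0 - alpha_d) * p[l]
        lam[l] = alpha_tilde * lam[l - 1] + (1.0 - alpha_tilde) * power[l]
    return lam
```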

The above formulation can track non-stationary components. However, phoneme onsets, which are characterized by sudden bursts, cannot be tracked by recursive spectral smoothing, so they can be misinterpreted as transient noise. We therefore add "future" information to the calculation to distinguish transients from speech onsets. For transient noise, the signal power is expected to decay rapidly after a brief burst, whereas after a phoneme onset the power level is expected to remain stable for the duration of the phoneme. We distinguish the two as follows

picture

It is worth noting that this is a non-causal window, and it is shorter than the window used for noise estimation. Generally, 40 ms is enough to cover most transient noises. The non-transient component can then be expressed as

picture
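
To make the look-ahead idea concrete, here is one plausible way to code it. It is explicitly not the formula from the original figures, which did not survive: the decay test, the `decay_ratio` threshold, and the function name are hypothetical. A flag is kept only if the smoothed power drops well below its current level somewhere in the next few frames; if the power stays high, the frame is treated as a speech onset and the flag is cleared.

```python
import numpy as np

def reject_speech_onsets(S, indicator, lookahead=10, decay_ratio=0.5):
    """Clear transient flags whose power does not decay (hypothetical sketch).

    S : array (n_frames, n_bins) of smoothed power
    indicator : array (n_frames, n_bins) of 0/1 transient flags
    lookahead : number of future frames inspected (10 frames is 40 ms
                only under the assumption of a 4 ms hop)
    decay_ratio : hypothetical threshold; power must fall below
                  decay_ratio * S[l] within the window to stay flagged
    """
    refined = indicator.astype(float)              # astype returns a copy
    for l in range(S.shape[0]):
        future = S[l + 1:l + 1 + lookahead]
        if future.shape[0] == 0:
            continue                               # no future frames: keep the flag
        # Power that never decays within the window looks like a phoneme.
        sustained = future.min(axis=0) > decay_ratio * S[l]
        refined[l] = np.where(sustained, 0.0, refined[l])
    return refined
```

The price of this test is latency: the decision for frame l has to wait for `lookahead` further frames, which matters for real-time use.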

Finally, we use an OMLSA-based algorithm to obtain the transient component, and the estimated value of the transient PSD is as follows

picture

where T is the amplitude estimate calculated by the OMLSA algorithm.

II. Speech Enhancement

With the above information, we can calculate the corresponding gain. First, the total noise PSD is the sum of two parts: the steady-state noise obtained by MCRA and the transient noise obtained by formula (10), as shown below

picture

The spectral gain can be calculated by

picture

where Gmin is a fixed gain floor applied when speech is absent, and GH1 is

picture

where

picture
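
For readers who cannot see the formula images, the standard OMLSA gain from the literature is written out below. This is a reconstruction under the assumption that the figures use the usual form; the symbols (p for the speech presence probability, xi for the a priori SNR, gamma for the a posteriori SNR, lambda for the total noise PSD) are my own notation and may not match the figures.

```latex
\begin{aligned}
G(k,l) &= \left[ G_{H_1}(k,l) \right]^{p(k,l)} \, G_{\min}^{\,1 - p(k,l)} \\
G_{H_1}(k,l) &= \frac{\xi(k,l)}{1 + \xi(k,l)}
  \exp\!\left( \frac{1}{2} \int_{v(k,l)}^{\infty} \frac{e^{-t}}{t} \, dt \right) \\
v(k,l) &= \frac{\gamma(k,l)\, \xi(k,l)}{1 + \xi(k,l)},
  \qquad \gamma(k,l) = \frac{|Y(k,l)|^{2}}{\lambda(k,l)}
\end{aligned}
```

When p is close to 1 the gain approaches G_{H_1}, and when p is close to 0 it falls to the floor G_min, so the floor limits how hard any bin can be attenuated. The integral is the exponential integral E_1(v), for which numerical libraries provide ready-made routines.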

III. Summary

In general, the transient noise estimate relies on a clever trick: it treats speech and steady-state noise as "noise", uses OMLSA to compute a gain that keeps the remaining frequency components, and thereby indirectly estimates the transient noise PSD. Once the noise PSD is available, the corresponding gain is easy to find. Finally, let's look at the estimated transient noise. In the figure below, the top panel is the original transient noise and the bottom panel is the estimate. The estimate is relatively accurate, although the formants show that some phonemes have also been mistaken for transient noise.

picture

References:

[1]. https://zhuanlan.zhihu.com/p/591219373?utm_id=0

[2]. https://israelcohen.com/wp-content/uploads/2018/05/IWAENC2012_Hirszhorn.pdf
