Echo Cancellation (AEC) Principles, Algorithms, and Practices: Introduction to the Modules in a Complete Echo Cancellation Algorithm Framework

1. A complete echo cancellation system includes the following modules:
1. Time Delay Estimation (TDE) module
2. (Linear) echo cancellation (Linear Acoustic Echo Cancellation, AEC) module (linear filtering module)
3. Double-Talk Detection (DTD) module
4. Nonlinear residual acoustic echo suppression (Residual Acoustic Echo Suppression, RAES) module (nonlinear filtering module)

2. Delay estimation module

Reasons for the delay: the reference signal is taken from the data received on the downlink, while the microphone signal is taken from the captured data. Between the two lies a delay made up of the sound propagation time (inside and outside the device), the buffering in the playback and capture threads, and the difference in their start-up times. The delay differs across devices and environments.

Echo cancellation works with two sound signals: the echo signal (captured by the microphone) and the reference signal. In a practical implementation there is a delay between the two signals, which has three main sources:

1. The delay between acquiring the reference signal and playing it out from the loudspeaker.

2. The delay for the reference signal to travel from the loudspeaker to the microphone after it is played.

3. The delay between the microphone capturing the echo signal and that signal being delivered to the echo cancellation algorithm module.

The purpose of the delay estimation module is to keep the delay between the echo signal and the reference signal within a known range, so that the following modules can process the echo.

Delay Estimation Module Impact

Delay alignment reduces the load on the adaptive filter: it shortens the span the filter must track and lowers the overhead. Without a delay alignment module, the filter would have to be long enough to cover the entire delay from the reference signal to the echo signal, which is often hundreds of milliseconds and is very computationally expensive.
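As a rough illustration of that cost, the sketch below counts the time-domain filter taps needed with and without alignment. The 16 kHz sample rate, the 300 ms bulk delay, and the 64 ms echo tail are assumed values chosen for the example, and `taps_needed` is a hypothetical helper, not part of any library.

```python
def taps_needed(span_ms: float, sample_rate_hz: int) -> int:
    """Number of time-domain filter taps needed to cover span_ms of signal."""
    return int(round(span_ms * sample_rate_hz / 1000))


SAMPLE_RATE_HZ = 16_000  # assumed sample rate
BULK_DELAY_MS = 300      # assumed system delay when nothing is aligned
ECHO_TAIL_MS = 64        # assumed length of the acoustic echo tail

# Without alignment the filter must span the bulk delay plus the echo tail.
unaligned_taps = taps_needed(BULK_DELAY_MS + ECHO_TAIL_MS, SAMPLE_RATE_HZ)
# With alignment the filter only has to span the echo tail.
aligned_taps = taps_needed(ECHO_TAIL_MS, SAMPLE_RATE_HZ)

print(unaligned_taps, aligned_taps)  # 5824 1024
```

Under these assumptions the unaligned filter is more than five times longer, and the per-sample cost of a time-domain adaptive filter grows with its length.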

Delay alignment affects filter performance. If the delay is not aligned, the correlation between the reference signal and the echo signal seen by the filter is extremely low, and convergence suffers. If the delay is overestimated, the matching reference signal is no longer in the buffer tracked by the filter, and the filter cannot converge.

The speed of delay alignment affects the overall convergence speed, the filter convergence, and the nonlinear echo processing. When the delay changes, the module must track the change and adjust quickly; otherwise echoes will occasionally leak through.

Delay Estimation Design

Generally speaking, due to the response of the speaker and microphone of the device, the distribution of the echo is roughly in the mid-frequency band, and there is very little echo in the high-frequency and low-frequency parts, so the echo can be tracked in the mid-frequency band.

The AEC module of WebRTC uses a binary spectrum method in the frequency domain: it maps the mid-band spectral distribution of the far-end and near-end signals to binarized data, finds the far-end frame with the highest similarity, and takes the corresponding delay. This method is computationally cheap but strongly affected by noise.
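The idea can be sketched as follows. This is a simplified illustration, not WebRTC's actual code: the function names, the fixed per-band thresholds, and the plain Hamming-distance match are assumptions made for the example (a practical estimator would also smooth the match over time).

```python
def binarize(band_energies, thresholds):
    """Pack one bit per band: 1 if the band energy exceeds its threshold."""
    bits = 0
    for i, (energy, threshold) in enumerate(zip(band_energies, thresholds)):
        if energy > threshold:
            bits |= 1 << i
    return bits


def estimate_delay_frames(far_history, near_bits):
    """Return the delay (in frames) of the far-end binary spectrum that is
    closest to near_bits. far_history[d] is the far-end frame from d frames ago."""
    best_delay, best_distance = 0, None
    for delay, far_bits in enumerate(far_history):
        distance = bin(far_bits ^ near_bits).count("1")  # Hamming distance
        if best_distance is None or distance < best_distance:
            best_delay, best_distance = delay, distance
    return best_delay


# Hypothetical binary spectra of the last five far-end frames (newest first).
far_history = [0b10101100, 0b01100011, 0b11110000, 0b00110101, 0b10000001]
# Near-end frame: the far-end frame from 3 frames ago with one bit flipped by noise.
near_bits = 0b00110111

print(estimate_delay_frames(far_history, near_bits))  # 3
```

Because each frame is reduced to a handful of bits and compared with XOR, the search is cheap; the flip side is that noise can flip enough bits to move the best match, which is the noise sensitivity noted above.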

The AEC3 module of WebRTC uses linear filtering: a matched filter is adapted with NLMS (Normalized Least Mean Square) directly on the time-domain signal, which is very robust.
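A minimal sketch of the matched-filter idea, assuming a noise-free echo that is a scaled and delayed copy of the far-end signal: an NLMS filter is adapted on the time-domain samples, and the index of its largest tap is taken as the delay. The function name, step size, and test signal are assumptions for illustration and do not reproduce the AEC3 implementation.

```python
import random


def nlms_delay_estimate(far, near, num_taps, mu=0.5, eps=1e-6):
    """Adapt an NLMS filter from far to near; return the index of the largest tap."""
    w = [0.0] * num_taps
    for n in range(num_taps - 1, len(near)):
        x = [far[n - k] for k in range(num_taps)]        # x[k] = far[n - k]
        echo_estimate = sum(wk * xk for wk, xk in zip(w, x))
        error = near[n] - echo_estimate
        step = mu * error / (sum(xk * xk for xk in x) + eps)
        for k in range(num_taps):
            w[k] += step * x[k]                          # normalized LMS update
    return max(range(num_taps), key=lambda k: abs(w[k]))


random.seed(0)
true_delay = 7
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Echo: far-end signal attenuated to 0.6 and delayed by true_delay samples.
near = [0.0] * true_delay + [0.6 * v for v in far[:-true_delay]]

estimated_delay = nlms_delay_estimate(far, near, num_taps=16)
print(estimated_delay)  # 7
```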

The cross-correlation of time-domain signals can show multiple peaks, which makes the delay estimate inaccurate. We consider using frequency-domain cross-correlation combined with linear filtering: the linear filter ensures robustness, while the fast detection of frequency-domain cross-correlation speeds up the estimate.
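The frequency-domain half of that idea can be sketched like this: multiply the near-end spectrum by the conjugate of the far-end spectrum, transform back, and pick the lag with the largest correlation. The naive O(N^2) DFT is used only to keep the example self-contained (a real implementation would use an FFT), and the function names and test signal are assumptions.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive discrete Fourier transform, kept simple for clarity."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def xcorr_delay(far, near):
    """Lag (in samples) that maximizes the circular cross-correlation of near with far."""
    far_spec = dft(far)
    near_spec = dft(near)
    cross = [nk * fk.conjugate() for nk, fk in zip(near_spec, far_spec)]
    corr = [c.real for c in dft(cross, inverse=True)]
    return max(range(len(corr)), key=corr.__getitem__)


random.seed(1)
frame_len, true_delay = 64, 5
# Zero-pad the far-end frame so the delayed copy does not wrap around.
far = [random.uniform(-1.0, 1.0) for _ in range(48)] + [0.0] * 16
near = [0.0] * true_delay + [0.6 * v for v in far[:frame_len - true_delay]]

estimated_lag = xcorr_delay(far, near)
print(estimated_lag)  # 5
```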

3. (Linear) echo cancellation (Linear Acoustic Echo Cancellation, AEC) module (linear filter module)

The echo cancellation module is built mainly around an adaptive filter, and indicators such as stability, algorithmic complexity, and convergence rate need to be considered during design. To achieve a better echo cancellation effect, the linear echo cancellation module also needs double-talk detection in addition to the adaptive filter.

x(n) is the far-end input signal. Passing through the unknown echo path h(n), it yields the echo y(n) = x(n) ∗ h(n); adding the observation noise v(n) gives the near-end input signal s(n) = y(n) + v(n). The adaptive filter w(n) produces an estimate of the echo from x(n), which is subtracted from s(n) to obtain the error signal e(n) = s(n) − w^H(n) x(n), where w^H(n) is the conjugate transpose of the weight vector and x(n) is the vector of recent far-end samples. The closer the echo path estimated by the adaptive filter is to the actual echo path, the smaller the error and the smaller the residual echo, so the error is used to drive the adaptive adjustment.

The filter uses a specific adaptive algorithm to continuously adjust the weight vector, so that the estimated echo path w(n) gradually approaches the real echo path h(n). The output of the filter then approximates the real echo, so that the error signal contains almost no echo.
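The loop described above can be sketched with NLMS as the adaptive algorithm. This is an illustration under idealized assumptions (real-valued signals, a short fixed echo path, no near-end speech or noise); the class name, the echo path coefficients, and the parameters are made up for the example.

```python
import random


class NlmsEchoCanceller:
    """Sample-by-sample linear echo canceller using the NLMS update."""

    def __init__(self, num_taps, mu=0.5, eps=1e-6):
        self.w = [0.0] * num_taps   # estimated echo path w(n)
        self.x = [0.0] * num_taps   # most recent far-end samples, newest first
        self.mu = mu
        self.eps = eps

    def process(self, far_sample, mic_sample):
        self.x = [far_sample] + self.x[:-1]
        echo_estimate = sum(wk * xk for wk, xk in zip(self.w, self.x))
        error = mic_sample - echo_estimate               # e(n)
        step = self.mu * error / (sum(xk * xk for xk in self.x) + self.eps)
        self.w = [wk + step * xk for wk, xk in zip(self.w, self.x)]
        return error


random.seed(0)
echo_path = [0.5, -0.3, 0.2, 0.1]                        # assumed true h(n)
far = [random.uniform(-1.0, 1.0) for _ in range(3000)]
mic = [sum(echo_path[k] * far[n - k] for k in range(len(echo_path)) if n - k >= 0)
       for n in range(len(far))]

aec = NlmsEchoCanceller(num_taps=8)
errors = [aec.process(x, s) for x, s in zip(far, mic)]

early_energy = sum(e * e for e in errors[:50])           # before convergence
late_energy = sum(e * e for e in errors[-50:])           # after convergence
print(late_energy < early_energy)  # True
```

After convergence the first four taps of `aec.w` are close to `echo_path` and the remaining taps are close to zero, which is the sense in which w(n) approaches h(n).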

During the convergence stage of the adaptive filter, the near-end signal should contain only the echo, with no near-end speech mixed in, because near-end speech and noise disturb the convergence of w(n). The echo cancellation algorithm is therefore required to converge very quickly once it starts running, ideally as soon as the far-end party starts to speak. After convergence, if the near-end party starts talking, the coefficients of w(n) should not change; they need to stay stable.

The echo path may change. When it does, the echo cancellation algorithm must detect the change, because the adaptive filter has to relearn: w(n) needs a new convergence process to approach the new echo path h(n). Adaptive filters need to strike a balance among convergence speed, tracking performance, and steady-state misadjustment.

Linear Filter Design

NLMS and Kalman filters are the adaptive filters commonly used at present, and each has its own advantages and disadvantages. The Kalman filter converges quickly but is less stable; NLMS is relatively stable. The two trade off convergence speed, tracking performance, and steady-state misadjustment differently. With either filter, however, the tracking speed can be adjusted to shift the balance between convergence speed and tracking performance: NLMS changes the step size, and Kalman changes the gain.
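For reference, the standard NLMS update makes the role of the step size explicit. Writing the error as e(n) = s(n) − w^H(n) x(n), with step size μ (0 < μ < 2) and a small regularization constant δ (the symbol δ is introduced here for the example):

```latex
\begin{aligned}
e(n) &= s(n) - \mathbf{w}^{H}(n)\,\mathbf{x}(n) \\
\mathbf{w}(n+1) &= \mathbf{w}(n) + \frac{\mu}{\mathbf{x}^{H}(n)\,\mathbf{x}(n) + \delta}\,\mathbf{x}(n)\,e^{*}(n)
\end{aligned}
```

A larger μ converges and tracks faster but raises the steady-state misadjustment; a smaller μ does the opposite.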

Consider combining several filters to take the advantages of each, ensuring convergence speed while each limits the divergence of the others. The nonlinear filter also takes part in estimating the echo state and the double-talk state, which in turn controls the step size of the adaptive filter, realizing variable-step tracking according to the state.

4. Nonlinear residual acoustic echo suppression (Residual Acoustic Echo Suppression, RAES) module (nonlinear filtering module)

It is difficult for the adaptive filter to completely eliminate the echo. In order to eliminate the residual echo, it is necessary to introduce a residual echo suppression module. When we design the residual echo suppression module, we not only need to achieve a balance between near-end speech distortion and residual echo suppression, but also need to balance the algorithm effect and computational complexity.

The nonlinear processing module usually computes the correlations among the reference signal, the microphone signal, the linear echo estimate, and the residual, and from them estimates the residual echo or the echo state. Suppression based on Wiener filtering hinges on how the residual echo is estimated, and the estimated size of the residual echo directly affects the final result. If the residual echo is underestimated, audible echo may remain; if it is overestimated, near-end speech will be damaged during double-talk.
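The effect of under- and overestimation can be seen in a minimal Wiener-style gain rule, sketched here per frequency band. The function name, the gain floor, and the power values are assumptions for illustration; a practical suppressor would add smoothing across time and frequency.

```python
def suppression_gains(error_power, residual_echo_power, gain_floor=0.05):
    """Per-band gain: the fraction of the error power not attributed to residual echo."""
    gains = []
    for p_err, p_res in zip(error_power, residual_echo_power):
        if p_err <= 0.0:
            gains.append(gain_floor)
            continue
        gain = (p_err - p_res) / p_err
        gains.append(min(1.0, max(gain_floor, gain)))
    return gains


# Three bands with the same error power after the linear filter.
error_power = [4.0, 4.0, 4.0]
# Residual echo estimated as: none, half of the band, twice the band.
residual_estimate = [0.0, 2.0, 8.0]

gains = suppression_gains(error_power, residual_estimate)
print(gains)  # [1.0, 0.5, 0.05]
```

If the first band actually contained echo, the gain of 1.0 lets it through (underestimation); if the third band actually contained near-end speech, the gain of 0.05 nearly removes it (overestimation).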

Nonlinear Filter Design

For the reference signal, microphone signal, linear echo signal, and residual signal, our nonlinear processing module uses peak correlation, frequency-domain correlation, and amplitude similarity to jointly determine the double-talk state and the echo state. In addition to these correlations, the weight updates of the linear filter can also be used to infer the echo state. What affects the final output is the estimate of the size of the residual echo. This estimate combines the echo state with the filter's ERL (Echo Return Loss) estimate.

The adaptive filter has three working modes (detected by DTD):

  • Far-end speech present, near-end speech absent: filter and update the adaptive filter coefficients
  • Far-end speech present, near-end speech present: filter only, without updating the coefficients
  • Far-end speech absent: do nothing
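The three modes amount to a small decision table driven by the DTD output. A minimal sketch, assuming the detector already provides two boolean activity flags (the `Mode` enum and `select_mode` function are hypothetical names):

```python
from enum import Enum


class Mode(Enum):
    FILTER_AND_ADAPT = "filter and update coefficients"
    FILTER_ONLY = "filter with frozen coefficients"
    BYPASS = "do nothing"


def select_mode(far_end_active: bool, near_end_active: bool) -> Mode:
    """Map the double-talk detector's flags to the adaptive filter's working mode."""
    if not far_end_active:
        return Mode.BYPASS            # no far-end speech, so there is no echo to cancel
    if near_end_active:
        return Mode.FILTER_ONLY       # double-talk: freeze adaptation to avoid divergence
    return Mode.FILTER_AND_ADAPT      # far-end only: safe to adapt


for far_flag, near_flag in [(True, False), (True, True), (False, True), (False, False)]:
    print(far_flag, near_flag, select_mode(far_flag, near_flag).name)
```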


Origin blog.csdn.net/qq_42233059/article/details/131959160