Echo Cancellation (AEC) Principles, Algorithms, and Practice: Frequency-Domain Block LMS Adaptive Filtering Algorithm (FDAF)

        Both the linear convolution and the linear correlation that appear in the block LMS adaptive filtering algorithm can be computed with the fast Fourier transform (FFT). An efficient way to implement block LMS is therefore to carry out the adaptation of the filter coefficients in the frequency domain using the FFT. The block LMS adaptive filter implemented in this way is called the frequency-domain block LMS adaptive filter (FDAF, Frequency-Domain Block Least Mean Square Adaptive Filter).
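        As a quick check of that statement (not part of the original derivation), the short sketch below verifies the two FFT identities block LMS relies on: with enough zero-padding, a product of spectra gives linear convolution, and conjugating one spectrum gives linear correlation. The vectors x and w and the random seed are arbitrary example values.

import numpy as np
from numpy.fft import rfft, irfft

rng = np.random.default_rng(0)
x = rng.standard_normal(16)   # an input block (arbitrary example values)
w = rng.standard_normal(8)    # filter coefficients (arbitrary example values)
N = len(x) + len(w) - 1       # enough zero-padding to avoid circular wrap-around

# linear convolution via FFT (the filtering step of block LMS)
y_fft = irfft(rfft(x, N) * rfft(w, N), N)
print(np.allclose(y_fft, np.convolve(x, w)))           # True

# linear correlation via FFT (the gradient step of block LMS): conjugate one spectrum
r_fft = irfft(np.conj(rfft(w, N)) * rfft(x, N), N)[:len(w)]
r_ref = np.array([np.dot(w, x[m:m+len(w)]) for m in range(len(w))])
print(np.allclose(r_fft, r_ref))                       # True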

        On the other hand, the overlap-save and overlap-add methods commonly used in digital signal processing provide powerful tools for fast convolution, that is, for computing linear convolution with the discrete Fourier transform. Of the two, the overlap-save method is the one more commonly used for (non-adaptive) filtering. It is also worth noting that although the filter can be implemented with any amount of overlap, it is most efficient at 50% overlap, i.e. when the FFT length is twice the block length, which is the arrangement used in the code below.
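        To make the overlap-save idea concrete before any adaptation is added, here is a minimal non-adaptive sketch that filters a signal block by block with a fixed FIR filter, using a 2M-point FFT and keeping only the last M output samples of each block (50% overlap). The function name overlap_save_fir, the block size M and the test values are made up for illustration.

import numpy as np
from numpy.fft import rfft, irfft

def overlap_save_fir(x, h, M):
    """Linear convolution of x with an FIR filter h (len(h) <= M) by overlap-save."""
    N = 2 * M                                       # FFT length: 50% overlap
    H = rfft(np.concatenate([h, np.zeros(N - len(h))]))
    x_old = np.zeros(M)                             # previous input block
    y = np.zeros(len(x) // M * M)
    for n in range(len(x) // M):
        x_new = x[n*M:(n+1)*M]
        X = rfft(np.concatenate([x_old, x_new]))
        # the first M output samples are circularly wrapped; keep only the last M
        y[n*M:(n+1)*M] = irfft(H * X)[M:]
        x_old = x_new
    return y

# quick check against direct convolution (example values)
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
h = rng.standard_normal(32)
print(np.allclose(overlap_save_fir(x, h, M=64), np.convolve(x, h)[:1024]))   # True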

 Signal flow diagram of the frequency-domain block LMS adaptive filtering algorithm based on the overlap-save method

Problem: when the echo path is long and complex and the echo delay is large, the computational complexity of time-domain adaptive filtering algorithms is high.

Solution: the frequency-domain block adaptive filter (FDAF) algorithm. FDAF splits the length-L adaptive filter into sub-blocks whose size is tied to the FFT length (L being an integer multiple of that size), and runs the LMS update in the frequency domain on each block of the input signal.

Advantages: when the echo path is long and complex, the computational load is low and the convergence speed is somewhat improved.
  FDAF has several advantages over time-domain adaptive filtering. Besides letting the filter convolution be performed as a multiplication in the frequency domain, the transform effectively reduces the cost associated with the length of the adaptive filter, so the computational complexity of the adaptive algorithm goes down. In addition to reducing the computational complexity, FDAF can also improve the convergence speed. This is because the spread of the eigenvalues of the signal's autocorrelation matrix that governs the filter update is reduced: each frequency bin can be normalized by its own power.
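  The eigenvalue-spread argument can be illustrated numerically. In the sketch below (illustrative values only: an AR(1) input with pole 0.9 and filter order 32), the autocorrelation matrix seen by a time-domain LMS update has a large eigenvalue spread, while after a block FFT the power of each frequency bin can simply be measured and divided out, which is what the norm variable does in the FDAF code further down.

import numpy as np
from scipy.linalg import toeplitz

# a strongly coloured AR(1) input signal (illustrative)
rng = np.random.default_rng(2)
u = rng.standard_normal(100000)
x = np.zeros_like(u)
for n in range(1, len(x)):
    x[n] = 0.9 * x[n-1] + u[n]

# eigenvalue spread of the order-32 autocorrelation matrix (governs time-domain LMS)
M = 32
r = np.array([np.dot(x[:len(x)-k], x[k:]) / len(x) for k in range(M)])
eig = np.linalg.eigvalsh(toeplitz(r))
print("time-domain eigenvalue spread:", eig[-1] / eig[0])    # large for coloured input

# per-bin power after a 2M-point block FFT; dividing each bin by such an estimate
# (as FDAF does) makes the effective spread driving the update close to 1
X = np.fft.rfft(x[:64 * 2 * M].reshape(-1, 2 * M), axis=1)
p = np.mean(np.abs(X)**2, axis=0)
print("per-bin power spread:", p.max() / p.min())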

  These advantages of FDAF come with trade-offs. Its main costs are added latency and added memory. The latency comes from having to delay the desired signal (the microphone signal in echo cancellation) to line it up with the output of the frequency-domain filter. Memory use also grows relative to time-domain methods, since both the excitation and the desired signal have to be buffered. Early FDAF implementations chose the FFT order to be roughly the size of the impulse response; but, as mentioned earlier, applications such as echo cancellation have long echo paths, which leads to large delay and storage requirements. This shortcoming can be overcome by methods such as the multidelay adaptive filter: the block size can be made smaller than the required time-domain filter length, and an adaptive filter is run in each frequency bin instead of a single coefficient. In this way the drawbacks of FDAF can be mitigated while keeping the reduced computational complexity and the faster convergence.
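  To show the partitioned ("multidelay") structure mentioned above without the adaptation getting in the way, here is a minimal sketch of partitioned frequency-domain filtering: the long filter is split into partitions of length B, one input-block spectrum is stored per partition, and the per-partition products are summed before a single inverse FFT. The function name partitioned_fd_filter, the partition length B and the test values are illustrative, not the author's implementation.

import numpy as np
from numpy.fft import rfft, irfft

def partitioned_fd_filter(x, h, B):
    """Filter x with a long FIR h split into partitions of length B (multidelay idea)."""
    P = int(np.ceil(len(h) / B))                        # number of partitions
    h_pad = np.concatenate([h, np.zeros(P*B - len(h))])
    # 2B-point spectrum of every filter partition
    H = np.stack([rfft(np.concatenate([h_pad[p*B:(p+1)*B], np.zeros(B)]))
                  for p in range(P)])
    X_hist = np.zeros((P, B + 1), dtype=np.complex128)  # spectra of the last P input blocks
    x_old = np.zeros(B)
    y = np.zeros(len(x) // B * B)
    for n in range(len(x) // B):
        x_new = x[n*B:(n+1)*B]
        X_hist = np.roll(X_hist, 1, axis=0)             # age the stored spectra by one block
        X_hist[0] = rfft(np.concatenate([x_old, x_new]))
        Y = np.sum(H * X_hist, axis=0)                  # accumulate the partition outputs
        y[n*B:(n+1)*B] = irfft(Y)[B:]                   # overlap-save: keep the last B samples
        x_old = x_new
    return y

# quick check against direct convolution with a "long" impulse response (example values)
rng = np.random.default_rng(1)
x = rng.standard_normal(2048)
h = rng.standard_normal(500)
print(np.allclose(partitioned_fd_filter(x, h, B=128), np.convolve(x, h)[:2048]))   # True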

The code is shown below:

import numpy as np
import librosa
import soundfile as sf
import pyroomacoustics as pra

# real-signal FFT pair: rfft of a length-2M real block gives M+1 complex bins
from numpy.fft import rfft as fft
from numpy.fft import irfft as ifft

def fdaf(x, d, M, mu=0.05, beta=0.9):
    """Overlap-save frequency-domain block LMS adaptive filter.

    x    : reference (far-end) signal
    d    : microphone (desired) signal
    M    : block length; the FFT length is 2*M (50% overlap)
    mu   : step size
    beta : forgetting factor of the per-bin power estimate
    """
    H = np.zeros(M+1, dtype=np.complex128)   # frequency-domain filter weights (rfft bins)
    norm = np.full(M+1, 1e-8)                # per-bin power estimate of the reference

    window = np.hanning(M)
    x_old = np.zeros(M)                      # previous reference block

    num_block = min(len(x), len(d)) // M
    e = np.zeros(num_block*M)

    for n in range(num_block):
        # overlap-save: build a 2M-sample window from the previous and the current block
        x_n = np.concatenate([x_old, x[n*M:(n+1)*M]])
        d_n = d[n*M:(n+1)*M]
        x_old = x[n*M:(n+1)*M]

        X_n = fft(x_n)
        y_n = ifft(H*X_n)[M:]                # keep the last M samples (valid linear convolution)
        e_n = d_n - y_n
        e[n*M:(n+1)*M] = e_n

        # place the (windowed) error block in the second half of the FFT buffer
        e_fft = np.concatenate([np.zeros(M), e_n*window])
        E_n = fft(e_fft)

        # power-normalized frequency-domain LMS update
        norm = beta*norm + (1-beta)*np.abs(X_n)**2
        G = mu*E_n/(norm+1e-3)
        H = H + X_n.conj()*G

        # gradient constraint: force the last M taps of the time-domain filter to zero
        # so that the update stays consistent with linear (not circular) convolution
        h = ifft(H)
        h[M:] = 0
        H = fft(h)

    return e

# x: original reference (far-end) signal
# v: ideal near-end microphone signal
# generate the simulated microphone signal and reference signal
def creat_sim_sound(x,v):
    rt60_tgt = 0.08       # target reverberation time in seconds
    room_dim = [2, 2, 2]  # room dimensions in metres

    # note: sr is read from the enclosing scope (set in __main__)
    e_absorption, max_order = pra.inverse_sabine(rt60_tgt, room_dim)
    room = pra.ShoeBox(room_dim, fs=sr, materials=pra.Material(e_absorption), max_order=max_order)
    room.add_source([1.5, 1.5, 1.5])
    room.add_microphone([0.1, 0.5, 0.1])
    room.compute_rir()
    rir = room.rir[0][0]
    rir = rir[np.argmax(rir):]  # drop the propagation delay before the direct path
    # y: x after passing through the room (convolved with the RIR)
    y = np.convolve(x, rir)
    scale = np.sqrt(np.mean(x**2)) / np.sqrt(np.mean(y**2))
    # y is the echo that reaches the microphone after the room reflections
    y = y * scale

    # pad everything to a common length; v is zero-padded at the front,
    # so the near-end speech enters later in the file
    L = max(len(y), len(v))
    y = np.pad(y, [0, L-len(y)])
    v = np.pad(v, [L-len(v), 0])
    x = np.pad(x, [0, L-len(x)])
    d = v + y   # microphone signal = near-end speech + echo
    return x,d

if __name__ == "__main__":
    # far-end (reference) speech and near-end speech, resampled to 8 kHz
    x_org, sr = librosa.load('female.wav', sr=8000)
    v_org, sr = librosa.load('male.wav', sr=8000)

    x, d = creat_sim_sound(x_org, v_org)

    e = fdaf(x, d, M=256, mu=0.1)

    sf.write('x.wav', x, sr, subtype='PCM_16')
    sf.write('d.wav', d, sr, subtype='PCM_16')
    sf.write('fdaf.wav', e, sr, subtype='PCM_16')
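As a rough way to gauge how much echo the filter removes, one can compute a blockwise echo return loss enhancement (ERLE), i.e. the ratio of microphone power to residual power in dB. The helper below is not part of the original post; the window length is arbitrary, and since d also contains near-end speech here, the numbers are only indicative during far-end-only stretches.

import numpy as np

def erle_db(d, e, win=1024):
    """Blockwise ERLE in dB: 10*log10(power of d / power of e)."""
    n = min(len(d), len(e)) // win * win
    pd = np.mean(d[:n].reshape(-1, win)**2, axis=1)
    pe = np.mean(e[:n].reshape(-1, win)**2, axis=1)
    return 10 * np.log10((pd + 1e-12) / (pe + 1e-12))

# for example, appended to the script above:
# print(np.median(erle_db(d, e)))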

references:

https://blog.csdn.net/qq_34218078/article/details/108666894

https://www.bilibili.com/video/BV16U4y1z7Pq/
