[Audio Processing] Introduction to Fast Convolution Algorithms


1. Convolution

Before discussing fast convolution, let's talk about "convolution" itself. The mathematical definition is very simple: the convolution of two discrete signals x[n] and h[n] is

x[n] * h[n] = Σ_{k=−∞}^{+∞} x[k] · h[n−k] = y[n], n ∈ ℤ

How do we understand this formula intuitively? To be honest, I read a lot of explanatory material.

When I read these materials I seemed to understand, but a few days later, looking back at the convolution formula, I still couldn't tell what it was doing. That's fine; today's topic only requires a general feeling for convolution.

Convolution is very important in signal processing. In [Audio Processing] How to "know" a filter, we mentioned:

When a signal x(n) passes through an LTI filter to produce y(n), the process can be expressed as the convolution of the impulse response h(n) with x(n):
y(n) = (h ∗ x)(n)

In layman's terms, suppose we have a signal system that is a black box: we don't know what it does internally, only that when one signal goes in, another signal comes out. Now you have a task: you have 1000 input signals and want the output the black box produces for each of them. What would you do?

Method 1: work a little harder, slowly feed in the 1000 inputs one by one, and collect each output. Very time-consuming.
Method 2: keenly observe that the black box processing an input signal x(n) is really a convolution. As long as we can obtain the impulse response h(n) of the black-box system, the remaining work is just convolving x(n) with h(n) on a computer. Done.

Obviously, the second method is efficient and smart. No matter how many input signals there are, the impulse response h(n) of the black box stays the same; we should seize on this invariant.

So how do we obtain h(n)? Very simple: just feed the system an ideal unit impulse δ(t). Of course, there is no ideal δ(t) in the real world; it is usually replaced by a short, high-energy signal, such as thunder, a gunshot, or the clap of a board.

In the field of audio processing, reverb effects can also be achieved by convolution. For example, we can play a short, high-energy sound in a "bathroom" space and record the echo, which gives us the h(n) of that space. Then, whenever we want a "bathroom" reverb, we just convolve the dry signal with that h(n).

2. Fast convolution

Convolution is very important and widely used, but it has a fatal flaw: the computational cost is high. Take the reverberation example above: the recorded h(n) has length N and the input signal has length M. Obtaining the reverb effect through direct convolution then requires:

  1. Each output sample requires N multiplications
  2. Each output sample requires N additions
  3. The output length is M + N − 1, so in total there are roughly N × M multiplications and N × M additions; the complexity is O(N·M)
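The direct algorithm in the steps above can be sketched as follows (a minimal illustration of my own; the name direct_convolution is made up, and np.convolve computes the same result far more efficiently):

```python
import numpy as np

def direct_convolution(x, h):
    # Naive O(N*M) convolution: every output sample costs len(h) multiply-adds
    M, N = len(x), len(h)
    y = np.zeros(M + N - 1)
    for n in range(M + N - 1):
        for k in range(N):
            if 0 <= n - k < M:
                y[n] += h[k] * x[n - k]
    return y

x = np.array([1.0, 2.0, 3.0])
h = np.array([1.0, 0.5])
print(direct_convolution(x, h))  # [1.  2.5 4.  1.5]
print(np.convolve(x, h))         # same result
```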

If the sampling rate is 44100 Hz, then with a 1 s h(n) and 1 min of input audio, a total of 44100 × 60 × 44100 multiplications and as many additions are needed. That takes a long time even on a high-end PC. On my Mac M1, for example, with a 4 s h(n) and a 25 s x(n), numpy's convolve takes about 20 s.

The direct convolution algorithm is too slow, so people invented fast convolution algorithms to speed it up. Next, I will introduce several fast convolution algorithms. I mainly refer to the following two articles; interested readers are encouraged to read the originals:

All of the code below can be found at github - fast_convolution: the python directory contains the Python implementation, the C++ implementation lives in the src directory, and the example directory contains examples of the different convolution implementations.

2.1 FFT Convolution

Anyone who has studied a little signal processing knows that "convolution in the time domain is multiplication in the frequency domain". Transform h(n) and x(n) with the FFT into H(ω) and X(ω), multiply them to get Y(ω), and finally transform Y(ω) back to the time domain with the IFFT. The flow chart is as follows:
(figure: FFT convolution flow chart)
In the above process, there are several issues that need to be considered:

  1. h(n) and x(n) usually have different lengths, but the frequency-domain multiplication requires H(ω) and X(ω) to be the same size, so both h(n) and x(n) must be zero-padded to a common length K
  2. How big should K be? Section 2.5.2 of the referenced article tells us it must satisfy K ≥ M + N − 1
  3. What is the length of y(n)? In our implementation it is M + N − 1, but other conventions exist; see the mode parameter of numpy-convolution
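The K ≥ M + N − 1 condition can be checked numerically: with too small a K, the product of the DFTs yields a circular convolution whose tail wraps around. A small sketch of my own:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # M = 4
h = np.array([1.0, 1.0, 1.0])        # N = 3
linear = np.convolve(x, h)           # length M + N - 1 = 6

# K >= M + N - 1: IFFT of the product recovers the linear convolution
K = 8
good = np.real(np.fft.ifft(np.fft.fft(x, K) * np.fft.fft(h, K)))[:6]

# K too small: the result is a circular convolution (the tail wraps around)
K = 4
bad = np.real(np.fft.ifft(np.fft.fft(x, K) * np.fft.fft(h, K)))

print(linear)  # [1. 3. 6. 9. 7. 4.]
print(good)    # same as linear
print(bad)     # [8. 7. 6. 9.] -- tail folded back onto the head
```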

With these questions answered, our FFT convolution algorithm takes shape. The Python code is as follows:

import numpy as np

def pad_zeros_to(x, new_length):
    # Zero-pad the 1-D array x to new_length
    output = np.zeros((new_length,))
    output[:x.shape[0]] = x
    return output

def next_power_of_2(n):
    # Smallest power of two >= n (the bit trick avoids float issues and n == 1)
    return 1 << (n - 1).bit_length()

def fft_convolution(x, h, K=None):
    Nx = x.shape[0]
    Nh = h.shape[0]
    Ny = Nx + Nh - 1
    
    if K is None:
        K = next_power_of_2(Ny)
        
    X = np.fft.fft(pad_zeros_to(x, K))
    H = np.fft.fft(pad_zeros_to(h, K))
    
    Y = X*H
    
    y = np.real(np.fft.ifft(Y))
    
    return y[:Ny]

The code is simple:

  1. First compute K, satisfying K ≥ M + N − 1. FFTs usually handle power-of-two sizes best, so we also call next_power_of_2 to make K a power of two
  2. Call pad_zeros_to to pad h and x to size K, take the FFT of each, then multiply in the frequency domain: Y = X*H
  3. Finally, use the IFFT to transform back to the time domain and truncate to get the final convolution result
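As a sanity check, the steps above can be condensed into one self-contained line (a sketch of my own; the name fft_conv is made up) and compared against np.convolve on random data:

```python
import numpy as np

def fft_conv(x, h):
    # FFT convolution in one line: pad both to M + N - 1, multiply spectra, IFFT
    K = len(x) + len(h) - 1
    return np.real(np.fft.ifft(np.fft.fft(x, K) * np.fft.fft(h, K)))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(100)
print(np.allclose(fft_conv(x, h), np.convolve(x, h)))  # True
```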

2.2 Block Convolution

Although FFT convolution is already fast, it needs all of x(n) as input, which causes two problems:

  1. If x(n) is very long, the FFT consumes a lot of memory, which is unfriendly to mobile devices or machines with weak compute
  2. Real audio and music scenarios demand real-time processing, which means we never have all of x(n). In a live-streaming scenario, for instance, the audio stream can be considered infinitely long.

For this reason, people proposed block convolution: once a block of, say, 256/64/32 samples has been buffered, one convolution is performed. This meets real-time requirements and keeps the algorithmic delay low.
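The delay can be made concrete: the algorithmic latency of block convolution is one block of samples divided by the sample rate (a small sketch; a full system adds processing time on top of this):

```python
fs = 44100  # sample rate in Hz
for B in (256, 64, 32):
    # One block must be buffered before any output can be produced
    print(f"B={B}: latency = {1000 * B / fs:.2f} ms")
# B=256: latency = 5.80 ms
# B=64: latency = 1.45 ms
# B=32: latency = 0.73 ms
```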

2.2.1 Overlap-Add Block Convolution

We first introduce the overlap-add block convolution algorithm. It convolves each incoming block with the full filter using FFT-based convolution; since each block's result is longer than the block itself, the tails of consecutive results overlap, and summing (adding) the overlapping parts yields output blocks of the same length as the input blocks. The processing flow chart and code are as follows:
(figure: overlap-add block convolution flow chart)

def overlap_add_convolution(x, h, B, K=None):
    M = len(x)
    N = len(h)
    
    num_input_blocks = np.ceil(M/B).astype(int)
        
    output_size = M + N - 1
    y = np.zeros((output_size,))
    
    for n in range(num_input_blocks):
        xb = x[n*B:(n+1)*B]
        
        u = fft_convolution(xb, h, K)
        
        y[n*B:n*B + len(u)] += u
    
    return y

For convenience, the implementation above is not the real-time version of the algorithm, but you can convince yourself that a real-time version is possible. The final overlap-add step, where the tails of adjacent block results are summed, is very similar to the overlap-add operation in an STFT.
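To see that the scheme really reproduces direct convolution, here is a compact, self-contained overlap-add sketch of my own (the name overlap_add is made up) that precomputes the filter spectrum once, as a real-time version would, and verifies the result against np.convolve:

```python
import numpy as np

def overlap_add(x, h, B=64):
    # Convolve x block-by-block with the full filter h; each block's result
    # has length B + N - 1, so the tails overlap and are summed into y.
    M, N = len(x), len(h)
    K = B + N - 1                 # linear convolution length of one block
    H = np.fft.fft(h, K)          # filter spectrum, computed once
    y = np.zeros(M + N - 1)
    for start in range(0, M, B):
        xb = x[start:start + B]
        u = np.real(np.fft.ifft(np.fft.fft(xb, K) * H))
        end = min(start + K, len(y))
        y[start:end] += u[:end - start]
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
h = rng.standard_normal(33)
print(np.allclose(overlap_add(x, h), np.convolve(x, h)))  # True
```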

2.2.2 Overlap-Save block convolution

An important disadvantage of the overlap-add scheme is that the computed partial convolutions must be stored and summed. Is there a way to optimize this?

Another method is called overlap-save. It is based on storing an appropriate number of input samples rather than output samples. Input blocks are stored in a buffer of length K in first-in, first-out fashion: each new block of input samples shifts all previously stored samples along the buffer, discarding the oldest B samples. Then we perform a K-point FFT convolution with the zero-padded filter coefficients and keep only the valid output samples. The processing flow chart and code are as follows:
(figure: overlap-save block convolution flow chart)

def overlap_save_convolution(x, h, B, K=None):
    M = len(x)
    N = len(h)

    if K is None:
        K = next_power_of_2(B + N - 1)
        
    # Calculate the number of input blocks
    num_input_blocks = np.ceil(M / B).astype(int) \
                     + np.ceil(K / B).astype(int) - 1

    # Pad x to an integer multiple of B
    xp = pad_zeros_to(x, num_input_blocks*B)

    output_size = num_input_blocks * B + N - 1
    y = np.zeros((output_size,))
    
    # Input buffer
    xw = np.zeros((K,))

    # Convolve all blocks
    for n in range(num_input_blocks):
        # Extract the n-th input block
        xb = xp[n*B:n*B+B]

        # Sliding window of the input
        xw = np.roll(xw, -B)
        xw[-B:] = xb

        # Fast convolution
        u = fft_convolution(xw, h, K)

        # Save the valid output samples
        y[n*B:n*B+B] = u[-B:]

    return y[:M+N-1]

The overlap-save scheme is easier to implement in real-time scenarios and performs better than overlap-add, though the sliding-window bookkeeping is a bit more complicated.
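The buffering described above can be condensed into a self-contained sketch (my own variant; the name overlap_save is made up) and verified against np.convolve:

```python
import numpy as np

def overlap_save(x, h, B=64):
    # Keep a sliding input buffer of length K = B + N - 1; each iteration
    # shifts in B new samples, convolves the buffer with h via FFT, and
    # keeps only the last B output samples (the ones free of wrap-around).
    M, N = len(x), len(h)
    K = B + N - 1
    H = np.fft.fft(h, K)
    num_blocks = int(np.ceil((M + N - 1) / B))
    xp = np.concatenate([x, np.zeros(num_blocks * B - M)])  # pad to a block multiple
    buf = np.zeros(K)             # FIFO input buffer
    y = np.zeros(num_blocks * B)
    for n in range(num_blocks):
        buf = np.roll(buf, -B)
        buf[-B:] = xp[n * B:(n + 1) * B]
        u = np.real(np.fft.ifft(np.fft.fft(buf) * H))
        y[n * B:(n + 1) * B] = u[-B:]   # only the last B samples are valid
    return y[:M + N - 1]

rng = np.random.default_rng(2)
x = rng.standard_normal(500)
h = rng.standard_normal(33)
print(np.allclose(overlap_save(x, h), np.convolve(x, h)))  # True
```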

3. Uniformly Partitioned Convolution

If the filter we want to convolve with the input also has considerable length (for example, when it represents the impulse response of a hall with a rather long reverberation time), we may want to partition both x(n) and h(n). What I want to introduce here is the uniformly partitioned convolution algorithm. Its processing flow chart and code are as follows:
(figure: uniformly partitioned convolution flow chart)

def realtime_uniformly_partitioned_convolution(x, h, B):
    M = len(x)
    N = len(h)
    P = np.ceil(N/B).astype('int')      # number of filter partitions
    # For simplicity this processes only full blocks and keeps the output
    # the same length as x, dropping the convolution tail
    num_input_block = M//B

    output = np.zeros(M)

    # Precalculate the FFT of each sub-filter, zero-padded to 2B
    sub_filters_fft = np.zeros((P, 2*B), dtype=np.complex64)
    for i in range(P):
        sub_filter = h[i*B:i*B + B]
        sub_filter_pad = pad_zeros_to(sub_filter, 2*B)
        sub_filters_fft[i,:] = np.fft.fft(sub_filter_pad)

    # Process the input block by block
    freq_delay_line = np.zeros_like(sub_filters_fft)
    xw = np.zeros(2*B)                  # sliding input window of length 2B
    for i in range(num_input_block):
        block_x = x[i*B:i*B + B]
        xw = np.roll(xw, -B)
        xw[-B:] = block_x

        xw_fft = np.fft.fft(xw)

        # Insert the new spectrum into the frequency-domain delay line
        freq_delay_line = np.roll(freq_delay_line, 1, axis=0)
        freq_delay_line[0,:] = xw_fft

        # Multiply-accumulate in the frequency domain, then IFFT;
        # only the last B samples are valid (overlap-save style)
        s = (freq_delay_line*sub_filters_fft).sum(axis=0)
        output[i*B:i*B + B] = np.fft.ifft(s).real[-B:]

    return output
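To convince ourselves the scheme is correct, here is a compact, self-contained variant of my own (the name upols is made up) that also keeps the convolution tail, checked against np.convolve:

```python
import numpy as np

def upols(x, h, B=64):
    # Uniformly partitioned overlap-save: split h into P sub-filters of
    # length B, keep a frequency-domain delay line of past input spectra,
    # and multiply-accumulate before a single IFFT per block.
    M, N = len(x), len(h)
    P = int(np.ceil(N / B))
    H = np.zeros((P, 2 * B), dtype=complex)
    for p in range(P):
        H[p] = np.fft.fft(h[p * B:(p + 1) * B], 2 * B)
    fdl = np.zeros((P, 2 * B), dtype=complex)   # frequency-domain delay line
    xw = np.zeros(2 * B)                        # sliding 2B input window
    num_blocks = int(np.ceil((M + N - 1) / B))
    xp = np.concatenate([x, np.zeros(num_blocks * B - M)])
    y = np.zeros(num_blocks * B)
    for n in range(num_blocks):
        xw = np.roll(xw, -B)
        xw[-B:] = xp[n * B:(n + 1) * B]
        fdl = np.roll(fdl, 1, axis=0)
        fdl[0] = np.fft.fft(xw)
        s = (fdl * H).sum(axis=0)
        y[n * B:(n + 1) * B] = np.real(np.fft.ifft(s))[-B:]
    return y[:M + N - 1]

rng = np.random.default_rng(3)
x = rng.standard_normal(400)
h = rng.standard_normal(150)
print(np.allclose(upols(x, h), np.convolve(x, h)))  # True
```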

All of the code above can be found at github - fast_convolution. There is also a C++ implementation; the example directory contains a real-time playback example using uniformly partitioned convolution, where you only need to change the paths of input_file and impulse_file. You can find some impulse files at vtolani95/convolution.

3.1 Reverb implementation

The following video shows the reverb effect achieved with uniformly partitioned convolution:

fast_conv_reverb

In the concrete implementation, a scale variable is applied to the output to control the volume of the output audio. This variable can be exposed as an algorithm parameter for users to adjust.


4 Summary

In this article, we introduced the significance of convolution in signal systems. The direct convolution algorithm has complexity O(N·M). To speed up the computation, people proposed fast convolution algorithms; this article introduced FFT convolution, the Overlap-Add and Overlap-Save block convolutions, and the uniformly partitioned convolution algorithm. The implementations are at github - fast_convolution, in both Python and C++ versions.


Origin blog.csdn.net/weiwei9363/article/details/125358452