Implementing a Simple Voice-Changing Effect with the Web Audio API

Foreword

How can you implement a real-time voice-changing effect in a web page? Before running into this kind of audio processing requirement, you might assume it has to be done in C/C++ code. But with the improvement of browser performance and the growing richness of web APIs, audio data can now be manipulated through native browser APIs to achieve many complex effects, giving web audio development more options. This article introduces several approaches that were tried while implementing a voice-changing effect with the native Web Audio API; interested readers are welcome to follow along.

Note: the scope of this article is the combined speed-and-pitch-change approach to voice changing. For the other two scenarios, changing speed without changing pitch and changing pitch without changing speed, please refer to the links at the end or to other solutions.

Introduction to the Web Audio API

Before starting, let's take a brief look at the Web Audio API. The Web Audio API provides a set of APIs for working with audio on the web, letting developers choose audio sources, add effects to audio, visualize sound, and apply spatial effects. The audio input can be thought of as a stream of buffers, and a source can come from an audio file read into memory (AudioBufferSourceNode), from an HTML audio tag (MediaElementAudioSourceNode), or from an audio stream such as a microphone (MediaStreamAudioSourceNode). For example, to capture the sound of your device's microphone and connect it to the speakers:

// Create the audio context
const audioContext = new AudioContext();
// Get the device's microphone stream
const stream = await navigator.mediaDevices
  .getUserMedia({ audio: true })
  .catch(function (error) {
    console.log(error);
  });
// Create a source node from the microphone stream
const sourceNode = audioContext.createMediaStreamSource(stream);
// Connect the sound to the speakers
sourceNode.connect(audioContext.destination);
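
For comparison, a source node can also be created from an HTML audio tag instead of the microphone. A minimal sketch, assuming the page contains an <audio id="player"> element (the element id is hypothetical):

// Create the audio context
const audioContext = new AudioContext();
// Grab an existing <audio> element on the page (the id is an assumption)
const audioElement = document.getElementById("player");
// Create a source node from the media element
const sourceNode = audioContext.createMediaElementSource(audioElement);
// Connect the source to the speakers; playback now flows through the audio graph
sourceNode.connect(audioContext.destination);
audioElement.play();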

With the microphone example, you can speak into the mic and hear your own voice. The processing of these source data streams is designed around individual nodes (Nodes), giving it a modular routing model: whatever effect you need, you add the corresponding node. For example, one of the most common operations is to amplify the input samples so they act as a loudspeaker (GainNode). Sample code:

// Create the audio context
const audioContext = new AudioContext();
// Create a gain node
const gainNode = audioContext.createGain();
// Get the device's microphone stream
const stream = await navigator.mediaDevices
  .getUserMedia({ audio: true })
  .catch(function (error) {
    console.log(error);
  });
// Create a source node from the microphone stream
const sourceNode = audioContext.createMediaStreamSource(stream);
// Route the sound through the gain node
sourceNode.connect(gainNode);
// Connect the gain node to the speakers
gainNode.connect(audioContext.destination);
// Set the gain to amplify the sound
gainNode.gain.value = 2.0;

The code above only connects a node for amplification. To add other effects, you keep connecting additional nodes, such as a filter (BiquadFilterNode), stereo panning (StereoPannerNode), signal distortion (WaveShaperNode), and so on. This modular design provides a flexible way to create dynamic effects and composite audio: wherever you want to modify the sound, you simply add a node there, which is very convenient. For example, the following shows an AudioContext creating a chain of nodes that includes a biquad filter node (BiquadFilterNode):

var audioCtx = new (window.AudioContext || window.webkitAudioContext)();

// Create several nodes with different roles
var analyser = audioCtx.createAnalyser();
var distortion = audioCtx.createWaveShaper();
var gainNode = audioCtx.createGain();
var biquadFilter = audioCtx.createBiquadFilter();
var convolver = audioCtx.createConvolver();

// Connect all the nodes together
// (stream is a microphone stream obtained via getUserMedia, as above)
var source = audioCtx.createMediaStreamSource(stream);
source.connect(analyser);
analyser.connect(distortion);
distortion.connect(biquadFilter);
biquadFilter.connect(convolver);
convolver.connect(gainNode);
gainNode.connect(audioCtx.destination);

// Configure the biquad filter
// lowshelf: boost frequencies below 1000 Hz by 25 dB
biquadFilter.type = "lowshelf";
biquadFilter.frequency.value = 1000;
biquadFilter.gain.value = 25;

As you can see, adding processing effects to the sound stream is like stringing beads onto a necklace, one after another, until you get the final result. For a live demo, see the official sample voice-change-o-matic. A simple, typical web audio workflow looks like this (a minimal sketch follows the list):

  1. Create an audio context

  2. Create sources in an audio context — eg, oscillators, streams

  3. Create effect nodes such as reverb, biquad, pan, compressor

  4. Choose a destination for the audio, such as your system speakers

  5. Connect the source to the effect unit, and the effect output to the destination

(Figure: a typical Web Audio API processing flow)
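
A minimal sketch tying these five steps together, using an OscillatorNode as the source instead of a microphone (the oscillator frequency and gain values here are assumed for illustration):

// 1. Create an audio context
const audioCtx = new AudioContext();
// 2. Create a source inside the context (an oscillator in this sketch)
const oscillator = audioCtx.createOscillator();
oscillator.frequency.value = 440; // an assumed example tone (A4)
// 3. Create an effect node (a gain node used as a simple volume control)
const gain = audioCtx.createGain();
gain.gain.value = 0.5;
// 4. The destination is the system speakers: audioCtx.destination
// 5. Connect the source to the effect, and the effect to the destination
oscillator.connect(gain);
gain.connect(audioCtx.destination);
oscillator.start();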

Implementing the Voice-Changing Effect

First, a quick review of the basics of sound. Sound is a mechanical wave produced by the vibration of an object, and it is usually described by the following three characteristics:

  • Frequency : The higher the frequency, the higher the pitch; the lower the frequency, the lower the pitch.

  • Amplitude : The larger the amplitude, the greater the volume (loudness); the smaller the amplitude, the lower the volume.

  • Timbre : that is, the waveform; it is the main basis for telling different voices apart by ear.

The voice-changing effect discussed here means changing the pitch of the voice. Depending on the scenario, it can be divided into three types: changing speed without changing pitch, changing pitch without changing speed, and changing both speed and pitch. Changing speed means stretching or shortening the audio in the time domain while the sampling rate, fundamental frequency, and formants remain unchanged. Changing pitch means lowering or raising the fundamental frequency of the voice, with the formants changing accordingly while the sampling rate stays the same. The application scenarios of each approach are as follows:

  1. Changing speed without changing pitch: the 2x and 0.5x playback speeds in various video players are based on this principle. It is also used to handle network jitter in VoIP: simply put, when the network is bad the player pulls less data from the network and the buffer runs low, so the buffered data is played back more slowly; conversely, if there is too much data in the buffer, playback speeds up. For an implementation, see the NetEQ module in WebRTC. When using WeChat voice on a particularly poor network, you may notice the voice being deliberately slowed down to keep it continuous.

  2. Changing pitch without changing speed: mainly used for sound effects, such as raising the pitch to turn a male voice into a female voice, or lowering it to turn a female voice into a male voice. Combined with other audio-effect algorithms such as EQ, reverb, tremolo, and vibrato, it can produce voice-changing effects like the "loli" and "uncle" voices in QQ.

  3. Changing both speed and pitch: when the playback rate of a sound is changed, its pitch and timbre change with it. Anyone who has played cassette tapes knows that fast-forwarding makes the sound sharper and higher, while slow playback makes it thicker and lower.

The first two approaches require a deeper understanding of audio signal processing; you would need to revisit the time domain, the frequency domain, and the Fourier transform, so the learning cost is relatively high. The third approach is used here because it is easy to get started with. To change the playback rate of a sound, the Web Audio API provides the playbackRate property on AudioBufferSourceNode, which sets the playback rate of the audio; an instance is obtained from the audio context via AudioContext.createBufferSource. The sample code is as follows:

// ref comes from Vue's reactivity API (import { ref } from 'vue')
const play = () => {
  const audioSrc = ref("src/assets/sample_orig.mp3")
  const url = audioSrc.value
  const request = new XMLHttpRequest()
  request.open('GET', url, true)
  request.responseType = 'arraybuffer'

  request.onload = function() {
    const audioData = request.response
    const audioCtx = new (window.AudioContext || window.webkitAudioContext)();

    audioCtx.decodeAudioData(audioData, (audioBuffer) => {
      let source = audioCtx.createBufferSource();
      source.buffer = audioBuffer;
      // Change the playback rate: play at 2x speed
      source.playbackRate.value = 2;
      source.connect(audioCtx.destination);
      source.start(0);
    });
  }
  request.send()
}

You can adjust source.playbackRate.value to change the sound: values greater than 1 speed up playback and raise the pitch, while values less than 1 slow it down and lower the pitch.
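
Since playbackRate simply resamples the audio, the amount of pitch shift follows directly from the rate: doubling the rate raises the pitch by one octave (12 semitones). A small helper sketch (the function name is hypothetical) that converts a playback rate into the corresponding shift in semitones:

// Hypothetical helper: how many semitones a given playbackRate shifts the pitch
const rateToSemitones = (rate) => 12 * Math.log2(rate);

console.log(rateToSemitones(2));   // 12  (one octave up)
console.log(rateToSemitones(0.5)); // -12 (one octave down)
console.log(rateToSemitones(0.7)); // ≈ -6.17 (about six semitones down)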

Although this achieves a voice-changing effect, it is only suitable for playing audio files or complete audio streams; it does not work for a continuous input such as a microphone. SoundTouchJS, a JavaScript port of SoundTouch, works in a similar way and also needs the complete audio data; its author has explained how to handle a real-time microphone stream in the reference links. Here we can instead use the Web Audio API's ScriptProcessorNode, which allows audio to be generated, processed, and analyzed with JavaScript: its callback receives an input buffer and lets you produce processed output. Using it, the real-time audio stream can be turned into slowed-down or sped-up audio data. The sample code is as follows:

const audioprocess = async () => {
  const audioContext = new AudioContext();

  // Capture the microphone input stream
  let stream = await navigator.mediaDevices
    .getUserMedia({ audio: true })
    .catch(function (error) {
      console.log(error);
    });

  const sourceNode = audioContext.createMediaStreamSource(stream);

  const processor = audioContext.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = async event => {
    // The callback provides the input audio data for this block
    const inputBuffer = event.inputBuffer;
    // Create a new output destination and buffer source for this block
    const outputSource = audioContext.createMediaStreamDestination();
    const audioBuffer = audioContext.createBufferSource();
    audioBuffer.buffer = inputBuffer;
    // Thicken the voice: play back at 0.7x speed
    audioBuffer.playbackRate.value = 0.7
    audioBuffer.connect(outputSource);
    audioBuffer.start();

    // Take the resulting MediaStream
    const newStream = outputSource.stream;
    const node = audioContext.createMediaStreamSource(newStream)
    // Connect it to the speakers for playback
    node.connect(audioContext.destination)
  };
  // Attach the processing node
  sourceNode.connect(processor);
  processor.connect(audioContext.destination)
}
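
One practical caveat when trying the sketch above: browsers generally require a user gesture before an AudioContext may produce sound, so it is safer to start the processing from a click handler. A minimal usage sketch (the button id is hypothetical):

// Start audio processing only after a user gesture (browser autoplay policy)
document.getElementById("start-btn").addEventListener("click", () => {
  audioprocess();
});

Note also that ScriptProcessorNode is marked as deprecated in the Web Audio specification in favour of AudioWorklet, so this approach may need migrating in newer code.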

In addition, there is a pitch-shifting library built on Google's open-source Jungle, which also offers various reverb effects, audio visualization, and other nice features, all implemented with the Web Audio API. The GitHub link is in the references below; if you are interested, you can try out its demo page yourself.

Summary

The above is a brief introduction to the Web Audio API and several ways of using it to implement a simple voice-changing effect. If you have better approaches, feel free to share them in the comments!

References

https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API

https://github.com/cwilso/Audio-Input-Effects

https://mdn.github.io/voice-change-o-matic/

https://github.com/cutterbl/SoundTouchJS

https://cloud.tencent.com/developer/news/818606

https://zhuanlan.zhihu.com/p/110278983

https://www.nxrte.com/jishu/3146.html

- END -

About Qi Wu Troupe

Qi Wu Troupe is the largest front-end team in 360 Group and participates in W3C and ECMA (TC39) work on behalf of the group. Qi Wu Troupe attaches great importance to talent development, offering career directions such as engineer, lecturer, translator, business liaison, and team leader, along with corresponding technical, professional, general, and leadership training courses. Qi Wu Troupe welcomes outstanding talent of all kinds to follow and join us with an open, talent-seeking attitude.



Origin blog.csdn.net/qiwoo_weekly/article/details/131058576