Using openai-whisper for speech-to-text

Foreword:

Thanks to the popularity of ChatGPT, AI applications are once again getting a lot of public attention. Today I'll introduce whisper, an AI application that converts speech to text quite accurately and supports multiple languages.

1. Installation

There are two ways to install it: with pip, or by building from source. This article uses pip.

  1. Install Python 3.9.9 and PyTorch 1.10.1 (installation steps omitted; just download them from the official websites and install). Because the pip package expects a specific PyTorch version, you may run into problems with the latest Python release.

(Picture: Python 3.9.9)

  2. Install ffmpeg. Here are the installation commands for the common operating systems:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
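  3. Install whisper and setuptools-rust. setuptools-rust (together with a Rust toolchain) is only needed if a dependency has no prebuilt wheel for your platform and has to be built from source.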

pip install -U openai-whisper
pip install setuptools-rust
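Before moving on, it can help to confirm that every prerequisite from the installation steps is in place. The sketch below uses only the standard library: it compares the interpreter with the Python 3.9 used in this article (that target is this article's setup, not an official requirement), looks up the installed torch and openai-whisper versions, and checks that the ffmpeg executable and the whisper command-line script are on PATH. The helper names are hypothetical.

```python
import importlib.util
import shutil
import sys
from importlib import metadata

# The Python version used in this article (an assumption, not an official requirement).
TESTED_PYTHON = (3, 9)


def installed_version(dist_name):
    """Return the pip-installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(version_info=None):
    """Summarize the prerequisites from the installation steps."""
    version_info = sys.version_info if version_info is None else version_info
    return {
        "python_matches_3.9": tuple(version_info[:2]) == TESTED_PYTHON,
        "torch": installed_version("torch"),
        # whisper runs the ffmpeg executable to decode audio files
        "ffmpeg": shutil.which("ffmpeg"),
        "openai-whisper": installed_version("openai-whisper"),  # pip distribution name
        "whisper_importable": importlib.util.find_spec("whisper") is not None,
        "whisper_cli": shutil.which("whisper"),  # console script installed by pip
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Any entry printed as None or False points to the step that still needs attention.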

2. Usage

whisper supports both CPU and GPU. With the default installation, only the CPU is used.

whisper 屋顶.mp3 --language zh --model small

How well does it work? Naturally, I tested it on a Jay Chou song. The picture below shows the result: it clearly works well when the singing is slow. Switch to a fast song like "Nunchucks", though, and... the result is too painful to look at...

(Picture: the song "Roof" transcribed into lyrics)
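To run the command above from a script, for example to process several songs in one go, one option is to build the argument list and hand it to subprocess. This sketch only emits the --language and --model flags used above; the helper names build_whisper_cmd, run_whisper and run_batch are hypothetical, and any further flag can be passed through unchanged via extra.

```python
import subprocess


def build_whisper_cmd(audio, language="zh", model="small", extra=()):
    """Build the argv for: whisper <audio> --language <language> --model <model> [extra...]."""
    return ["whisper", audio, "--language", language, "--model", model, *extra]


def run_whisper(audio, **kwargs):
    """Run the whisper CLI; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_whisper_cmd(audio, **kwargs), check=True)


def run_batch(paths, **kwargs):
    """Transcribe several files one after another with the same options."""
    for path in paths:
        run_whisper(path, **kwargs)
```

For example, run_whisper("屋顶.mp3") runs exactly the command shown above. Passing a list instead of a shell string avoids quoting problems with non-ASCII file names.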

--model selects the model. There are 5 model sizes in total (tiny, base, small, medium and large). The larger the model, the higher the accuracy, and of course the higher the hardware requirements.

--language specifies the language spoken in the audio; here zh = Chinese.
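For reference, the five sizes are listed below with the parameter counts and approximate VRAM figures from the openai-whisper README; treat the memory numbers as rough guides. The largest_model_for helper, which picks the biggest model that fits a given amount of GPU memory, is a hypothetical convenience and not part of whisper (the library itself provides whisper.available_models() to list valid names).

```python
# (name, parameters, approximate VRAM in GB), ordered from smallest to largest.
# Figures as given in the openai-whisper README; rough guides only.
MODELS = [
    ("tiny", "39 M", 1),
    ("base", "74 M", 1),
    ("small", "244 M", 2),
    ("medium", "769 M", 5),
    ("large", "1550 M", 10),
]


def largest_model_for(vram_gb):
    """Return the largest model whose approximate VRAM need fits, or None."""
    best = None
    for name, _params, need_gb in MODELS:
        if need_gb <= vram_gb:
            best = name  # later entries are larger, so keep overwriting
    return best


if __name__ == "__main__":
    for gb in (1, 4, 12):
        print(f"{gb} GB -> {largest_model_for(gb)}")
```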

As the screenshot shows, the transcript mixes simplified and traditional characters. This is mainly because the model's training samples contain both. To get simplified output, add --initial_prompt "以下是普通话的句子。" ("The following is a sentence in Mandarin."):

whisper 屋顶.mp3 --language zh --model small --initial_prompt "以下是普通话的句子。"

Run it again; the result is shown in the figure.
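The same settings are available from Python: whisper.load_model() loads a model by name, and the model's transcribe() method accepts language and initial_prompt and returns a dict whose "segments" entries carry start and end times in seconds plus the text. The sketch below uses that API and formats the segments as [MM:SS.mmm --> MM:SS.mmm] lines, similar to the CLI's console output; the formatting helpers and transcribe_mandarin are my own illustrative code, not whisper functions.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS.mmm, prefixing HH: only when hours are non-zero."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    prefix = f"{hours:02d}:" if hours else ""
    return f"{prefix}{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments):
    """Turn whisper-style segment dicts into '[start --> end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]


def transcribe_mandarin(path, model_name="small", prompt="以下是普通话的句子。"):
    """Transcribe `path` as Chinese, nudging the output toward simplified characters."""
    import whisper  # imported lazily: loading it pulls in PyTorch

    model = whisper.load_model(model_name)
    result = model.transcribe(path, language="zh", initial_prompt=prompt)
    return format_segments(result["segments"])


if __name__ == "__main__":
    for line in format_segments([{"start": 0.0, "end": 2.5, "text": " demo "}]):
        print(line)
```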

Use CUDA

Run the following commands to install PyTorch with CUDA support (cu116 is the build for CUDA 11.6):

pip uninstall torch
pip cache purge
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
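After reinstalling, it is worth confirming that the CUDA build was actually picked up. torch.version.cuda is None on CPU-only builds, and torch.cuda.is_available() additionally requires a working NVIDIA driver. A minimal check (cuda_report is a hypothetical helper; it returns None when PyTorch is not installed):

```python
def cuda_report():
    """Return a dict describing PyTorch's CUDA support, or None without torch."""
    try:
        import torch
    except ImportError:
        return None
    available = torch.cuda.is_available()  # also requires a working NVIDIA driver
    return {
        "torch": str(torch.__version__),
        "built_with_cuda": torch.version.cuda,  # e.g. "11.6"; None on CPU-only wheels
        "cuda_available": available,
        "gpu": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    print(cuda_report() or "PyTorch is not installed")
```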

Use the --device parameter to select cuda:

whisper 屋顶.mp3 --language zh --model small --device cuda --initial_prompt "以下是普通话的句子。"
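Rather than hard-coding --device cuda, a script can fall back to the CPU when no GPU is usable. In this sketch pick_device, detect_device and load are hypothetical helpers; whisper.load_model() itself does accept a device argument. On the CPU, whisper warns that FP16 is not supported and uses FP32 instead, which is expected.

```python
def pick_device(cuda_available):
    """Return the value for --device / device=: 'cuda' if usable, else 'cpu'."""
    return "cuda" if cuda_available else "cpu"


def detect_device():
    """Ask PyTorch whether CUDA works; fall back to 'cpu' if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return pick_device(torch.cuda.is_available())


def load(model_name="small"):
    """Load a whisper model on the detected device."""
    import whisper  # imported lazily: loading it pulls in PyTorch

    return whisper.load_model(model_name, device=detect_device())


if __name__ == "__main__":
    print("whisper would run on:", detect_device())
```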

For the remaining options, see --help:

whisper --help

Note: the model is downloaded the first time you use it, and direct downloads from within China can be very slow!
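Downloaded checkpoints are cached, so the slow download should only happen once per model. As far as I can tell from the openai-whisper source, load_model() caches them under $XDG_CACHE_HOME/whisper, defaulting to ~/.cache/whisper, and its download_root argument lets you point it at another directory. The helper below mirrors that default path logic as an assumption to verify against your installed version; default_whisper_cache and load_from are hypothetical names.

```python
import os


def default_whisper_cache():
    """Directory whisper is assumed to cache models in (verify for your version)."""
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(os.getenv("XDG_CACHE_HOME", default), "whisper")


def load_from(model_name="small", download_root=None):
    """Load a model, reading/writing checkpoints in download_root if given."""
    import whisper  # imported lazily: loading it pulls in PyTorch

    return whisper.load_model(model_name, download_root=download_root or default_whisper_cache())


if __name__ == "__main__":
    print(default_whisper_cache())
```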

References

Whisper blog

Whisper GitHub

Source: blog.csdn.net/nikolay/article/details/128951413