[The Charm of Python]: Text-to-speech and speech recognition in a few lines of code


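Introduction

Speech recognition technology, also known as automatic speech recognition (ASR), uses computers to automatically convert human speech into the corresponding text. The reverse direction, converting text into speech, is called speech synthesis or text-to-speech (TTS). This article covers both.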

1. Demo

[Figure: Python speech recognition demo]

2. Convert text to speech

2.1 Using pyttsx3

pyttsx3 is a popular third-party Python library for text-to-speech (TTS) conversion. It supports Windows, Linux, and macOS, and it works without an internet connection because it drives a speech engine installed locally on your computer.

Main features:

Cross-platform: runs on different operating systems.

Works offline: does not depend on an internet connection.

Multiple voices and languages: supports multiple voice and language options, depending on the voices installed on the system.

Custom settings: lets you adjust parameters such as speaking rate, volume, and voice.

Simple and easy to use: its intuitive API makes it easy to integrate.
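Which local engine pyttsx3 talks to depends on the operating system. The sketch below spells out that mapping using pyttsx3's standard driver names (sapi5, nsss, espeak); the helper names default_tts_driver and make_engine are hypothetical. pyttsx3.init() already picks the platform default by itself, so passing the name explicitly only makes the choice visible.

```python
import sys


def default_tts_driver(platform: str = sys.platform) -> str:
    """Return the pyttsx3 driver name for a sys.platform value (hypothetical helper)."""
    if platform.startswith("win"):
        return "sapi5"   # Microsoft Speech API 5 on Windows
    if platform == "darwin":
        return "nsss"    # NSSpeechSynthesizer on macOS
    return "espeak"      # eSpeak on Linux, usually installed separately


def make_engine():
    """Create a pyttsx3 engine with an explicitly chosen driver (hypothetical helper)."""
    import pyttsx3  # imported here so the mapping above works without pyttsx3 installed

    return pyttsx3.init(default_tts_driver())
```

For example, default_tts_driver("linux") returns "espeak"; on Linux the eSpeak program itself typically has to be installed through the system package manager.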

Install:

pip install pyttsx3 -i https://pypi.tuna.tsinghua.edu.cn/simple

(The -i option installs from the Tsinghua University PyPI mirror; omit it to use the default PyPI index.)

[Example]: Use pyttsx3 to convert text to speech

import pyttsx3 as pyttsx

engine = pyttsx.init()  # initialize the TTS engine
engine.say('独断万古荒天帝, 唯负罪州火桑女')  # queue the text to be spoken
engine.runAndWait()  # speak everything in the queue and block until done
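The "Custom settings" feature goes through the engine's getProperty and setProperty methods with the property names rate, volume, voice and voices. The sketch below wraps those calls in a hypothetical configure_voice helper. It accepts any object exposing those two methods, so it can be tried with a stand-in object when no speech engine is available; which voices actually exist depends on the operating system.

```python
def configure_voice(engine, rate_delta=0, volume=None, voice_index=None):
    """Adjust rate, volume and voice on a pyttsx3-style engine (hypothetical helper)."""
    rate = engine.getProperty("rate")  # current speaking rate in words per minute
    engine.setProperty("rate", rate + rate_delta)
    if volume is not None:
        # pyttsx3 volume ranges from 0.0 to 1.0; clamp values outside that range
        engine.setProperty("volume", min(max(float(volume), 0.0), 1.0))
    if voice_index is not None:
        voices = engine.getProperty("voices")  # voice objects installed on this system
        engine.setProperty("voice", voices[voice_index].id)
    return engine


def speak(text, **settings):
    """Speak text with pyttsx3 after applying configure_voice settings (hypothetical helper)."""
    import pyttsx3  # imported lazily so configure_voice can be used without pyttsx3

    engine = pyttsx3.init()
    configure_voice(engine, **settings)
    engine.say(text)
    engine.runAndWait()
```

Usage, for example: speak('独断万古荒天帝, 唯负罪州火桑女', rate_delta=-50, volume=0.8). An IndexError from voice_index means fewer voices are installed than the index assumes.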

2.2 Use SAPI to convert text to speech

In Python, you can also convert text to speech with SAPI (Microsoft's Speech Application Programming Interface), which is available on Windows.
The win32com package (installed with pywin32) lets Python interact with COM (Component Object Model) components on Windows. Its win32com.client module provides a Python interface for COM automation: any Windows application or service that supports COM automation can be accessed and controlled through win32com.client.Dispatch.
SAPI's features, including text-to-speech (TTS) and speech recognition, can be accessed through win32com in this way.

[Example]: Use SAPI to convert text to speech

from win32com.client import Dispatch

msg = "独断万古荒天帝, 唯负罪州火桑女"
speaker = Dispatch('SAPI.SpVoice')  # create a SAPI voice (TTS engine) instance
speaker.Speak(msg)  # convert the text to speech and read it aloud
del speaker  # delete the speaker object to release its resources
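SpVoice also exposes a Rate property (from -10 to 10), a Volume property (from 0 to 100) and a GetVoices method that lists the installed voices. A minimal sketch with hypothetical helper names: only speak_with_sapi needs Windows and pywin32, while the other two helpers accept any object with the same attributes, so they can be tried with stand-ins elsewhere.

```python
def configure_sapi_voice(speaker, rate=0, volume=100):
    """Set SpVoice.Rate (-10..10) and SpVoice.Volume (0..100), clamping out-of-range values."""
    speaker.Rate = max(-10, min(10, int(rate)))      # 0 is the normal speed
    speaker.Volume = max(0, min(100, int(volume)))   # 100 is full volume
    return speaker


def list_sapi_voices(speaker):
    """Return the description of every voice SAPI reports as installed."""
    voices = speaker.GetVoices()  # collection of voice tokens
    return [voices.Item(i).GetDescription() for i in range(voices.Count)]


def speak_with_sapi(text, rate=0, volume=100):
    """Speak text through SAPI with the given rate and volume (Windows with pywin32 only)."""
    from win32com.client import Dispatch  # imported lazily; pywin32 is Windows-only

    speaker = Dispatch("SAPI.SpVoice")
    configure_sapi_voice(speaker, rate, volume)
    speaker.Speak(text)
```

For example, speak_with_sapi(msg, rate=-2, volume=80) reads the text slightly slower and quieter than the defaults.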

2.3 Use SpeechLib to convert text to speech

SpeechLib is the COM type library of Microsoft's SAPI speech components. It lets developers build text-to-speech (TTS) and speech recognition on Windows. With SpeechLib, you can control various properties of the speech engine, such as speaking rate, volume, pitch, and the voice used.
Using SpeechLib, you can read text from a file and convert it into speech.

To use SpeechLib from Python, you need the third-party library comtypes.

Installation command:

pip install comtypes -i https://pypi.tuna.tsinghua.edu.cn/simple

[Example]: Use SpeechLib to convert text to speech
Demo file (demo.txt):
[Figure: contents of demo.txt]

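from comtypes.client import CreateObject

engine = CreateObject("SAPI.SpVoice")  # create an SAPI.SpVoice (TTS engine) instance
stream = CreateObject("SAPI.SpFileStream")  # create an SAPI.SpFileStream instance
# comtypes generates the SpeechLib wrapper module when the SAPI objects are
# first created, so import it after the CreateObject calls
from comtypes.gen import SpeechLib

infile = 'demo.txt'
outfile = 'demo_audio.wav'
stream.Open(outfile, SpeechLib.SSFMCreateForWrite)  # open the output file for writing audio data
engine.AudioOutputStream = stream  # send the speech to the file stream instead of the speakers
with open(infile, 'r', encoding='utf-8') as f:  # open the input text file (closed automatically)
    the_text = f.read()  # read the whole file
engine.Speak(the_text)  # convert the text to speech and write it to the stream
stream.Close()  # close the stream to finish writing the audio file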
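To confirm that demo_audio.wav was actually written, you can read its header with Python's standard wave module. The describe_wav helper below is a hypothetical sketch; wave only understands PCM WAV files and raises wave.Error for other formats.

```python
import wave


def describe_wav(path):
    """Return channel count, sample width, frame rate and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "frame_rate": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Usage: print(describe_wav('demo_audio.wav')). A duration of zero usually means no text reached the engine.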


3. Convert speech to text

3.1 Use PocketSphinx to convert speech to text

PocketSphinx is a lightweight speech recognition library and part of CMU Sphinx, the open-source speech recognition system developed at Carnegie Mellon University. CMU Sphinx is powerful and flexible; PocketSphinx is particularly suitable for embedded systems and mobile devices because it is small and fast while still offering reasonable recognition accuracy.

Key features of PocketSphinx:

Lightweight: suitable for resource-constrained environments such as mobile devices and embedded systems.

Real-time performance: capable of real-time speech recognition.

Easy to use: provides a simple API that developers can integrate quickly.

Customizable: developers can use their own language models and acoustic models as needed.

Required third-party modules: pocketsphinx and SpeechRecognition
Installation commands:

pip install pocketsphinx -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install SpeechRecognition -i https://pypi.tuna.tsinghua.edu.cn/simple

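[Example]: Use PocketSphinx to convert speech to text

import speech_recognition as sr

audio_file = 'demo_audio.wav'
r = sr.Recognizer()
with sr.AudioFile(audio_file) as source:  # AudioFile reads WAV, AIFF and FLAC files
    audio = r.record(source)  # read the entire file into an AudioData object
try:
    # For Mandarin, after installing the Chinese model described below:
    # print('Text:', r.recognize_sphinx(audio, language='zh-CN'))
    print('Text:', r.recognize_sphinx(audio))  # default language is en-US
except sr.UnknownValueError:
    print('Sphinx could not understand the audio')
except sr.RequestError as e:  # e.g. pocketsphinx or the language data is missing
    print('Sphinx error:', e)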
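Because demo_audio.wav was generated from demo.txt, the original text can serve as a reference for judging the transcription. A common measure is the word error rate: the minimum number of word substitutions, deletions and insertions separating the recognized text from the reference, divided by the number of reference words. The hypothetical error_rate helper below computes it as a Levenshtein distance over tokens; unit="char" compares characters instead, which suits Chinese text that has no spaces between words.

```python
def error_rate(reference, hypothesis, unit="word"):
    """Word (or character) error rate of hypothesis against reference (hypothetical helper)."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    elif unit == "char":
        # ignore whitespace, which Chinese text does not use between words
        ref = [c for c in reference if not c.isspace()]
        hyp = [c for c in hypothesis if not c.isspace()]
    else:
        raise ValueError("unit must be 'word' or 'char'")
    if not ref:
        raise ValueError("reference text is empty")

    # prev[j] is the edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            cur[j] = min(prev[j] + 1,     # delete ref[i-1]
                         cur[j - 1] + 1,  # insert hyp[j-1]
                         substitution)    # match or substitute
        prev = cur
    return prev[-1] / len(ref)
```

With the example above, you could pass the contents of demo.txt as reference and the string returned by recognize_sphinx as hypothesis; 0.0 means a perfect match.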

If you run into problems with PocketSphinx, such as an initialization failure, check:

Whether pocketsphinx is installed correctly.
Whether the appropriate language model and dictionary are available.
Whether you have sufficient permissions to access the required files.
Whether the system meets PocketSphinx's requirements.
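The first two checks can be scripted. A minimal sketch with hypothetical helper names: check_modules uses importlib.util.find_spec to test whether modules are importable without importing them, and sphinx_languages lists the language folders under pocketsphinx-data inside the installed speech_recognition package, where SpeechRecognition keeps its Sphinx language data.

```python
import importlib.util
import os


def check_modules(names=("speech_recognition", "pocketsphinx")):
    """Map each top-level module name to whether it can be found on the import path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def sphinx_languages():
    """List language folders under speech_recognition/pocketsphinx-data, or [] if absent."""
    spec = importlib.util.find_spec("speech_recognition")
    if spec is None or spec.origin is None:
        return []
    data_dir = os.path.join(os.path.dirname(spec.origin), "pocketsphinx-data")
    if not os.path.isdir(data_dir):
        return []
    return sorted(
        entry for entry in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, entry))
    )


if __name__ == "__main__":
    print(check_modules())      # e.g. whether both packages are installed
    print(sphinx_languages())   # language folders that recognize_sphinx can use
```

A fresh installation normally lists only en-US; if pocketsphinx shows False, recognize_sphinx raises sr.RequestError.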

After installation, speech_recognition supports only English (en-US) with PocketSphinx; Chinese is not supported out of the box. You need to download the Mandarin acoustic model and language model from the CMU Sphinx project.
Download link:

https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
Place the downloaded Mandarin acoustic model and language model in the Python\Lib\site-packages\speech_recognition\pocketsphinx-data directory of your Python installation, in a folder named zh-CN, so that recognize_sphinx(audio, language='zh-CN') can find them.
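SpeechRecognition does not read the downloaded archive's own file names. Judging from the en-US folder it ships with, each language folder is expected to contain an acoustic-model directory, a language-model.lm.bin file and a pronounciation-dictionary.dict file (the misspelling is in the library's file name), so the downloaded Mandarin files usually have to be renamed to match. Treat this layout as an assumption and compare it with your own en-US folder. The hypothetical missing_model_files helper reports which expected entries are absent.

```python
import os

# Assumed layout, copied from the en-US folder bundled with SpeechRecognition.
EXPECTED_ENTRIES = (
    "acoustic-model",                  # directory with the acoustic model parameters
    "language-model.lm.bin",           # binary language model
    "pronounciation-dictionary.dict",  # phonetic dictionary (spelling as used by the library)
)


def missing_model_files(language_dir):
    """Return the expected entries that do not exist in language_dir (hypothetical helper)."""
    return [
        name for name in EXPECTED_ENTRIES
        if not os.path.exists(os.path.join(language_dir, name))
    ]
```

Pass it the zh-CN folder inside pocketsphinx-data; an empty list means every expected entry is present.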