GitHub
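Introduction
Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being faster and cheaper to run. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
Kokoro models
Chinese model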
git clone https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh
Install dependencies
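Package | Purpose |
---|---|
kokoro | Main TTS speech-synthesis library |
misaki[zh] | Grapheme-to-phoneme (G2P) support for Chinese text, used by kokoro |
torch | Runs the deep learning model (kokoro depends on PyTorch) |
sounddevice | Plays the audio stream directly |
soundfile | Reads and writes .wav audio data |
pathlib | (standard library, no installation needed) Handles file paths |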
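# pathlib is part of the Python standard library and must not be installed from PyPI
pip install kokoro torch sounddevice soundfile
# Pin NumPy to the version used in this setup
pip install numpy==1.26.4
# Chinese G2P support, installed through the Tsinghua PyPI mirror
pip install kokoro "misaki[zh]" soundfile --index-url https://pypi.tuna.tsinghua.edu.cn/simple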
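Before running the example, it can help to confirm that the packages installed above are visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical. It only checks that each module can be located, not that it imports cleanly (for instance, it will not detect a missing audio backend for `sounddevice`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages installed above.
    required = ["kokoro", "misaki", "torch", "sounddevice", "soundfile", "numpy"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

If anything is reported as missing, rerunning the matching `pip install` command in the same environment is the usual fix.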
Complete example
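from kokoro import KModel, KPipeline
from pathlib import Path
import sounddevice as sd
import soundfile as sf
import torch
import datetime

REPO_ID = 'hexgrad/Kokoro-82M-v1.1-zh'
SAMPLE_RATE = 24000
# Change True to False to use the male voice 'zm_010' instead of the female voice 'zf_001'.
VOICE = 'zf_001' if True else 'zm_010'

# Use the GPU when available, otherwise fall back to the CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = KModel(repo_id=REPO_ID).to(device).eval()
zh_pipeline = KPipeline(lang_code='z', repo_id=REPO_ID, model=model)


def speed_callable(len_ps):
    """Slow down long inputs: 1.1x up to 83 phonemes, tapering to 0.88x at 183 or more."""
    speed = 0.8
    if len_ps <= 83:
        speed = 1
    elif len_ps < 183:
        speed = 1 - (len_ps - 83) / 500
    return speed * 1.1


def ttsWav(text):
    """Synthesize text and save it as a timestamped .wav file next to this script."""
    path = Path(__file__).parent
    generator = zh_pipeline(text, voice=VOICE, speed=speed_callable)
    current_datetime = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    f = path / f'zh_{current_datetime}.wav'
    # Only the first chunk is used; iterate over the generator for longer text.
    result = next(generator)
    wav = result.audio
    sf.write(f, wav, SAMPLE_RATE)


def ttsPlay(text):
    """Synthesize text and play it through the default audio device."""
    generator = zh_pipeline(text, voice=VOICE, speed=speed_callable)
    result = next(generator)
    wav = result.audio
    # blocking=True already waits for playback to finish, so sd.wait() is not needed.
    sd.play(wav, SAMPLE_RATE, blocking=True)


if __name__ == "__main__":
    # The input stays in Chinese: "Alipay received 60 million yuan".
    ttsPlay('支付宝到账6000万元')
    # ttsPlay('支付宝到账8000万元')
    # ttsPlay('支付宝到账9000万元')
    ttsWav('支付宝到账1000万元')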
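To see how `speed_callable` behaves, the function can be evaluated on its own without loading the model. The sketch below copies it from the example and prints the resulting speed for a few phoneme counts; the sample lengths are arbitrary values picked for illustration.

```python
def speed_callable(len_ps):
    # Same piecewise-linear function as in the complete example.
    speed = 0.8
    if len_ps <= 83:
        speed = 1
    elif len_ps < 183:
        speed = 1 - (len_ps - 83) / 500
    return speed * 1.1


if __name__ == "__main__":
    # Arbitrary phoneme counts covering all three branches.
    for len_ps in (50, 83, 133, 182, 183, 300):
        print(f"{len_ps:>3} phonemes -> speed {speed_callable(len_ps):.3f}")
```

Short inputs (up to 83 phonemes) are spoken at 1.1x, the speed then falls linearly, and anything from 183 phonemes upward is held at 0.88x.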
- Play the .wav file (afplay ships with macOS)
afplay zh_20250329_105237.wav
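Since `afplay` exists only on macOS, other systems need a different player. The sketch below picks a command by platform. The helper name `player_command` is hypothetical, and using `aplay` on Linux is an assumption that the ALSA utilities are installed.

```python
import platform
import subprocess
import sys


def player_command(wav_path, system=None):
    """Return a command list that plays `wav_path`, or None if no player is known."""
    system = system or platform.system()
    if system == "Darwin":  # macOS
        return ["afplay", wav_path]
    if system == "Linux":  # assumption: alsa-utils provides aplay
        return ["aplay", wav_path]
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python play_wav.py zh_20250329_105237.wav
    cmd = player_command(sys.argv[1])
    if cmd is None:
        print("No known command-line player for this platform.")
    else:
        subprocess.run(cmd, check=True)
```

Where no command-line player is available, reading the file with `soundfile.read` and passing the data and sample rate to `sounddevice.play` is an alternative that stays inside Python.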