Deep Learning Notes (Part 2): Speech Segmentation and Speaker Identification from Audio Files

1. Speech segmentation

For tool installation details, see Deep Learning Notes (Part 1).

We used py_speech_seg to split two-speaker (A/B) dialogue: https://github.com/wblgers/py_speech_seg

A toolkit to implement segmentation on speech based on BIC and neural networks, such as BiLSTM
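To make the BIC idea concrete, here is a minimal sketch of delta-BIC change detection. It is not py_speech_seg's code: it assumes a single scalar feature per frame (real systems use multi-dimensional features such as MFCCs), and the names `delta_bic`, `find_change_point`, `margin` and `penalty_weight` are hypothetical.

```python
import math
import random


def delta_bic(values, split, penalty_weight=1.0):
    """Delta-BIC for splitting a 1-D feature sequence at index `split`.

    Compares one Gaussian over the whole window against two Gaussians,
    one per side. A positive value favours a speaker change at `split`.
    """
    def n_log_var(segment):
        # Maximum-likelihood (population) variance of the segment.
        mean = sum(segment) / len(segment)
        var = sum((x - mean) ** 2 for x in segment) / len(segment)
        return len(segment) * math.log(var)

    left, right = values[:split], values[split:]
    gain = 0.5 * (n_log_var(values) - n_log_var(left) - n_log_var(right))
    # The two-Gaussian model has 2 extra parameters in 1-D (mean, variance).
    penalty = 0.5 * penalty_weight * 2 * math.log(len(values))
    return gain - penalty


def find_change_point(values, margin=20, penalty_weight=1.0):
    """Return (best_split, score); best_split is None if no score is positive."""
    best_split, best_score = None, float("-inf")
    for split in range(margin, len(values) - margin):
        score = delta_bic(values, split, penalty_weight)
        if score > best_score:
            best_split, best_score = split, score
    if best_score <= 0:
        return None, best_score
    return best_split, best_score


# Toy data (an assumption): 200 frames from "speaker A", then 200 from "speaker B".
rng = random.Random(0)
speaker_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
speaker_b = [rng.gauss(5.0, 1.0) for _ in range(200)]
split, score = find_change_point(speaker_a + speaker_b)
print("detected change near frame", split)
```

The penalty term is what keeps the detector from splitting homogeneous audio: two Gaussians always fit at least as well as one, so the gain has to exceed the cost of the extra parameters.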


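After segmentation, each segment is converted to text. A correct transcription looks like this:

[Screenshot: correct transcription result]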

2. Speaker identification (determining who spoke a given segment)

  1. Install Kaldi 5.3

Be sure to install Kaldi 5.3: download the 5.3 source code.

#>cd kaldi

#>cd tools

#>cat INSTALL

#>extras/check_dependencies.sh

#>extras/install_mkl.sh

#>apt-get install sox gfortran subversion

#>make -j 4

 

#>cd ../src/

#>./configure --shared

#>make depend -j 8

#>make -j 8
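Before starting the build above, it can help to confirm that the packages from the apt-get line are actually on PATH. The following is a small pre-flight sketch, not a replacement for extras/check_dependencies.sh; the helper name `missing_tools` is hypothetical, and `svn` is assumed to be the executable installed by the subversion package.

```python
import shutil


def missing_tools(commands):
    """Return the commands from `commands` that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


# sox, gfortran and subversion come from the apt-get line;
# the subversion package provides the `svn` executable (assumption).
required = ["sox", "gfortran", "svn", "make"]
missing = missing_tools(required)
if missing:
    print("Install these before building Kaldi:", ", ".join(missing))
else:
    print("All build tools found.")
```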

 

 

  2. Install libfvad, a voice activity detection (VAD) library

https://github.com/dpirch/libfvad

[libfvad]#cd libfvad
[libfvad]# autoreconf -i
[libfvad]#./configure
[libfvad]#make
[libfvad]#make install
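For readers new to VAD, the sketch below shows what a frame-level speech/non-speech decision looks like. It is a plain energy-threshold detector swapped in for illustration, not libfvad's algorithm or API; the function names, the 30 ms frame length and the -40 dB threshold are all assumptions.

```python
import math


def frame_energy_db(frame):
    """Mean power of one frame of 16-bit PCM samples, in dB relative to full scale."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / (32768.0 ** 2))


def energy_vad(samples, sample_rate=16000, frame_ms=30, threshold_db=-40.0):
    """Return one True/False decision per full frame: True means 'speech-like'."""
    frame_len = sample_rate * frame_ms // 1000
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        decisions.append(frame_energy_db(frame) > threshold_db)
    return decisions


# Toy signal (an assumption): 60 ms of silence followed by 60 ms of a 440 Hz tone.
silence = [0] * 960
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(960)]
decisions = energy_vad(silence + tone)
print(decisions)
```

A real VAD is more robust to background noise than a fixed energy threshold, which is one reason to use a dedicated library here.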

 

  3. Download KaldiBasedSpeakerVerification and build it from source

https://github.com/qianhwan/KaldiBasedSpeakerVerification

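#>cd KaldiBasedSpeakerVerification/mat

#>cat iepart* > final.ie

#>cd ../src

#>vim makefile

# Set the kaldi, libfvad, and KaldiBasedSpeakerVerification paths correctly

Running make at this point fails with a compile error (shown as a screenshot in the original post), so the source code has to be patched before the final make. The fix is described below.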

The error on line 504 is straightforward to fix. It is a casting error in the return value of the function that checks whether a file exists: since C++11 a stream converts to bool only explicitly, so returning it directly from a function declared bool no longer compiles. Adding "(bool)" on line 504 casts the return value to a boolean and solves it. Here is what fexists should look like when you're done:

bool fexists(const char *filename){
        ifstream ifile(filename);
        return (bool)ifile;
}

Save that change and the code will compile after that.

#>make

  4. Modify test1Test.sh

#!/bin/bash
# KaldiBasedSpeakerVerification
# test1Test.sh
# ========================================
# Author: Qianhui Wan
# Version: 1.0.0
# Date   : 2018-01-23
# ========================================
# The following lines will setup the path to each lib
# path to kaldi/src/lib
export LD_LIBRARY_PATH=/home/qianhuiwan/sourcecodes/kaldi/src/lib:$LD_LIBRARY_PATH
# path to atlas
export LD_LIBRARY_PATH=/usr/lib64/atlas:$LD_LIBRARY_PATH
# path to openfst
export LD_LIBRARY_PATH=/home/qianhuiwan/sourcecodes/kaldi/tools/openfst/lib:$LD_LIBRARY_PATH
# path to usr/local/lib
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH


../src/identifySpeaker ./example_data/test/174/174-168635-0000.wav
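The export lines in the script each prepend one directory, so the last one exported is searched first. The sketch below reproduces that composition and warns about directories that do not exist on the current machine. The helper name `prepend_library_dirs` is hypothetical, and the paths are the script author's placeholders, which need to be replaced with your own install locations.

```python
import os


def prepend_library_dirs(dirs, current=""):
    """Prepend each directory in turn, like successive
    `export LD_LIBRARY_PATH=dir:$LD_LIBRARY_PATH` lines, but without the
    trailing colon that an initially empty variable would leave behind."""
    value = current
    for directory in dirs:
        value = directory + ":" + value if value else directory
    return value


# Placeholder paths copied from the script; replace them with your own.
LIB_DIRS = [
    "/home/qianhuiwan/sourcecodes/kaldi/src/lib",
    "/usr/lib64/atlas",
    "/home/qianhuiwan/sourcecodes/kaldi/tools/openfst/lib",
    "/usr/local/lib",
]

env = dict(os.environ)
env["LD_LIBRARY_PATH"] = prepend_library_dirs(LIB_DIRS, env.get("LD_LIBRARY_PATH", ""))
for directory in LIB_DIRS:
    if not os.path.isdir(directory):
        print("warning: library directory not found:", directory)
```

The resulting `env` dictionary could then be passed to `subprocess.run([...], env=env)` to launch `../src/identifySpeaker` with the same library search path as the shell script.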

  5. Run the example

Run these scripts from the examples directory, in this order:

#>./test1Enroll.sh

#>./test1Test.sh
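The enroll-then-test order matters because identification compares a test utterance against speaker models created during enrollment. The sketch below illustrates that flow with cosine scoring over toy embeddings. It is an illustration only: the scoring backend of KaldiBasedSpeakerVerification may differ, and the function names, the speaker labels and the 3-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(utterance_embeddings):
    """Average a speaker's utterance embeddings into one model vector."""
    count = len(utterance_embeddings)
    return [sum(column) / count for column in zip(*utterance_embeddings)]


def identify(test_embedding, speaker_models):
    """Return (speaker_id, score) for the best-matching enrolled speaker."""
    scored = (
        (speaker, cosine_similarity(test_embedding, model))
        for speaker, model in speaker_models.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Toy 3-dimensional embeddings (assumption); real systems use far more dimensions.
models = {
    "speaker_174": enroll([[0.9, 0.1, 0.0], [1.0, 0.2, 0.1]]),
    "speaker_other": enroll([[0.0, 0.8, 0.6], [0.1, 1.0, 0.5]]),
}
best_speaker, best_score = identify([0.8, 0.2, 0.1], models)
print(best_speaker)
```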

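3. Speaker identification (second approach)

3.1 Installing on CentOS 7.8

We use speaker recognition source code written by a Korean developer:

https://github.com/jymsuper/SpeakerRecognition_tutorial

Install the required packages with pip3: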

pytorch 1.0.0
pandas 0.23.4
numpy 1.13.3
pickle 4.0
matplotlib 2.1.0

pip3 install wheel
pip3 install torch

pip3 install torchvision
pip3 install pandas -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
pip3 install librosa -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

pickle does not need to be installed separately: it is part of the Python standard library and ships with Python 3.7.


yum install libsndfile
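After the installs above, a quick import check can confirm that everything is visible to the interpreter. This is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module list is taken from the packages named in this section.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# pickle is listed only to show that it is already available without pip.
required = ["torch", "torchvision", "pandas", "numpy", "librosa", "matplotlib", "pickle"]
missing = missing_modules(required)
print("missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the same python3 that pip3 installs into, otherwise the result describes a different environment.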

Note: on CentOS 7.8, after running yum update, the scripts fail at run time and need the following changes:

a. Modify DB_wav_reader.py
import sys
from glob import glob

import librosa
import numpy as np
import pandas as pd

from configure import SAMPLE_RATE

np.set_printoptions(threshold=sys.maxsize)  # key change

 


Reposted from blog.csdn.net/penker_zhao/article/details/107757419