Practical tools | Installing and using MFA for speech-to-text alignment

 Montreal Forced Aligner (MFA) [1] is a tool for aligning audio with its transcript. It is used in fields such as speech recognition, speech synthesis, and pronunciation research. MFA supports multiple languages and speakers, and users can train their own models as needed.

This post describes how to align audio and text with MFA, using the latest version at the time of writing (v2.2.12).

Table of contents

MFA installation

Installation method one:

Installation method two:

Generate dictionary file

align

PS


MFA installation

MFA supports Windows, macOS, and Linux operating systems.

This post is based on Linux (Ubuntu 20.04); for other platforms, refer to the MFA installation documentation [2].

Installation method one:

conda create -n aligner -c conda-forge montreal-forced-aligner
conda activate aligner
conda update --all
conda install -c conda-forge montreal-forced-aligner



pip install g2pk

Installation method two:

 git clone https://github.com/pyrasis/MFARunner

conda create -n mfa -c conda-forge montreal-forced-aligner
source activate
conda activate mfa

conda install montreal-forced-aligner==2.0.6
cd MFARunner
pip install -r requirements.txt
sudo apt-get install g++ openjdk-8-jdk python3-dev python3-pip curl
pip install konlpy==0.6.0 ffmpeg==1.4
bash <(curl -s https://raw.githubusercontent.com/konlpy/konlpy/master/scripts/mecab.sh)



 Modify the dataset path in config.py.

*Note that the audio files and the txt files cannot be placed in the same folder.

Then run main.py to generate the dictionary.

python main.py

Generate dictionary file

1. Download the pre-trained model

The pre-trained resources include dictionaries, G2P models, and acoustic models.

Download the G2P model

  • Dictionaries can be generated with a pre-trained grapheme-to-phoneme (G2P) model.

#English

mfa model download g2p english_uk_mfa

#Chinese

mfa model download g2p mandarin_pinyin_g2p
# The model can also be downloaded directly from the official website

 #Korean

mfa model download g2p korean_jamo_mfa

 

2.1. Example: Chinese dataset

mfa g2p mandarin_pinyin_g2p /path/to/dataset /path/to/output/mandarin_dict.txt

The .wav files are the audio files in the dataset, and each .lab file holds the transcript of the corresponding audio file. The text in the .lab files must be written in the same form as the text the acoustic model recognizes and as the entries in the dictionary file. For example, if the acoustic model recognizes Chinese characters and the recording says "I love you", the .lab file should contain the Chinese characters (我爱你); if the acoustic model recognizes pinyin, the .lab file should contain "wo3 ai4 ni3". The dictionary file must likewise map Chinese characters to phonemes, or pinyin to phonemes.
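
Because every .wav file needs a matching .lab transcript, it can help to check the pairing before running MFA. The sketch below is one way to do that with the Python standard library; the function name find_unpaired and the "dataset" path are illustrative assumptions and are not part of MFA.

```python
from pathlib import Path


def find_unpaired(corpus_dir):
    """Return (stems with a .wav but no .lab, stems with a .lab but no .wav)."""
    root = Path(corpus_dir)
    # Compare files by their path relative to the corpus root, minus the extension.
    wavs = {p.relative_to(root).with_suffix("").as_posix() for p in root.rglob("*.wav")}
    labs = {p.relative_to(root).with_suffix("").as_posix() for p in root.rglob("*.lab")}
    return sorted(wavs - labs), sorted(labs - wavs)


if __name__ == "__main__":
    # "dataset" is a placeholder; point it at your own corpus folder.
    no_lab, no_wav = find_unpaired("dataset")
    print("wav files without a .lab:", no_lab)
    print(".lab files without a wav:", no_wav)
```

Both lists should be empty before the corpus is passed to MFA.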

2.2. Example: Korean dataset (Korean Single Speaker Speech Dataset | Kaggle)

 Create a new file, kss-align.py, to generate the .lab files:

# Requires: pip install jamo
import os
import re
from glob import glob

from jamo import h2j
from tqdm import tqdm

# Paths to the KSS transcript and the dataset root; adjust to your environment.
text = '/workspace/dataset/kss/transcript.v.1.4.txt'
base_dir = '/workspace/dataset/kss'

# Punctuation to strip from each transcript.
filters = re.compile('([.,!?])')

# Write one .lab file next to each audio file listed in the transcript.
with open(text, 'r', encoding='utf-8') as f:
    for line in f:
        temp = line.split('|')
        file_dir, script = temp[0], temp[3]
        script = filters.sub('', script)
        # Swap the three-letter audio extension for "lab".
        file_dir = file_dir.split('/')
        fn = file_dir[0] + '/' + file_dir[1][:-3] + 'lab'
        lab_path = os.path.join(base_dir, fn)
        with open(lab_path, 'w', encoding='utf-8') as lab_file:
            lab_file.write(script)

# Collect every distinct word and its jamo sequence from the .lab files.
file_list = sorted(glob(os.path.join(base_dir, '**/*.lab')))
jamo_dict = {}
for file_name in tqdm(file_list):
    with open(file_name, 'r', encoding='utf-8') as lab_file:
        sentence = lab_file.readline()
    jamo = h2j(sentence).split(' ')

    for s in jamo:
        if s not in jamo_dict:
            jamo_dict[s] = ' '.join(s)

# Save the entries as tab-separated "word<TAB>jamo sequence" lines.
dict_name = 'korean_dict.txt'
with open(dict_name, 'w', encoding='utf-8') as f:
    for key, value in jamo_dict.items():
        f.write('{}\t{}\n'.format(key, value))

# Generate the .lab files
python kss-align.py
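
To make the dictionary entries less opaque, here is a minimal sketch of how a precomposed Hangul syllable can be split into conjoining jamo using standard Unicode arithmetic. It is an illustration of the idea, not the jamo package's actual code, and the helper names decompose_syllable and dictionary_line are hypothetical.

```python
def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    code = ord(ch) - 0xAC00
    if not 0 <= code <= 0xD7A3 - 0xAC00:
        return [ch]  # not a Hangul syllable: keep the character unchanged
    lead, rest = divmod(code, 21 * 28)   # 21 vowels x 28 tail slots per leading consonant
    vowel, tail = divmod(rest, 28)
    jamo = [chr(0x1100 + lead), chr(0x1161 + vowel)]
    if tail:                             # tail index 0 means "no final consonant"
        jamo.append(chr(0x11A7 + tail))
    return jamo


def dictionary_line(word):
    """Format one 'jamo word<TAB>space-separated jamo' dictionary line."""
    phones = [j for ch in word for j in decompose_syllable(ch)]
    return "{}\t{}".format("".join(phones), " ".join(phones))


if __name__ == "__main__":
    print(dictionary_line("한글"))
```

Each line of korean_dict.txt has this shape: the decomposed word, a tab, then the same jamo separated by spaces.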

 

Dictionary (lexicon) file generation

mfa train_g2p korean_dict.txt korean.zip
mfa g2p korean.zip kss korean.txt
mfa train kss korean.txt out

Command explanation

The arguments passed to each command can be changed to suit your setup.

  • mfa train_g2p takes the path to korean_dict.txt and the path of the zip file (the G2P model) to generate.
  • mfa g2p takes the path to the zip file produced by train_g2p, the path to the data folder, and the path of the txt file (the dictionary) to generate.
  • mfa train takes the path to the data folder, the path to the txt file produced by g2p, and the path where the TextGrid files are saved.

Once all three commands have run, the TextGrid files are saved to the out folder.
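
The three steps can also be scripted. The sketch below only builds the argument lists in the order used in this post and prints them; the function name build_commands is a hypothetical helper, and the argument order is an assumption that may differ between MFA versions, so check mfa --help for the version you installed.

```python
import shlex


def build_commands(word_list, corpus_dir, g2p_model, dictionary, output_dir):
    """Return the three MFA invocations from this section as argument lists."""
    return [
        ["mfa", "train_g2p", word_list, g2p_model],           # train the G2P model
        ["mfa", "g2p", g2p_model, corpus_dir, dictionary],    # generate the dictionary
        ["mfa", "train", corpus_dir, dictionary, output_dir], # train and write TextGrids
    ]


if __name__ == "__main__":
    commands = build_commands("korean_dict.txt", "kss", "korean.zip", "korean.txt", "out")
    for cmd in commands:
        # To execute instead of print: subprocess.run(cmd, check=True)
        print(shlex.join(cmd))
```

Keeping the paths as function arguments makes it easy to rerun the pipeline on another dataset.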

align

mfa align /path/to/dataset /path/to/dictionary /path/to/acoustic_model /path/to/output


After running mfa align, a file named unaligned.txt may appear; it lists the utterances that could not be aligned.
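
A quick way to see which recordings were left without an alignment is to compare the corpus with the output folder. This is a minimal sketch that assumes the output folder mirrors the corpus layout with one .TextGrid per .wav; the function name missing_textgrids and the example paths are hypothetical.

```python
from pathlib import Path


def missing_textgrids(corpus_dir, output_dir):
    """List corpus .wav files (relative paths) with no matching .TextGrid in the output."""
    corpus, output = Path(corpus_dir), Path(output_dir)
    missing = []
    for wav in sorted(corpus.rglob("*.wav")):
        rel = wav.relative_to(corpus)
        if not (output / rel.with_suffix(".TextGrid")).exists():
            missing.append(rel.as_posix())
    return missing


if __name__ == "__main__":
    # Placeholder paths; replace with your corpus and output folders.
    for name in missing_textgrids("dataset", "output"):
        print("no TextGrid for", name)
```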

PS

[PS1] The global MFA database server does not exist; initialize it first.

montreal_forced_aligner.exceptions.DatabaseError: DatabaseError:

There was an error encountered starting the global MFA database server, please see /root/Documents/MFA/pg_init_log_global.txt for more details and/or look at the logged errors above.

 Solution:

mfa configure --enable_auto_server
mfa server init

[PS2] yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:argparse.Namespace'

 Try:

pip install pyyaml==4.2b2

If the error still occurs after reinstalling PyYAML, modify the following file:

vim /opt/conda/envs/mfa/lib/python3.8/site-packages/montreal_forced_aligner/config.py

Change

config = yaml.safe_load(file_data)

to

config = yaml.unsafe_load(file_data)

After that it works normally.
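
The difference between the two calls can be reproduced outside MFA. The sketch below assumes PyYAML 5.1 or later (where unsafe_load is available) and uses a made-up config fragment; the key beam is only a placeholder and is not taken from MFA's real configuration.

```python
import argparse

import yaml  # PyYAML; unsafe_load is assumed to be available (5.1 or later)

# Hypothetical config fragment carrying the tag named in the error message.
file_data = "!!python/object:argparse.Namespace\nbeam: 10\n"

try:
    yaml.safe_load(file_data)
    safe_ok = True
except yaml.constructor.ConstructorError:
    safe_ok = False  # safe_load refuses tags that construct Python objects

# unsafe_load accepts the tag and rebuilds the argparse.Namespace object.
config = yaml.unsafe_load(file_data)
print(safe_ok, isinstance(config, argparse.Namespace), config.beam)
```

Because unsafe_load can construct arbitrary Python objects, this workaround is only appropriate for configuration files you created yourself.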

Questions and Answers (Q&A)

1. In the MFA command, what is the difference between mfa train and mfa train_g2p?

mfa train: train a new acoustic model

mfa train_g2p: train a grapheme-to-phoneme (G2P) model, which maps written words (for example, Chinese characters) to phoneme sequences

References

[1] PYRASIS.COM: Make your voice into TTS (FastSpeech2)

[2] Installation - Montreal Forced Aligner 2.0.0 documentation

[3] https://osakuadeopeyemi.medium.com/generate-forced-alignment-with-montreal-forced-aligner-mfa-383f91a6f2a1


Origin blog.csdn.net/weixin_44649780/article/details/131041611