Montreal Forced Aligner (MFA) [1] is a tool for aligning audio with its transcript. It is used in fields such as speech recognition, speech synthesis, and pronunciation research. MFA supports multiple languages and voices, and users can train custom models as needed.
This blog describes how to align audio and text with MFA, using the latest version at the time of writing (v2.2.12).
MFA installation
MFA supports Windows, macOS, and Linux operating systems.
This blog is based on Linux (Ubuntu 20.04); for other systems, refer to the MFA installation documentation [2].
Installation method one:
conda create -n aligner -c conda-forge montreal-forced-aligner
conda activate aligner
conda update --all
conda install -c conda-forge montreal-forced-aligner
pip install g2pk
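After installing, it can help to confirm that the mfa command is visible from the active environment. The following is a minimal sketch using only the Python standard library; the helper name mfa_available is hypothetical and not part of MFA.

```python
import shutil


def mfa_available(executable: str = "mfa") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if mfa_available():
        print("mfa found at:", shutil.which("mfa"))
    else:
        print("mfa not found; activate the conda environment first")
```

If the check fails, running conda activate aligner and trying again is the first thing to check.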
Installation method two:
git clone https://github.com/pyrasis/MFARunner
conda create -n mfa -c conda-forge montreal-forced-aligner
source activate
conda activate mfa
conda install montreal-forced-aligner==2.0.6
cd MFARunner
pip install -r requirements.txt
sudo apt-get install g++ openjdk-8-jdk python3-dev python3-pip curl
pip install konlpy==0.6.0 ffmpeg==1.4
bash <(curl -s https://raw.githubusercontent.com/konlpy/konlpy/master/scripts/mecab.sh)
Modify the dataset path in config.py.
*Note that the audio files and the txt files cannot be placed in the same folder.
Then run main.py to generate the dictionary.
python main.py
Generate dictionary file
1. Download the pre-trained model
Pre-trained models include dictionaries, G2P models, and acoustic models.
- Generate dictionaries using pre-trained grapheme-to-phoneme (G2P) models:
#English
mfa model download g2p english_uk_mfa
#Chinese
mfa model download g2p mandarin_pinyin_g2p
# You can also download the models directly from the official website
#Korean
mfa model download g2p korean_jamo_mfa
2.1. Example dataset: Chinese
mfa g2p mandarin_pinyin_g2p dataset_path/dataset save_path/mandarin_dict.txt
In the dataset, each .wav file is an audio recording and the .lab file with the same name holds its transcript. The transcript must use the same written form as the acoustic model and the dictionary file. For example, if the acoustic model works with Chinese characters and the recording says "I love you", the .lab file should contain that sentence in Chinese characters (我爱你); if the acoustic model works with pinyin, the .lab file should contain "wo3 ai4 ni3". The dictionary file must likewise map Chinese characters to phonemes, or pinyin to phonemes, accordingly.
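Since every .wav file needs a matching .lab file, a quick pairing check before running MFA can save a failed run. The sketch below assumes the audio and transcripts share a file name and differ only in extension; find_unpaired is a hypothetical helper, not an MFA function.

```python
from pathlib import Path


def find_unpaired(corpus_dir):
    """Return (wavs_without_lab, labs_without_wav) as sorted relative paths without extension."""
    root = Path(corpus_dir)
    # Compare files by their path relative to the corpus root, minus the extension.
    wavs = {p.relative_to(root).with_suffix("").as_posix() for p in root.rglob("*.wav")}
    labs = {p.relative_to(root).with_suffix("").as_posix() for p in root.rglob("*.lab")}
    return sorted(wavs - labs), sorted(labs - wavs)
```

Calling find_unpaired on the dataset folder returns two lists; both should be empty before generating the dictionary or aligning.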
2.2. Example dataset: Korean (Korean Single Speaker Speech Dataset | Kaggle)
Create a new file, kss-align.py, to generate the .lab files:
import os
import re
from glob import glob

from jamo import h2j
from tqdm import tqdm

text = '/workspace/dataset/kss/transcript.v.1.4.txt'
base_dir = '/workspace/dataset/kss'
filters = '([.,!?])'

# Write one .lab file per transcript line, named after its .wav file.
with open(text, 'r', encoding='utf-8') as f:
    for line in f.readlines():
        temp = line.split('|')
        file_dir, script = temp[0], temp[3]
        script = re.sub(re.compile(filters), '', script)  # strip punctuation
        file_dir = file_dir.split('/')
        fn = file_dir[0] + '/' + file_dir[1][:-3] + 'lab'  # .wav -> .lab
        file_dir = os.path.join(base_dir, fn)
        with open(file_dir, 'w', encoding='utf-8') as lab_file:
            lab_file.write(script)

# Collect every distinct word and spell it out as space-separated jamo.
file_list = sorted(glob(os.path.join(base_dir, '**/*.lab')))
jamo_dict = {}
for file_name in tqdm(file_list):
    with open(file_name, 'r', encoding='utf-8') as lab_file:
        sentence = lab_file.readline()
    jamo = h2j(sentence).split(' ')
    for s in jamo:
        if s not in jamo_dict:
            jamo_dict[s] = ' '.join(s)

# Save the word-to-jamo mapping as a tab-separated dictionary file.
dict_name = 'korean_dict.txt'
with open(dict_name, 'w', encoding='utf-8') as f:
    for key in jamo_dict.keys():
        content = '{}\t{}\n'.format(key, jamo_dict[key])
        f.write(content)
# Requires: pip install jamo
# Generate the .lab files
python kss-align.py
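To make the first half of kss-align.py easier to follow, here is a small sketch of what happens to a single transcript line: the first '|'-separated field gives the .wav path, the fourth field gives the text, and punctuation is stripped. The function name parse_transcript_line and the sample line are made up for illustration; the sample is not taken from the real transcript file.

```python
import re

# Same punctuation set that kss-align.py removes.
PUNCTUATION = re.compile(r"[.,!?]")


def parse_transcript_line(line):
    """Split one '|'-separated transcript line into (lab_relative_path, cleaned_text)."""
    fields = line.rstrip("\n").split("|")
    wav_rel_path, script = fields[0], fields[3]
    script = PUNCTUATION.sub("", script)
    folder, wav_name = wav_rel_path.split("/")
    # Swap the 'wav' extension for 'lab', keeping the folder.
    lab_rel_path = folder + "/" + wav_name[:-3] + "lab"
    return lab_rel_path, script


# Hypothetical sample line with six fields, for illustration only.
sample = "1/1_0000.wav|raw text|expanded text|Hello, world!|3.5|translation"
print(parse_transcript_line(sample))  # ('1/1_0000.lab', 'Hello world')
```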
Dictionary (lexicon) file generation
mfa train_g2p korean_dict.txt korean.zip
mfa g2p korean.zip kss korean.txt
mfa train kss korean.txt out
Command explanation
The arguments passed to each command can be changed to suit your setup.
- mfa train_g2p takes the path to korean_dict.txt and the path of the zip file (the G2P model) to create.
- mfa g2p takes the path to the zip file produced by train_g2p, the path to the data folder, and the path of the txt file (the dictionary) to create.
- mfa train takes the path to the data folder, the path to the txt file produced by g2p, and the path where the TextGrid files are saved.
Once all three commands have run, the TextGrid files are saved in the out folder.
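The way each command feeds the next can be sketched in Python. The snippet below only builds and prints the three command lines with the same argument order used in this post; the function name build_korean_pipeline and its parameter names are hypothetical. To actually run a command, each list could be passed to subprocess.run(argv, check=True).

```python
import shlex


def build_korean_pipeline(jamo_dict="korean_dict.txt", g2p_model="korean.zip",
                          corpus="kss", lexicon="korean.txt", output="out"):
    """Return the three MFA commands from this post as argument lists, in run order."""
    return [
        # 1. Train a G2P model from the jamo dictionary.
        ["mfa", "train_g2p", jamo_dict, g2p_model],
        # 2. Use that model to generate a lexicon for the corpus.
        ["mfa", "g2p", g2p_model, corpus, lexicon],
        # 3. Train on the corpus with the lexicon and write results to the output path.
        ["mfa", "train", corpus, lexicon, output],
    ]


for argv in build_korean_pipeline():
    print(shlex.join(argv))
```

Keeping the paths in one place makes it harder to pass a mismatched zip or txt file to a later step.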
Align
mfa align /path/to/dataset /path/to/dictionary /path/to/acoustic_model /path/to/output
After running mfa align, a file named unaligned.txt may appear; it lists the files that MFA could not align.
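Independently of unaligned.txt, one way to see which utterances produced no alignment is to compare the transcripts in the dataset with the TextGrid files in the output folder. This is a rough sketch that assumes each output file keeps the base name of its .lab file and uses the .TextGrid extension; missing_textgrids is a hypothetical helper.

```python
from pathlib import Path


def missing_textgrids(corpus_dir, output_dir):
    """Return sorted base names of .lab files that have no .TextGrid in output_dir."""
    # Assumption: files are matched by base name only, ignoring sub-folders.
    lab_stems = {p.stem for p in Path(corpus_dir).rglob("*.lab")}
    textgrid_stems = {p.stem for p in Path(output_dir).rglob("*.TextGrid")}
    return sorted(lab_stems - textgrid_stems)
```

An empty list means every transcript has a corresponding TextGrid.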
PS
[PS1] The global MFA database server does not exist, please initialize it first.
montreal_forced_aligner.exceptions.DatabaseError: DatabaseError:
There was an error encountered starting the global MFA database server, please see /root/Documents/MFA/pg_init_log_global.txt for more details and/or look at the logged errors above.
Reference URL [unresolved]
mfa configure --enable_auto_server
mfa server init
[PS2] yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:argparse.Namespace'
Try:
pip install pyyaml==4.2b2
If the error still occurs after reinstalling PyYAML, modify the following file:
vim /opt/conda/envs/mfa/lib/python3.8/site-packages/montreal_forced_aligner/config.py
Change
config = yaml.safe_load(file_data)
to
config = yaml.unsafe_load(file_data)
After that, it works normally.
Questions and Answers (Q&A)
1. In MFA, what is the difference between mfa train and mfa train_g2p?
mfa train: trains a new acoustic model.
mfa train_g2p: trains a grapheme-to-phoneme (G2P) model, which maps written words to phoneme sequences.
References
[1] PYRASIS.COM: Make your voice into TTS (FastSpeech2)
[2] Installation, Montreal Forced Aligner 2.0.0 documentation