Calculate the MCD value
Preface: thanks to GitHub author Lukelluke; see Lukelluke's GitHub page for more detailed references.
-
Prepare the mcd and merlin-master projects.
-
Prepare the source speech and the converted speech. Create two folders, one for the original (source) speech and one for the converted speech. Every source file must have exactly one converted counterpart with the same file name; otherwise the MCD cannot be calculated.
```shell
mkdir org
mkdir convert
```
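Because a missing or misnamed file breaks the later steps, it can help to check the one-to-one correspondence before going further. This is a minimal sketch; the function name unmatched_files and the assumption that both folders hold .wav files are illustrative, not part of mcd or Merlin.

```python
import os


def unmatched_files(org_dir, convert_dir, ext=".wav"):
    """Return (only_in_org, only_in_convert): sorted base names that have
    no counterpart with the same name in the other folder."""
    def stems(folder):
        # Base names (extension removed) of all files with the given extension
        return {os.path.splitext(f)[0] for f in os.listdir(folder)
                if f.endswith(ext)}

    org, conv = stems(org_dir), stems(convert_dir)
    return sorted(org - conv), sorted(conv - org)
```

For example, unmatched_files("org", "convert") should return two empty lists when every file has a partner.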
-
Extract the .mgc, .bap, and .lf0 feature files.
```shell
cd merlin-master/egs/voice_conversion/s1/
./01_setup.sh speakera speakerb
```
This creates the speakera and speakerb folders under database/. Copy the source speech files from org into speakera and the converted speech files from convert into speakerb, then run:
```shell
./02_....sh database/speakera database/speakera_extract
./02_....sh database/speakerb database/speakerb_extract
```
The .mgc, .bap, and .lf0 files are extracted into speakera_extract and speakerb_extract, respectively.
After the extraction is complete:
(1) Copy the .mgc files of the source speech (under speakera_extract) to mcd/test_data/ref-examples.
(2) Copy the .mgc, .bap, and .lf0 files of the converted speech (under speakerb_extract) to mcd/test_data/synth-examples.
-
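Steps (1) and (2) can also be scripted. The sketch below assumes the extracted files may sit in subfolders of speakera_extract / speakerb_extract, so it searches recursively by extension and copies everything flat into the target folder. The function name collect_features and the example paths are hypothetical; the target folder should not lie inside the source tree.

```python
import os
import shutil


def collect_features(src_root, dst_dir, exts):
    """Copy every file under src_root whose extension is in exts into
    dst_dir (flattened) and return the sorted list of copied names."""
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if os.path.splitext(name)[1] in exts:
                shutil.copy(os.path.join(dirpath, name),
                            os.path.join(dst_dir, name))
                copied.append(name)
    return sorted(copied)

# Hypothetical usage; adjust the paths to your own layout:
# collect_features("database/speakera_extract",
#                  "mcd/test_data/ref-examples", {".mgc"})
# collect_features("database/speakerb_extract",
#                  "mcd/test_data/synth-examples", {".mgc", ".bap", ".lf0"})
```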
Calculate the MCD
Write the base names of all source/converted file pairs into mcd/test_data/corpus.lst, one per line. Then, from the mcd directory, run the following command to calculate the MCD:
```shell
cat test_data/corpus.lst | xargs bin/dtw_synth test_data/ref-examples test_data/synth-examples out
```
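For reference, the number reported is the mel-cepstral distortion. A common definition for two time-aligned frames with mel-cepstral coefficients c_d and c'_d is MCD = (10 / ln 10) * sqrt(2 * sum over d=1..D of (c_d - c'_d)^2) dB, averaged over all frames, with the energy coefficient c_0 usually excluded. The sketch below assumes the two sequences are already aligned frame by frame (dtw_synth does the DTW alignment itself); it illustrates the formula and is not mcd's own implementation. The names MCD_CONST and mean_mcd are hypothetical.

```python
import math

import numpy as np

# Constant in front of the Euclidean distance: (10 / ln 10) * sqrt(2)
MCD_CONST = 10.0 / math.log(10.0) * math.sqrt(2.0)


def mean_mcd(ref_mgc, synth_mgc, skip_c0=True):
    """Average frame-wise MCD in dB between two aligned mel-cepstrum
    matrices of shape (frames, dims)."""
    ref = np.asarray(ref_mgc, dtype=np.float64)
    syn = np.asarray(synth_mgc, dtype=np.float64)
    if ref.shape != syn.shape:
        raise ValueError("sequences must be aligned to the same shape first")
    if skip_c0:
        # Drop the energy coefficient c_0 from both sequences
        ref, syn = ref[:, 1:], syn[:, 1:]
    frame_mcd = MCD_CONST * np.sqrt(np.sum((ref - syn) ** 2, axis=1))
    return float(np.mean(frame_mcd))
```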
Example corpus.lst:
p229_p362_081
p260_p343_386
Each line contains only the file name without its extension; the source and converted files must share this name.
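One way to build corpus.lst is to keep only the base names that exist as .mgc files in both ref-examples and synth-examples, so that every listed name has both sides. This is a sketch; write_corpus_list is a hypothetical helper, and the .mgc extension is assumed from the copy steps above.

```python
import os


def write_corpus_list(ref_dir, synth_dir, out_path, ext=".mgc"):
    """Write the base names present in BOTH folders to out_path, one per
    line, and return them. Names found in only one folder are skipped."""
    def stems(folder):
        return {os.path.splitext(f)[0] for f in os.listdir(folder)
                if f.endswith(ext)}

    common = sorted(stems(ref_dir) & stems(synth_dir))
    with open(out_path, "w") as fh:
        for name in common:
            fh.write(name + "\n")
    return common
```

For example: write_corpus_list("test_data/ref-examples", "test_data/synth-examples", "test_data/corpus.lst").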
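Note: If dtw_synth reports an error (for example, numpy failing to reshape a feature file whose length is not a multiple of the vector size), you can modify the readFile method of the htk_io.vecseq module, which mcd/bin/dtw_synth imports with import htk_io.vecseq as vsio (in an IDE, Ctrl+click the import to jump to it). A modified version that pads the data instead of failing: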
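```python
def readFile(self, vecSeqFile):
    """Reads a raw vector sequence file.

    Modified to pad the data with its mean value when the number of values
    is not a multiple of vecSize. The returned array has dtype float
    (np.float was only an alias of the built-in float and was removed in
    NumPy 1.24).
    """
    vec = np.fromfile(vecSeqFile, dtype=self.dtypeFile)
    # Number of values missing to reach the next multiple of vecSize
    # (0 if the length already fits). len(vec) % vecSize is the surplus,
    # not the shortfall, so it cannot be used as the pad length.
    padLen = (-len(vec)) % self.vecSize
    if padLen:
        # Workaround: append padLen copies of the mean so reshape succeeds
        vec = np.concatenate(
            [vec, np.full(padLen, np.mean(vec), dtype=vec.dtype)])
    return np.reshape(vec, (-1, self.vecSize)).astype(float)
    # Original implementation (raises an error when the length is not a
    # multiple of vecSize):
    # return np.reshape(
    #     np.fromfile(vecSeqFile, dtype=self.dtypeFile),
    #     (-1, self.vecSize)
    # ).astype(np.float)
```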
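Since padding hides the underlying mismatch, it may be worth checking which files are affected before patching. The sketch below counts the raw values in each feature file and reports those whose count is not a multiple of the vector size. It assumes little-endian float32 data and a dimension of 60 (a common Merlin mgc setting); both are assumptions, so use whatever your extraction and dtw_synth actually use. The name find_misaligned_files is hypothetical.

```python
import os

import numpy as np


def find_misaligned_files(feat_dir, vec_size=60, ext=".mgc", dtype="<f4"):
    """Return {file_name: leftover_values} for every feature file whose
    number of values is not a multiple of vec_size."""
    bad = {}
    for name in sorted(os.listdir(feat_dir)):
        if not name.endswith(ext):
            continue
        # Assumed format: headerless raw floats of the given dtype
        n_values = np.fromfile(os.path.join(feat_dir, name), dtype=dtype).size
        if n_values % vec_size:
            bad[name] = n_values % vec_size
    return bad
```

An empty dictionary means every file reshapes cleanly and the readFile workaround should not be needed for that folder.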
Helper: for each converted file, copy the matching source file and give it the converted file's name, so that the source and converted file names correspond one-to-one.
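```python
# Works with Python 2 or 3
import os
import shutil


def mycopy3():
    org_path = "/mnt/hgfs/VmwareShare/mcd/org"              # source speech
    opt4_path = "/mnt/hgfs/VmwareShare/mcd/test"            # converted speech
    opt4_outpath = "/mnt/hgfs/VmwareShare/mcd/test_output"  # renamed copies
    org_names = set(os.listdir(org_path))
    for con_name in os.listdir(opt4_path):
        name2 = con_name.split('_')
        if len(name2) < 3:
            continue  # name does not follow the expected pattern
        # Source name = field 1 (with 'C' stripped) + "_" + field 2 + ".wav";
        # adapt this line to your own naming scheme.
        name3 = name2[1].strip("C") + "_" + name2[2] + ".wav"
        if name3 in org_names:
            # Copy the source file under the converted file's name
            shutil.copy(os.path.join(org_path, name3),
                        os.path.join(opt4_outpath, con_name))
            print("%s -> %s" % (name3, con_name))
```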
Helper: list the file names in a folder without the .wav extension (for example, to fill corpus.lst).
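```python
# Works with Python 2 or 3
import os


def list_filename2():
    org_path = "/home/ubuntu/Downloads/merlin-master/egs/voice_conversion/s1/database/speakerb"
    for filename in sorted(os.listdir(org_path)):
        # splitext removes only the extension; filename.strip(".wav") would
        # also strip any leading/trailing '.', 'w', 'a', 'v' characters
        stem, ext = os.path.splitext(filename)
        if ext == ".wav":
            print(stem)
```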