【Voice conversion: How to compute MCD for objective evaluation?】

Calculate the MCD value

Preface: thanks to GitHub author Lukelluke. For more detailed references, see Lukelluke's GitHub page.
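The steps below rely on the mcd toolkit's dtw_synth, whose internals this post does not cover. For reference, a commonly used definition of MCD between a reference mel-cepstral frame c and a converted frame c' is MCD = (10 / ln 10) * sqrt(2 * sum_{d=1..D} (c_d - c'_d)^2) dB, with the 0th (energy) coefficient usually excluded, averaged over frames after DTW alignment. The sketch below implements that definition with a plain O(N*M) DTW. It is an illustration, not the code inside mcd; the raw float32 layout and dim=60 in read_mgc are assumptions that must match your extraction settings.

```python
import math

import numpy as np

# Constant 10 / ln(10) * sqrt(2) (about 6.14) that turns the Euclidean
# distance between mel-cepstra into dB.
MCD_CONST = 10.0 / math.log(10.0) * math.sqrt(2.0)


def read_mgc(path, dim=60):
    """Read a raw float32 .mgc file into a (frames, dim) float64 array.
    dim=60 is an assumption; use the mgc dimension from your extraction."""
    data = np.fromfile(path, dtype=np.float32)
    return data.reshape(-1, dim).astype(np.float64)


def frame_mcd(x, y):
    """MCD in dB between two frames, skipping the 0th (energy) coefficient."""
    diff = x[1:] - y[1:]
    return MCD_CONST * math.sqrt(float(np.dot(diff, diff)))


def mcd_dtw(ref, syn):
    """Average frame MCD along the lowest-cost DTW path (inputs must be non-empty)."""
    n, m = len(ref), len(syn)
    # Pairwise frame distances, shape (n, m); same value as frame_mcd per pair.
    diff = ref[:, None, 1:] - syn[None, :, 1:]
    dist = MCD_CONST * np.sqrt(np.sum(diff ** 2, axis=-1))
    # acc[i, j]: lowest accumulated cost aligning ref[:i] with syn[:j].
    # steps[i, j]: number of frame pairs on that path (used for averaging).
    acc = np.full((n + 1, m + 1), np.inf)
    steps = np.zeros((n + 1, m + 1), dtype=int)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_cost, best_steps = min(
                [(acc[i - 1, j - 1], steps[i - 1, j - 1]),  # diagonal move
                 (acc[i - 1, j], steps[i - 1, j]),          # advance ref only
                 (acc[i, j - 1], steps[i, j - 1])],         # advance syn only
                key=lambda c: c[0])
            acc[i, j] = best_cost + dist[i - 1, j - 1]
            steps[i, j] = best_steps + 1
    return float(acc[n, m] / steps[n, m])
```

Usage: mcd_dtw(read_mgc("ref-examples/p229_p362_081.mgc"), read_mgc("synth-examples/p229_p362_081.mgc")). Averaging over the DTW path length, rather than over reference frames, is only one of several conventions; reported MCD values differ depending on this choice and on whether c0 is included.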

  1. Prepare the mcd toolkit and merlin-master (the Merlin source tree).

  2. Prepare the source speech and the converted speech. Create two folders, one for the original (source) speech and one for the converted speech. Every source file must have a converted counterpart with exactly the same file name; otherwise the MCD cannot be calculated.

    mkdir org
    mkdir convert
    
  3. Extract the .mgc, .bap and .lf0 feature files.

     cd merlin-master/egs/voice_conversion/s1/
     ./01_setup.sh speakera speakerb
    

    The folders speakera and speakerb will be created under the database folder. Copy the source speech from org into speakera and the converted speech from convert into speakerb, then run the following commands:

    ./02_....sh database/speakera database/speakera_extract
    ./02_....sh database/speakerb database/speakerb_extract
    

    The .mgc, .bap and .lf0 files will be extracted into speakera_extract and speakerb_extract respectively.
    After the extraction is complete:
    (1) copy the .mgc files of the source speech (under speakera_extract) to mcd/test_data/ref-examples;
    (2) copy the .mgc, .bap and .lf0 files of the converted speech (under speakerb_extract) to mcd/test_data/synth-examples.

  4. Calculate the MCD.
    Write the base names of all source/converted file pairs into mcd/test_data/corpus.lst, then run the following command from the mcd directory:

    cat test_data/corpus.lst | xargs bin/dtw_synth test_data/ref-examples test_data/synth-examples out
    

    The MCD is then calculated.
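The copying in step 3 can also be scripted. A minimal sketch, assuming the extracted files sit somewhere under each *_extract folder (subfolders are searched recursively because the exact layout produced by the extraction script is not shown here). collect_features is a hypothetical helper; files with the same name in different subfolders would overwrite each other in the destination.

```python
import glob
import os
import shutil


def collect_features(extract_dir, dest_dir, exts):
    """Copy every file under extract_dir (searched recursively) whose
    extension is in exts into dest_dir, flattening any subfolders.
    Returns the sorted list of copied file names."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for ext in exts:
        # "**" with recursive=True also matches files directly in extract_dir.
        pattern = os.path.join(extract_dir, "**", "*" + ext)
        for path in glob.glob(pattern, recursive=True):
            name = os.path.basename(path)
            shutil.copy(path, os.path.join(dest_dir, name))
            copied.append(name)
    return sorted(copied)
```

Usage, run from merlin-master/egs/voice_conversion/s1/ with /path/to/mcd replaced by the real location:

    collect_features("database/speakera_extract", "/path/to/mcd/test_data/ref-examples", [".mgc"])
    collect_features("database/speakerb_extract", "/path/to/mcd/test_data/synth-examples", [".mgc", ".bap", ".lf0"])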

Example corpus.lst:

	p229_p362_081
	p260_p343_386

Each line holds only a file name without its extension, and the source file and the converted file must have exactly the same name.
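Since source and converted files are matched purely by name, checking that the two folders pair up before extracting features can save a failed run. A minimal sketch; check_pairs is a hypothetical helper, and it compares full file names, so run it on the org and convert folders from step 2 (which hold files of the same type).

```python
import os


def check_pairs(org_dir, convert_dir):
    """Return (only_in_org, only_in_convert) as sorted lists of file names.

    Both lists are empty when every source file has a converted file
    with exactly the same name, and vice versa."""
    org = set(os.listdir(org_dir))
    conv = set(os.listdir(convert_dir))
    return sorted(org - conv), sorted(conv - org)
```

For example, check_pairs("org", "convert") returning ([], []) means every file is paired; any name listed needs a counterpart (or removal) before continuing.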

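Note: if bin/dtw_synth reports an error while reading the feature files, you can try modifying the readFile method of the vecseq module, which mcd/bin/dtw_synth imports with the line import htk_io.vecseq as vsio (in an IDE, Ctrl+click the module name to jump to its source). The patch below pads the data when the number of values in a file is not a multiple of the vector size, so that the reshape into frames succeeds: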

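# Patched readFile in htk_io/vecseq.py (numpy is already imported there as np)
def readFile(self, vecSeqFile):
    """Reads a raw vector sequence file.

    If the number of values is not a multiple of self.vecSize, the tail is
    padded with the mean value so the data can be reshaped into frames.
    The returned numpy array is float64.
    """
    Vec = np.fromfile(vecSeqFile, dtype=self.dtypeFile)
    # Number of values needed to complete the last frame (0 if already complete).
    # len(Vec) % self.vecSize would be the leftover count, not the missing count.
    misLenToPad = (-len(Vec)) % self.vecSize
    if misLenToPad:
        means = np.mean(Vec)
        Vec = np.append(Vec, np.full(misLenToPad, means, dtype=Vec.dtype))

    # np.float was removed in NumPy 1.24; the builtin float (float64) is equivalent.
    return np.reshape(
        Vec,
        (-1, self.vecSize)
    ).astype(float)

    # Original implementation:
    # return np.reshape(
    #     np.fromfile(vecSeqFile, dtype=self.dtypeFile),
    #     (-1, self.vecSize)
    # ).astype(np.float)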
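Before patching, it may help to see which files actually trigger the problem. A sketch; find_incomplete_files is a hypothetical helper that assumes the files hold raw 4-byte floats (the bytes_per_value parameter) and that dim equals the vector size dtw_synth uses for that stream (check the script for the actual value). For a file with n values it reports (-n) % dim, the number of values missing from the last frame.

```python
import os


def find_incomplete_files(folder, dim, ext=".mgc", bytes_per_value=4):
    """Return {file_name: missing_values} for files in folder whose value
    count is not a multiple of dim. missing_values is how many values would
    have to be appended to complete the last frame."""
    incomplete = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith(ext):
            continue
        n_values = os.path.getsize(os.path.join(folder, name)) // bytes_per_value
        missing = (-n_values) % dim  # e.g. 125 values with dim 60 -> 55 missing
        if missing:
            incomplete[name] = missing
    return incomplete
```

For example, find_incomplete_files("test_data/synth-examples", 60), where 60 is a placeholder for the real vector size, lists the converted files whose length does not divide into whole frames.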

For each converted file, copy the corresponding source file and rename the copy to the converted file's name (one source file may be copied several times), so that the file names correspond:

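# Python 3
import os
import shutil

def mycopy3():
    org_path = "/mnt/hgfs/VmwareShare/mcd/org"              # source speech
    opt4_path = "/mnt/hgfs/VmwareShare/mcd/test"            # converted speech
    opt4_outpath = "/mnt/hgfs/VmwareShare/mcd/test_output"  # renamed source copies

    org_names = set(os.listdir(org_path))
    for con_name in os.listdir(opt4_path):
        # Rebuild the source file name from the converted file name:
        # fields 1 and 2 (split on "_"), with "C" stripped from field 1.
        name2 = con_name.split('_')
        if len(name2) < 3:
            continue  # not in the expected naming scheme
        name3 = name2[1].strip("C") + "_" + name2[2] + ".wav"
        if name3 in org_names:
            # Copy the source file under the converted file's name.
            shutil.copy(os.path.join(org_path, name3),
                        os.path.join(opt4_outpath, con_name))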

List the file names without the .wav extension:

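import os

def list_filename2():
    org_path = "/home/ubuntu/Downloads/merlin-master/egs/voice_conversion/s1/database/speakerb"
    for filename in os.listdir(org_path):
        # str.strip(".wav") would strip any of ".", "w", "a", "v" from both ends;
        # os.path.splitext removes only the extension.
        name, ext = os.path.splitext(filename)
        if ext == ".wav":
            print(name)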
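Rather than printing names and pasting them into corpus.lst by hand, the list can be written directly. A sketch; write_corpus_list is a hypothetical helper that keeps only the base names having an .mgc file in both ref-examples and synth-examples, so unpaired files are left out.

```python
import os


def write_corpus_list(ref_dir, synth_dir, out_path, ext=".mgc"):
    """Write the base names that have an ext file in both ref_dir and
    synth_dir to out_path, one per line, and return them sorted."""
    def stems(folder):
        return {os.path.splitext(f)[0] for f in os.listdir(folder) if f.endswith(ext)}

    common = sorted(stems(ref_dir) & stems(synth_dir))
    with open(out_path, "w") as fh:
        for name in common:
            fh.write(name + "\n")
    return common
```

Usage, from the mcd directory: write_corpus_list("test_data/ref-examples", "test_data/synth-examples", "test_data/corpus.lst").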


Origin blog.csdn.net/weixin_42538665/article/details/127749829