Artificial intelligence (AI) has already worked its way into every corner of daily life. AI Stefanie Sun covers keep appearing one after another, but not everyone owns an NVIDIA graphics card (an "N card"), and life without a GPU is always hard. No matter, there is a clever workaround: this time we will build a deep learning environment on Google's free Colab cloud server, create an AI Trump, and have him sing "The Internationale".
Colab (short for Colaboratory) is Google's free, cloud-hosted notebook service. It lets you write and run Python code in the browser, which is very convenient. Better still, Colab can allocate a free GPU to its users; for anyone without an NVIDIA card this goes well beyond industry goodwill and is practically charity.
Configure Colab
Colab works on top of Google Drive. We can store deep learning Python scripts, trained models, and training sets directly in Drive and then run them through Colab.
First, visit Google Drive: drive.google.com
Then click New and choose Connect more apps:
Then install Colab:
Drive and Colab are now linked. Next, create a new notebook file, my_sovits.ipynb, and type the code:
print("hello colab")
Then press Ctrl + Enter to run the code:
Note that Colab runs Python code in the ipynb format used by Jupyter Notebook.
Jupyter Notebook opens as a web page: you write and run code directly in the page, and the output of each code block appears right below it. If you need to write documentation while programming, you can write it on the same page, which is convenient for explaining the code as you go.
Then set the graphics card type (Runtime > Change runtime type):
Then run the following commands to check the CUDA version and the GPU:
!/usr/local/cuda/bin/nvcc --version
!nvidia-smi
The program returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Tue May 16 04:49:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   65C    P8    13W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
It is recommended to choose the Tesla T4 here, which offers better performance.
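The same check can be done from Python, which is handy when a later cell should stop early if no GPU is attached. The following is only a minimal sketch: the helper names parse_gpu_csv and list_gpus are made up for this example, and it assumes nvidia-smi is on the PATH whenever a GPU runtime is active.

```python
import shutil
import subprocess

def parse_gpu_csv(text):
    """Parse 'name, memory' lines such as 'Tesla T4, 15360 MiB'."""
    gpus = []
    for line in text.strip().splitlines():
        name, _, memory = line.partition(",")
        gpus.append({"name": name.strip(), "memory": memory.strip()})
    return gpus

def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] when none is available."""
    if shutil.which("nvidia-smi") is None:
        return []  # no driver tools found: probably a CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)

print(list_gpus() or "No GPU found: check Runtime > Change runtime type")
```

An empty list usually means the runtime type is still set to CPU.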
Colab is now configured.
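Because scripts, models, and training sets are kept in Google Drive, a notebook usually has to mount the drive before it can read them. A minimal sketch, assuming the common mount point /content/drive; the wrapper function mount_drive is hypothetical and only exists so the cell also runs outside Colab:

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return True on success."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        print("Not running in Colab, skipping Drive mount.")
        return False
    drive.mount(mount_point)  # asks for authorization on first use
    return True

mounted = mount_drive()
```

Once the mount succeeds, the contents of the drive appear as ordinary files below the mount point.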
Configure So-vits
Next, configure the so-vits environment. Some basic dependencies can be installed with pip:
!pip install pyworld==0.3.2
!pip install numpy==1.23.5
Note that in a Jupyter notebook, a line that starts with an exclamation mark is run as a shell command.
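The exclamation mark is notebook syntax, not Python, so it does not work in a plain .py script. If the same step ever needs to run outside a notebook, the standard subprocess module does a similar job. A small sketch, where the helper name sh is invented for this example:

```python
import subprocess
import sys

def sh(args):
    """Run a command and return its standard output as text."""
    done = subprocess.run(args, capture_output=True, text=True, check=True)
    return done.stdout

# Notebook cell:  !python --version
# Plain Python:   call the interpreter that is running this script.
print(sh([sys.executable, "--version"]).strip())
```

With check=True a failing command raises CalledProcessError, whereas a failing notebook line only prints the error and lets the cell continue.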
Because this is not a local environment, Colab will sometimes show a reminder such as:
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting numpy==1.23.5
Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 80.1 MB/s eta 0:00:00
Installing collected packages: numpy
Attempting uninstall: numpy
Found existing installation: numpy 1.22.4
Uninstalling numpy-1.22.4:
Successfully uninstalled numpy-1.22.4
Successfully installed numpy-1.23.5
WARNING: The following packages were previously imported in this runtime:
[numpy]
You must restart the runtime in order to use newly installed versions.
At this point the runtime must be restarted before the newly installed numpy version can be imported.
After restarting the runtime, run the install command again until pip reports that the requirement is already satisfied:
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: numpy==1.23.5 in /usr/local/lib/python3.10/dist-packages (1.23.5)
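Whether a restart is still needed can also be checked in code instead of by reading the pip output. The sketch below is one possible way to do it, not part of the so-vits project: the function install_state is hypothetical, and it assumes that the import name equals the package name, which holds for numpy.

```python
import sys
from importlib import metadata

def install_state(package, wanted):
    """Classify a pinned dependency as 'missing', 'mismatch', 'restart' or 'ok'."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return "missing"
    if installed != wanted:
        return "mismatch"
    # The right version is on disk, but an older copy may already be
    # imported in this runtime; that is the case pip warns about.
    module = sys.modules.get(package)
    loaded = getattr(module, "__version__", wanted)
    return "ok" if loaded == wanted else "restart"

print(install_state("numpy", "1.23.5"))
```

A result of "restart" corresponds to the warning shown above: the files on disk are correct, but the running interpreter still holds the old module.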
Then, clone the so-vits project and install the project's dependencies:
import os
import glob
!git clone https://github.com/effusiveperiscope/so-vits-svc -b eff-4.0
os.chdir('/content/so-vits-svc')
# install requirements one-at-a-time to ignore exceptions
!cat requirements.txt | xargs -n 1 pip install --extra-index-url https://download.pytorch.org/whl/cu117
!pip install praat-parselmouth
!pip install ipywidgets
!pip install huggingface_hub
!pip install pip==23.0.1 # fix pip version for fairseq install
!pip install fairseq==0.12.2
!jupyter nbextension enable --py widgetsnbextension
existing_files = glob.glob('/content/**/*.*', recursive=True)
!pip install --upgrade protobuf==3.9.2
!pip uninstall -y tensorflow
!pip install tensorflow==2.11.0
After installing the dependencies, define some helper functions:
os.chdir('/content/so-vits-svc') # force working-directory to so-vits-svc - this line is just for safety and is probably not required
import tarfile
import os
from zipfile import ZipFile
# taken from https://github.com/CookiePPP/cookietts/blob/master/CookieTTS/utils/dataset/extract_unknown.py
def extract(path):
    if path.endswith(".zip"):
        with ZipFile(path, 'r') as zipObj:
            zipObj.extractall(os.path.split(path)[0])
    elif path.endswith(".tar.bz2"):
        tar = tarfile.open(path, "r:bz2")
        tar.extractall(os.path.split(path)[0])
        tar.close()
    elif path.endswith(".tar.gz"):
        tar = tarfile.open(path, "r:gz")
        tar.extractall(os.path.split(path)[0])
        tar.close()
    elif path.endswith(".tar"):
        tar = tarfile.open(path, "r:")
        tar.extractall(os.path.split(path)[0])
        tar.close()
    elif path.endswith(".7z"):
        import py7zr
        archive = py7zr.SevenZipFile(path, mode='r')
        archive.extractall(path=os.path.split(path)[0])
        archive.close()
    else:
        raise NotImplementedError(f"{path} extension not implemented.")
# taken from https://github.com/CookiePPP/cookietts/tree/master/CookieTTS/_0_download/scripts
# megatools download urls
win64_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-win64.zip"
win32_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-win32.zip"
linux_url = "https://megatools.megous.com/builds/builds/megatools-1.11.1.20230212-linux-x86_64.tar.gz"
# download megatools
from sys import platform
import os
import urllib.request
import subprocess
from time import sleep
if platform == "linux" or platform == "linux2":
    dl_url = linux_url
elif platform == "darwin":
    raise NotImplementedError('MacOS not supported.')
elif platform == "win32":
    dl_url = win64_url
else:
    raise NotImplementedError ('Unknown Operating System.')
dlname = dl_url.split("/")[-1]
if dlname.endswith(".zip"):
    binary_folder = dlname[:-4] # remove .zip
elif dlname.endswith(".tar.gz"):
    binary_folder = dlname[:-7] # remove .tar.gz
else:
    raise NameError('downloaded megatools has unknown archive file extension!')
if not os.path.exists(binary_folder):
    print('"megatools" not found. Downloading...')
    if not os.path.exists(dlname):
        urllib.request.urlretrieve(dl_url, dlname)
    assert os.path.exists(dlname), 'failed to download.'
    extract(dlname)
    sleep(0.10)
    os.unlink(dlname)
    print("Done!")
binary_folder = os.path.abspath(binary_folder)
def megadown(download_link, filename='.', verbose=False):
    """Use megatools binary executable to download files and folders from MEGA.nz ."""
    filename = ' --path "'+os.path.abspath(filename)+'"' if filename else ""
    wd_old = os.getcwd()
    os.chdir(binary_folder)
    try:
        if platform == "linux" or platform == "linux2":
            subprocess.call(f'./megatools dl{filename}{" --debug http" if verbose else ""} {download_link}', shell=True)
        elif platform == "win32":
            subprocess.call(f'megatools.exe dl{filename}{" --debug http" if verbose else ""} {download_link}', shell=True)
    except:
        os.chdir(wd_old) # don't let user stop download without going back to correct directory first
        raise
    os.chdir(wd_old)
    return filename
import urllib.request
from tqdm import tqdm
import gdown
from os.path import exists
def request_url_with_progress_bar(url, filename):
    class DownloadProgressBar(tqdm):
        def update_to(self, b=1, bsize=1, tsize=None):
            if tsize is not None:
                self.total = tsize
            self.update(b * bsize - self.n)
    def download_url(url, filename):
        with DownloadProgressBar(unit='B', unit_scale=True,
                                 miniters=1, desc=url.split('/')[-1]) as t:
            filename, headers = urllib.request.urlretrieve(url, filename=filename, reporthook=t.update_to)
            print("Downloaded to "+filename)
    download_url(url, filename)
def download(urls, dataset='', filenames=None, force_dl=False, username='', password='', auth_needed=False):
    assert filenames is None or len(urls) == len(filenames), f"number of urls does not match filenames. Expected {len(filenames)} urls, containing the files listed below.\n{filenames}"
    assert not auth_needed or (len(username) and len(password)), f"username and password needed for {dataset} Dataset"
    if filenames is None:
        filenames = [None,]*len(urls)
    for i, (url, filename) in enumerate(zip(urls, filenames)):
        print(f"Downloading File from {url}")
        #if filename is None:
        #    filename = url.split("/")[-1]
        if filename and (not force_dl) and exists(filename):
            print(f"{filename} Already Exists, Skipping.")
            continue
        if 'drive.google.com' in url:
            assert 'https://drive.google.com/uc?id=' in url, 'Google Drive links should follow the format "https://drive.google.com/uc?id=1eQAnaoDBGQZldPVk-nzgYzRbcPSmnpv6".\nWhere id=XXXXXXXXXXXXXXXXX is the Google Drive Share ID.'
            gdown.download(url, filename, quiet=False)
        elif 'mega.nz' in url:
            megadown(url, filename)
        else:
            #urllib.request.urlretrieve(url, filename=filename) # no progress bar
            request_url_with_progress_bar(url, filename) # with progress bar
import huggingface_hub
import os
import shutil
class HFModels:
    def __init__(self, repo = "therealvul/so-vits-svc-4.0",
            model_dir = "hf_vul_models"):
        self.model_repo = huggingface_hub.Repository(local_dir=model_dir,
            clone_from=repo, skip_lfs_files=True)
        self.repo = repo
        self.model_dir = model_dir
        self.model_folders = os.listdir(model_dir)
        self.model_folders.remove('.git')
        self.model_folders.remove('.gitattributes')
    def list_models(self):
        return self.model_folders
    # Downloads model;
    # copies config to target_dir and moves model to target_dir
    def download_model(self, model_name, target_dir):
        if not model_name in self.model_folders:
            raise Exception(model_name + " not found")
        model_dir = self.model_dir
        charpath = os.path.join(model_dir,model_name)
        gen_pt = next(x for x in os.listdir(charpath) if x.startswith("G_"))
        cfg = next(x for x in os.listdir(charpath) if x.endswith("json"))
        try:
            clust = next(x for x in os.listdir(charpath) if x.endswith("pt"))
        except StopIteration as e:
            print("Note - no cluster model for "+model_name)
            clust = None
        if not os.path.exists(target_dir):
            os.makedirs(target_dir, exist_ok=True)
        gen_dir = huggingface_hub.hf_hub_download(repo_id = self.repo,
            filename = model_name + "/" + gen_pt) # this is a symlink
        if clust is not None:
            clust_dir = huggingface_hub.hf_hub_download(repo_id = self.repo,
                filename = model_name + "/" + clust) # this is a symlink
            shutil.move(os.path.realpath(clust_dir), os.path.join(target_dir, clust))
            clust_out = os.path.join(target_dir, clust)
        else:
            clust_out = None
        shutil.copy(os.path.join(charpath,cfg),os.path.join(target_dir, cfg))
        shutil.move(os.path.realpath(gen_dir), os.path.join(target_dir, gen_pt))
        return {"config_path": os.path.join(target_dir,cfg),
            "generator_path": os.path.join(target_dir,gen_pt),
            "cluster_path": clust_out}
# Example usage
# vul_models = HFModels()
# print(vul_models.list_models())
# print("Applejack (singing)" in vul_models.list_models())
# vul_models.download_model("Applejack (singing)","models/Applejack (singing)")
print("Finished!")
These functions help us download, extract, and load models.
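One detail of the download() helper is easy to trip over: it only accepts Google Drive links in the uc?id= form and fails its assertion on an ordinary share link. A possible conversion is sketched below. The function to_uc_url is hypothetical and assumes share links of the form .../file/d/<ID>/view or ...?id=<ID>; the sample ID is the one from the assertion message.

```python
import re

def to_uc_url(share_url):
    """Convert a Drive share link to the uc?id= form; return None if no ID is found."""
    match = (re.search(r"/file/d/([\w-]+)", share_url)
             or re.search(r"[?&]id=([\w-]+)", share_url))
    if match is None:
        return None
    return "https://drive.google.com/uc?id=" + match.group(1)

share = "https://drive.google.com/file/d/1eQAnaoDBGQZldPVk-nzgYzRbcPSmnpv6/view?usp=sharing"
print(to_uc_url(share))
```

The converted link can then be passed to download() as one entry of the urls list.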
Voice model download and online inference
Next, download the Trump voice model and its configuration file from:
https://huggingface.co/Nardicality/so-vits-svc-4.0-models/tree/main/Trump18.5k
Then place the model file (G_*.pth) and the configuration file (*.json) together in a subfolder of the project's models folder, because the inference code below looks for both files inside each subfolder of models.
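The inference code further down scans every subfolder of models for a G_*.pth generator, a *.json config, and optionally a *.pt clustering model, and it silently skips folders where a required file is missing. A quick pre-check can save a confusing empty speaker list. This is only a sketch: check_model_folder is a made-up helper and models/trump is a hypothetical folder name.

```python
import glob
import os

def check_model_folder(folder):
    """Report which of the files expected by get_speakers() exist in folder."""
    def first(pattern):
        hits = sorted(glob.glob(os.path.join(glob.escape(folder), pattern)))
        return hits[0] if hits else None
    return {
        "generator": first("G_*.pth"),  # required
        "config": first("*.json"),      # required
        "cluster": first("*.pt"),       # optional clustering model
    }

# 'models/trump' is a hypothetical folder name; use your own.
print(check_model_folder("models/trump"))
```

A value of None for generator or config means the folder will be skipped when the speaker list is built.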
Then upload the songs to be converted to the directory that contains the project, that is, /content.
Run the code:
import os
import glob
import json
import copy
import logging
import io
from ipywidgets import widgets
from pathlib import Path
from IPython.display import Audio, display
os.chdir('/content/so-vits-svc')
import torch
from inference import infer_tool
from inference import slicer
from inference.infer_tool import Svc
import soundfile
import numpy as np
MODELS_DIR = "models"
def get_speakers():
    speakers = []
    for _,dirs,_ in os.walk(MODELS_DIR):
        for folder in dirs:
            cur_speaker = {}
            # Look for G_****.pth
            g = glob.glob(os.path.join(MODELS_DIR,folder,'G_*.pth'))
            if not len(g):
                print("Skipping "+folder+", no G_*.pth")
                continue
            cur_speaker["model_path"] = g[0]
            cur_speaker["model_folder"] = folder
            # Look for *.pt (clustering model)
            clst = glob.glob(os.path.join(MODELS_DIR,folder,'*.pt'))
            if not len(clst):
                print("Note: No clustering model found for "+folder)
                cur_speaker["cluster_path"] = ""
            else:
                cur_speaker["cluster_path"] = clst[0]
            # Look for config.json
            cfg = glob.glob(os.path.join(MODELS_DIR,folder,'*.json'))
            if not len(cfg):
                print("Skipping "+folder+", no config json")
                continue
            cur_speaker["cfg_path"] = cfg[0]
            with open(cur_speaker["cfg_path"]) as f:
                try:
                    cfg_json = json.loads(f.read())
                except Exception as e:
                    print("Malformed config json in "+folder)
                for name, i in cfg_json["spk"].items():
                    cur_speaker["name"] = name
                    cur_speaker["id"] = i
                    if not name.startswith('.'):
                        speakers.append(copy.copy(cur_speaker))
    return sorted(speakers, key=lambda x:x["name"].lower())
logging.getLogger('numba').setLevel(logging.WARNING)
chunks_dict = infer_tool.read_temp("inference/chunks_temp.json")
existing_files = []
slice_db = -40
wav_format = 'wav'
class InferenceGui():
    def __init__(self):
        self.speakers = get_speakers()
        self.speaker_list = [x["name"] for x in self.speakers]
        self.speaker_box = widgets.Dropdown(
            options = self.speaker_list
        )
        display(self.speaker_box)
        def convert_cb(btn):
            self.convert()
        def clean_cb(btn):
            self.clean()
        self.convert_btn = widgets.Button(description="Convert")
        self.convert_btn.on_click(convert_cb)
        self.clean_btn = widgets.Button(description="Delete all audio files")
        self.clean_btn.on_click(clean_cb)
        self.trans_tx = widgets.IntText(value=0, description='Transpose')
        self.cluster_ratio_tx = widgets.FloatText(value=0.0,
            description='Clustering Ratio')
        self.noise_scale_tx = widgets.FloatText(value=0.4,
            description='Noise Scale')
        self.auto_pitch_ck = widgets.Checkbox(value=False, description=
            'Auto pitch f0 (do not use for singing)')
        display(self.trans_tx)
        display(self.cluster_ratio_tx)
        display(self.noise_scale_tx)
        display(self.auto_pitch_ck)
        display(self.convert_btn)
        display(self.clean_btn)
    def convert(self):
        trans = int(self.trans_tx.value)
        speaker = next(x for x in self.speakers if x["name"] ==
            self.speaker_box.value)
        spkpth2 = os.path.join(os.getcwd(),speaker["model_path"])
        print(spkpth2)
        print(os.path.exists(spkpth2))
        svc_model = Svc(speaker["model_path"], speaker["cfg_path"],
            cluster_model_path=speaker["cluster_path"])
        input_filepaths = [f for f in glob.glob('/content/**/*.*', recursive=True)
            if f not in existing_files and
            any(f.endswith(ex) for ex in ['.wav','.flac','.mp3','.ogg','.opus'])]
        for name in input_filepaths:
            print("Converting "+os.path.split(name)[-1])
            infer_tool.format_wav(name)
            wav_path = str(Path(name).with_suffix('.wav'))
            wav_name = Path(name).stem
            chunks = slicer.cut(wav_path, db_thresh=slice_db)
            audio_data, audio_sr = slicer.chunks2audio(wav_path, chunks)
            audio = []
            for (slice_tag, data) in audio_data:
                print(f'#=====segment start, '
                    f'{round(len(data)/audio_sr, 3)}s======')
                length = int(np.ceil(len(data) / audio_sr *
                    svc_model.target_sample))
                if slice_tag:
                    print('jump empty segment')
                    _audio = np.zeros(length)
                else:
                    # Padding "fix" for noise
                    pad_len = int(audio_sr * 0.5)
                    data = np.concatenate([np.zeros([pad_len]),
                        data, np.zeros([pad_len])])
                    raw_path = io.BytesIO()
                    soundfile.write(raw_path, data, audio_sr, format="wav")
                    raw_path.seek(0)
                    _cluster_ratio = 0.0
                    if speaker["cluster_path"] != "":
                        _cluster_ratio = float(self.cluster_ratio_tx.value)
                    out_audio, out_sr = svc_model.infer(
                        speaker["name"], trans, raw_path,
                        cluster_infer_ratio = _cluster_ratio,
                        auto_predict_f0 = bool(self.auto_pitch_ck.value),
                        noice_scale = float(self.noise_scale_tx.value))
                    _audio = out_audio.cpu().numpy()
                    pad_len = int(svc_model.target_sample * 0.5)
                    _audio = _audio[pad_len:-pad_len]
                audio.extend(list(infer_tool.pad_array(_audio, length)))
            res_path = os.path.join('/content/',
                f'{wav_name}_{trans}_key_'
                f'{speaker["name"]}.{wav_format}')
            soundfile.write(res_path, audio, svc_model.target_sample,
                format=wav_format)
            display(Audio(res_path, autoplay=True)) # display audio file
        pass
    def clean(self):
        input_filepaths = [f for f in glob.glob('/content/**/*.*', recursive=True)
            if f not in existing_files and
            any(f.endswith(ex) for ex in ['.wav','.flac','.mp3','.ogg','.opus'])]
        for f in input_filepaths:
            os.remove(f)
inference_gui = InferenceGui()
When you click Convert, the code searches the root directory, /content, recursively for audio files (wav, flac, mp3, ogg, and opus) and runs inference on them with the downloaded model. Before inference, each file is converted to WAV and sliced into segments at silent passages, using the slice_db threshold of -40.
When inference finishes, the converted song is played automatically.
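To see in advance which files a conversion run would pick up, the file search from the convert() method can be reproduced on its own. The sketch below mirrors that search; the function name find_audio_files and its root parameter are additions for illustration.

```python
import glob
import os

AUDIO_EXTENSIONS = ('.wav', '.flac', '.mp3', '.ogg', '.opus')

def find_audio_files(root, ignore=()):
    """Recursively list audio files under root, skipping paths in ignore."""
    pattern = os.path.join(glob.escape(root), '**', '*.*')
    return sorted(f for f in glob.glob(pattern, recursive=True)
                  if f not in ignore and f.endswith(AUDIO_EXTENSIONS))

# In Colab the search root used by the notebook is /content.
print(find_audio_files('/content'))
```

Because the search is recursive and the converted songs are also written to /content as wav files, a second run would pick up earlier results as input as well, which is presumably why the interface offers the "Delete all audio files" button.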
Epilogue
When you first start using Colab, the GPU you are allocated has about 15 GB of video memory, which is enough for most training and inference tasks. However, if you frequently leave long unattended jobs running, the GPU resources allocated to you will gradually be reduced. Long-running, relatively stable GPU access still requires a paid Colab Pro subscription. In addition, the free Google Drive quota is also 15 GB; if you download too many models, Drive will run out of space and the code will raise errors. It is therefore best to clean up Google Drive regularly so that deep learning tasks keep running normally.
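Running out of space in the middle of a model download is easier to avoid if the free space is checked first. A minimal sketch using only the standard library follows; the helper names free_gib and has_room_for are invented here, and whether the figure reported for a mounted Drive folder matches the account quota is an assumption worth verifying.

```python
import shutil

def free_gib(path="."):
    """Return the free space, in GiB, of the filesystem that contains path."""
    return shutil.disk_usage(path).free / 2**30

def has_room_for(size_bytes, path=".", margin=0.1):
    """True if size_bytes plus a safety margin fits into the free space at path."""
    return shutil.disk_usage(path).free >= size_bytes * (1 + margin)

print(f"{free_gib():.1f} GiB free in the current directory")
```

Calling has_room_for() with the size of a model before downloading it gives an early warning instead of a failed write.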