playvoice / so-vits-svc-5.0
Core Engine of Singing Voice Conversion & Singing Voice Clone
Home Page: https://huggingface.co/spaces/maxmax20160403/sovits5.0
License: MIT License
from whisper.model import Whisper, ModelDimensions
ModuleNotFoundError: No module named 'whisper'
How can training checkpoints be cleaned up automatically in bigvgan?
I know how to change this in main.
preprocess_zzz.py imports vits, but preprocess_zzz.py itself sits in a subfolder, so it can't locate the module.
If I copy the vits folder into prepare, it still throws an error at step 7:
line 252, in iter
ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
ZeroDivisionError: integer division or modulo by zero in \vits\data_utils.py
Is it because vits_pretrain.pt is missing? It does not seem to be available anywhere online.
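The failing expression divides by len_bucket, so it raises ZeroDivisionError whenever a length bucket ends up with no samples, for example if the training file list is empty or every item was filtered out. Whether a missing vits_pretrain.pt is involved cannot be told from this traceback. A minimal sketch of the same padding step with an explicit check follows; pad_bucket and its argument names are hypothetical, and rem is assumed to be the number of entries still missing from the bucket:

```python
def pad_bucket(ids_bucket, num_samples_bucket):
    """Repeat the ids in a bucket until it holds num_samples_bucket entries.

    Uses the same expression as the traceback, but fails with a readable
    message instead of ZeroDivisionError when the bucket is empty.
    """
    len_bucket = len(ids_bucket)
    if len_bucket == 0:
        raise ValueError(
            "empty bucket: check that the training file list is not empty"
        )
    # Assumption: rem is how many entries the bucket is short of its target.
    rem = num_samples_bucket - len_bucket
    return (
        ids_bucket
        + ids_bucket * (rem // len_bucket)
        + ids_bucket[: rem % len_bucket]
    )
```

If the readable error fires, the first thing to inspect is whether the earlier preprocessing steps actually wrote any training entries.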
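Running the resampling code
Cut the audio into segments shorter than 30 seconds, as whisper requires
Generate audio at a 16000 Hz sample rate, stored at: ./data_svc/waves-16k
python prepare/preprocess_a.py -w ./data_raw -o ./data_svc/waves-16k -s 16000
After this, I get [WinError 3] The system cannot find the path specified.
But when I look in the folder, the corresponding file already exists.
Running the code again gives [WinError 183] Cannot create a file when that file already exists.
However, the folder contains no 16000 Hz audio segments at all; it is completely empty.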
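WinError 3 and WinError 183 are both raised by directory or file creation calls. One possible cause, offered as an assumption since the script's source is not shown here, is that the output folder is created without its parents or without tolerating a folder that already exists. A sketch of creating the output tree defensively; ensure_dir is a hypothetical helper:

```python
import os

def ensure_dir(path):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)
```

It is also worth checking that the folder passed to -w exists; the command above uses ./data_raw while other commands on this page use ./dataset_raw.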
https://colab.research.google.com/drive/1PY1E4bDAeHbAD4r99D_oYXB46fG8nIA5?usp=sharing
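As the title says, inferring a 5-minute song in Colab also runs out of GPU memory (15 GB).
I looked through the config and didn't see any setting that controls this. Maybe I missed it, qwq.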
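Memory use during inference generally grows with input length, so one common workaround is to convert a long song in shorter pieces and join the results. This is a generic sketch, not an option the repository is known to expose; split_ranges and the 30-second default are assumptions for illustration:

```python
def split_ranges(num_samples, sample_rate, chunk_seconds=30.0):
    """Return (start, end) sample index pairs that cover the whole signal."""
    chunk = int(sample_rate * chunk_seconds)
    if chunk <= 0:
        raise ValueError("chunk_seconds must be positive")
    return [
        (start, min(start + chunk, num_samples))
        for start in range(0, num_samples, chunk)
    ]
```

Each range can be sliced from the waveform, converted separately, and concatenated. Hard cuts may be audible at the boundaries, so cutting at silences is preferable.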
Add Chinese documentation
Is there a QQ group?
./raw/test.wav
test.ppg.npy
./raw/test.wav
test.csv
don't use pitch shift
/root/autodl-tmp/so-vits-svc5/diff_tool
No diffusion model or config found. Shallow diffusion mode will False
Traceback (most recent call last):
File "inference_main.py", line 121, in
main()
File "inference_main.py", line 83, in main
svc_model = Svc(args.model_path, args.config_path, args.device, args.cluster_model_path,enhance,diffusion_model_path,diffusion_config_path,shallow_diffusion,only_diffusion)
File "/root/autodl-tmp/so-vits-svc5/diff_tool/inference/infer_tool.py", line 159, in init
self.load_model()
File "/root/autodl-tmp/so-vits-svc5/diff_tool/inference/infer_tool.py", line 176, in load_model
self.hps_ms.data.filter_length // 2 + 1,
AttributeError: 'Svc' object has no attribute 'hps_ms'
/root/autodl-tmp/so-vits-svc5
mv: cannot stat './diff_tool/results/*': No such file or directory
Inference finished
Hello! Does this repository support training at another sample rate, such as 44100 Hz?
How can I create the checkpoints for each dataset of speakers separately in different folders?
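Hello. When I tested the preview model locally, there was no voice conversion effect, but testing the same audio on Hugging Face gives a different result. Which step did I get wrong?
I followed the steps below: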
python svc_inference.py --config configs/base.yaml --model sovits5.0.pth --spk ./configs/singers/singer0051.npy --wave test.wav --ppg test.ppg.npy --pit test.csv
(venv) PS D:\so-vits-svc-5.0> python prepare/preprocess_ppg.py -w data_svc/waves-16k/ -p data_svc/whisper
Traceback (most recent call last):
File "D:\so-vits-svc-5.0\prepare\preprocess_ppg.py", line 6, in
from whisper.model import Whisper, ModelDimensions
ModuleNotFoundError: No module named 'whisper'
I just cloned this project and installed the dependencies, then ran the commands below:
python prepare/preprocess_a.py -w ./dataset_raw -o ./data_svc/waves-16k -s 16000
python prepare/preprocess_f0.py -w data_svc/waves-16k/ -p data_svc/pitch
When I run "python prepare/preprocess_ppg.py -w data_svc/waves-16k/ -p", this error occurs.
Windows 10
Python 3.10.7
Help :(
Just curious
How similar is the cloned voice for speakers who never appeared in the training data?
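The README says:
Then place the dataset in the dataset_raw directory with the following file structure
dataset_raw
├───speaker0
│   ├───xxx1-xxx1.wav
│   ├───...
│   └───Lxx-0xx8.wav
└───speaker1
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
Do the folders have to be named speaker0 and speaker1? And do file names such as Lxx-0xx8 have to contain exactly these digits (and the capital L)?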
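A traceback elsewhere on this page builds paths as f"{wavPath}/{spks}/{file}.wav", which suggests the preprocessing scripts simply walk whatever subfolders and files they find. Under that assumption, which is not confirmed by the README, folder and file names would be free-form. A sketch for listing what such a walk would see; list_dataset is a hypothetical helper:

```python
from pathlib import Path

def list_dataset(root):
    """Map each speaker folder under root to its sorted .wav file names."""
    dataset = {}
    for speaker_dir in sorted(Path(root).iterdir()):
        if speaker_dir.is_dir():
            dataset[speaker_dir.name] = sorted(
                p.name for p in speaker_dir.glob("*.wav")
            )
    return dataset
```

Running it on dataset_raw before preprocessing gives a quick view of which speakers and files would be picked up.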
I found that there are 52 singers.
Are all of them singing in Chinese?
Or do they cover different languages?
How do I get results on my own dataset?
The section on setting the working directory could add a method for Windows; the one in the README seems to work only on Linux.
set PYTHONPATH=%cd%
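In the Windows cmd shell, %cd% expands to the current directory, so this command only helps when it is run from the repository root; it does not work as written in PowerShell. A platform-independent alternative, sketched here as an assumption rather than something the README documents, is to add the repository root to sys.path at the top of a script; add_repo_root is a hypothetical helper:

```python
import os
import sys

def add_repo_root(repo_root="."):
    """Put repo_root at the front of sys.path so local packages import first."""
    root = os.path.abspath(repo_root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

Calling add_repo_root() before any import of the repository's own packages has the same effect as setting PYTHONPATH for that one process.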
How to fix this?
Setting up Audio Processor...
| > sample_rate:16000
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.98
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000.0
| > spec_gain:20.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
49%|████████████████████████████████████████████████▌ | 200/408 [00:11<00:12, 17.05it/s]/home/parisa/so-vits-svc-5.0__/speaker/utils/audio.py:732: RuntimeWarning: invalid value encountered in true_divide
return x / abs(x).max() * 0.95
49%|████████████████████████████████████████████████▌ | 200/408 [00:11<00:11, 17.49it/s]
Traceback (most recent call last):
File "prepare/preprocess_speaker.py", line 79, in
spec = speaker_encoder_ap.melspectrogram(waveform)
File "/home/parisa/so-vits-svc-5.0__/speaker/utils/audio.py", line 564, in melspectrogram
D = self.stft(self.apply_preemphasis(y))
File "/home/parisa/so-vits-svc-5.0_/speaker/utils/audio.py", line 624, in _stft
center=True,
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/librosa/util/decorators.py", line 88, in inner_f
return f(*args, **kwargs)
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/librosa/core/spectrum.py", line 202, in stft
util.valid_audio(y, mono=False)
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/librosa/util/decorators.py", line 88, in inner_f
return f(*args, **kwargs)
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/librosa/util/utils.py", line 294, in valid_audio
raise ParameterError("Audio buffer is not finite everywhere")
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere
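The RuntimeWarning just before the traceback comes from x / abs(x).max() * 0.95: for an all-zero (silent) clip the peak is 0, the division yields NaN, and librosa then rejects the buffer as not finite. A sketch of a guarded normalization, assuming NumPy 1.17 or newer; safe_peak_normalize is a hypothetical replacement, not the repository's function:

```python
import numpy as np

def safe_peak_normalize(wav, target_peak=0.95, eps=1e-8):
    """Scale wav so its peak is target_peak; leave silent input unchanged."""
    wav = np.asarray(wav, dtype=np.float32)
    # Replace NaN and infinities so one bad sample cannot poison the clip.
    wav = np.nan_to_num(wav, nan=0.0, posinf=0.0, neginf=0.0)
    peak = float(np.abs(wav).max()) if wav.size else 0.0
    if peak < eps:
        return wav  # silent or empty: nothing to scale
    return wav / peak * target_peak
```

The same guard would apply to the wav / np.abs(wav).max() * 0.6 line in prepare/preprocess_a.py reported further down this page. Removing silent or corrupt files from the dataset is the other obvious fix.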
Very impressive, this is the first I've heard of it. Why can it ignore interference from the BGM and harmonies?
Why does this happen when I run this command: python svc_trainer.py -c configs/base.yaml -n sovits5.0
File "svc_trainer.py", line 11, in
from vits_extend.train import train
File "/home/parisa/so-vits-svc-5.0__/vits_extend/train.py", line 16, in
from vits_extend.writer import MyWriter
File "/home/parisa/so-vits-svc-5.0__/vits_extend/writer.py", line 1, in
from torch.utils.tensorboard import SummaryWriter
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/torch/utils/tensorboard/init.py", line 12, in
from .writer import FileWriter, SummaryWriter # noqa: F401
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 9, in
from tensorboard.compat.proto.event_pb2 import SessionLog
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/tensorboard/compat/proto/event_pb2.py", line 17, in
from tensorboard.compat.proto import summary_pb2 as tensorboard_dot_compat_dot_proto_dot_summary__pb2
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/tensorboard/compat/proto/summary_pb2.py", line 17, in
from tensorboard.compat.proto import histogram_pb2 as tensorboard_dot_compat_dot_proto_dot_histogram__pb2
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/tensorboard/compat/proto/histogram_pb2.py", line 42, in
serialized_options=None, file=DESCRIPTOR),
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 561, in new
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
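The full protobuf message names two workarounds: downgrading the protobuf package to 3.20.x or lower, or setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python, which falls back to the slower pure-Python parser. A sketch of the second option, to be run before anything imports tensorboard; use_python_protobuf is a hypothetical helper:

```python
import os

def use_python_protobuf():
    """Select protobuf's pure-Python backend unless one is already chosen."""
    os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")
    return os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"]
```

Downgrading protobuf is usually the cleaner fix for training, since the pure-Python backend is noticeably slower.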
It seems you are using whisper's encoder output directly as content information vectors. How is it better than the contentvec used in previous so-vits-svc versions?
Calling python whisper/inference.py -w test.wav -p test.ppg.npy
fails with an error saying there is no whisper module
After pip install whisper, it fails with:
Traceback (most recent call last):
File "H:\svc\sovits\so-vits-svc-5.0\whisper\inference.py", line 6, in
from whisper.model import Whisper, ModelDimensions
File "H:\svc\sovits\so-vits-svc-5.0\venv\lib\site-packages\whisper.py", line 65, in
libc = ctypes.CDLL(libc_name)
File "C:\Python310\lib\ctypes_init_.py", line 364, in init
if '/' in name or '\' in name:
TypeError: argument of type 'NoneType' is not iterable
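In this traceback, whisper resolves to a single file, site-packages\whisper.py, rather than to the repository's own whisper folder, so whisper.model cannot be found. That points to pip install whisper having installed an unrelated project that shadows the local package; uninstalling it and running from the repository root with PYTHONPATH set is the likely fix. A sketch for checking where a module name resolves from; module_origin is a hypothetical helper:

```python
import importlib.util

def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

If module_origin("whisper") points at whisper.py inside site-packages, the wrong package is being picked up.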
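The files under configs/singers are presumably presets for the debug model. When using a model I trained myself, which npy should I pick for this parameter?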
What is this?
How do I solve it?
prepare/preprocess_a.py:11: RuntimeWarning: invalid value encountered in true_divide
wav = wav / np.abs(wav).max() * 0.6
Hi,
How can I use the SpeakerClassifier in vits/modules_grl.py?
It was added with this commit but I cannot see this to be used anywhere during training.
Thanks!
vits_pretrain.pt
Does the pretrained model contain only one speaker, or multiple speakers? Will our dataset be trained on top of the pretrained model's first speaker?
Why is the pretrained model smaller than our trained model? Is there a way to convert our model to the pretrained format?
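An earlier version ran inference successfully, but with the current version the output sys_out.wav is only 1 kB
sys_out_pit.wav has sound, but it is all beeping tones
I used configs/singers_sample/47-wave-girl/025.wav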
!python whisper/inference.py -w /content/so-vits-svc-5.0/configs/singers_sample/47-wave-girl/025.wav -p test.ppg.npy
!python svc_inference.py --config configs/base.yaml --model sovits5.0-48k-debug.pth --spk ./configs/singers/singer0023.npy --wave /content/so-vits-svc-5.0/configs/singers_sample/47-wave-girl/025.wav --ppg test.ppg.npy
The vits_pretrained folder requires a pt file. Can I just put a model trained with vits in there?
Also, where should the vits model be used?
As the title says
LargeV2 branch
How should I solve this error?
ModuleNotFoundError: No module named 'whisper.model'; 'whisper' is not a package
As the title says: how do I export the timbre file from a trained model?
I get this error when running: python prepare/preprocess_ppg.py -w data_svc/waves-16k/ -p data_svc/whisper
Traceback (most recent call last):
File "prepare/preprocess_ppg.py", line 56, in
whisper = load_model(os.path.join("whisper_pretrain", "medium.pt"))
File "prepare/preprocess_ppg.py", line 25, in load_model
dims = ModelDimensions(**checkpoint["dims"])
TypeError: 'ModuleSpec' object is not callable
On an M1 MacBook I get the error below. Does this mean it only works with NVIDIA GPUs?
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
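The assertion is raised when code requests a CUDA device on a PyTorch build without CUDA support, which is the case for macOS builds. Whether this repository's scripts can run on another device is not stated here; scripts that hard-code CUDA would need a change along these lines. A sketch, with pick_device as a hypothetical helper:

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: nothing to select
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend (Apple Silicon) only exists in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The returned string can be passed to torch.device(...) wherever a script currently assumes CUDA; some operators may still be unsupported on MPS, in which case CPU is the fallback.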
Thanks for the awesome work! Since I can't read Chinese, I translated the README into English, and I understand the training process as follows.
It seems there is a two-stage training process. Training is quite complicated, especially stage 2.
In the first stage, train VITS (SynthesizerTrn) with whisper PPG, NSF-HiFiGAN, and an external speaker encoder (d-vector).
In the second stage (SynthesizerTrnEx), apply GRL and SNAC to prevent speaker information leaking into the text encoder, and also apply the natural speech loss (a bidirectional loss between prior and posterior).
Is that right? Also, I can't find where SynthesizerTrnEx is used in this code base (at least currently). Could you explain the training process a bit more?
The VITS module cannot be found, and I can't find a download for it either.
I installed everything in requirements.txt, but I get this error at stage 4.
I also exported PYTHONPATH (I got through to stage 7 with no errors, except for stage 4).
Traceback (most recent call last):
File "prepare/preprocess_ppg.py", line 54, in
pred_ppg(whisper, f"{wavPath}/{spks}/{file}.wav", f"{ppgPath}/{spks}/{file}.ppg")
File "prepare/preprocess_ppg.py", line 20, in pred_ppg
audio = load_audio(wavPath)
File "so-vits-svc-5.0-main\whisper\audio.py", line 42, in load_audio
ffmpeg.input(file, threads=0)
File "Python\Python38\lib\site-packages\ffmpeg_run.py", line 313, in run
process = run_async(
File "Python\Python38\lib\site-packages\ffmpeg_run.py", line 284, in run_async
return subprocess.Popen(
File "Python\Python38\lib\subprocess.py", line 854, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "Python\Python38\lib\subprocess.py", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The specified file cannot be found
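The traceback ends in subprocess.Popen, launched from whisper\audio.py via ffmpeg.input(...), so WinError 2 here most likely means the ffmpeg executable itself is not on PATH; the Python package only wraps the command-line tool. A sketch of a pre-flight check; find_executable is a hypothetical helper:

```python
import shutil

def find_executable(name="ffmpeg"):
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)
```

If find_executable() returns None, installing ffmpeg and adding its bin folder to PATH, then reopening the terminal, should resolve the error.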
What are the best values for learning rate, epochs, and batch size?
How can I solve this at step 4 of data preprocessing?
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
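PyTorch checkpoints saved in the default format since version 1.6 are zip archives, and this error usually means the file is truncated or is not a checkpoint at all, for example after an interrupted download. A sketch of a quick integrity check using only the standard library; looks_like_checkpoint is a hypothetical helper, and a True result does not prove the file will load:

```python
import zipfile

def looks_like_checkpoint(path):
    """Return True if path is a readable zip archive with intact entries."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the first corrupt member name, or None if all pass.
        return archive.testzip() is None
```

Checkpoints in the older non-zip format would be reported as False, so a False result is a hint to re-download the file rather than proof of corruption.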
How long does training take?
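Whether on the Hugging Face demo or with results from my own local fine-tuning, conversion of speech other than singing is unsatisfactory: more than half of the output is hoarse fragments. I suspect the f0 extraction, because the parts of svc_out_pit.wav where no pitch was extracted turn into white noise.
My own goal is to convert game voice lines rather than singing, but for people who do want to convert singing, it would still be more convenient to handle unpitched speech automatically, such as rap, chanting, the spoken monologues common in intros, and live audience interaction. The neighboring so-vits-svc-fork can do this, and I hope this repo can too.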
I am a music instructor and I would love to introduce this lovely AI software to our students to try out.
Here in my school we have several Windows 7 Pro 64-bit computers in our classrooms, running on Nvidia GeForce GTX 660M GPU. According to Nvidia, the highest version of graphic driver we can install is 425.31, and the highest CUDA Toolkit we can install would be 10.1.
According to pytorch dot org, with CUDA version 10.1, the highest torch we can install would be:
“torch-1.8.1+cu101-cp39-cp39-win_amd64.whl”.
Here, “cu101” in the file name, is referring to CUDA 10.1.
Any torch version higher than 1.8.1, will have a higher “cu” number attached in the whl file name, such as:
“torch-1.10.0+cu102-cp36-cp36m-win_amd64.whl”, or
“torch-1.13.0+cu116-cp310-cp310-win_amd64.whl”, etc.
In the non-fork so-vits-svc-4.0 program folder, there is a file called “requirements.txt”. We opened that file, and can see it says “torch==1.13.1”. Can we assume torch version 1.13.1 is the lowest minimum requirement for so-vits-svc program to run?
Too bad! My colleagues have already trained several G_43200.pth models on their home computers, and they can simply copy these models to our school’s computers and start voice inference right away. We don’t need to train on the classroom computers; we only need to run inference with existing models to demonstrate to our students. Inference takes far less GPU power than training.
Has anyone tested this program on CUDA 10.1?
Please let me know. So, should I give up? Is it a death penalty for our students to see this?
What's the f0 parameter?
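whisper requires segments shorter than 30 seconds
Set the working directory
set PYTHONPATH=%cd%
What does %cd% refer to here? If the virtual environment was created with conda, do I still need to set PYTHONPATH?
"speaker" should be renamed to "timbre" to be accurate
Does this mean changing python prepare/preprocess_speaker.py data_svc/waves-16k/ data_svc/speaker
to python prepare/preprocess_speaker.py data_svc/waves-16k/ data_svc/timbre?
Set the configs/base.yaml parameter pretrain: "./5.0.epoch1200.full.pth", and lower the learning rate appropriately
Roughly how much is "appropriately"?
Check the logs; the release page has complete training logs
Which factors in the visualized charts should I look at to judge whether the model has finished training or is overfitting?
Extract the F0 parameters in csv text format, open the csv file in Excel, and manually correct wrong F0 values by comparing against Audition or SonicVisualiser
Could you give a somewhat more detailed explanation? I didn't understand the image in the README.
I don't know which file to use for the --spk parameter when running inference on my own wav file.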
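I have recently been trying to port this project to Google TPUs, but this code runs extremely slowly there, taking ten minutes to finish. My guess is that the custom loss function causes it to run on the CPU. Is there an alternative that solves this?
The code in question is in train.py under vits_extends: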
loss_kl_f = kl_loss(z_f, logs_q, m_p, logs_p, logdet_f, z_mask) * hp.train.c_kl
loss_kl_r = kl_loss(z_r, logs_p, m_q, logs_q, logdet_r, z_mask) * hp.train.c_kl
loss_g = score_loss + mel_loss + stft_loss + loss_kl_f
loss_g.backward()