Easily train a good VC model with voice data <= 10 mins!

License: MIT License

Python 93.07% Jupyter Notebook 3.32% Batchfile 2.25% Dockerfile 0.35% Shell 0.61% Go 0.39%

change sovits vits voice voice-conversion rvc audio-analysis conversational-ai conversion converter

retrieval-based-voice-conversion-webui's Introduction

Retrieval-based-Voice-Conversion-WebUI

A simple, easy-to-use voice conversion framework based on VITS

Changelog | FAQ | AutoDL: train an AI singer for 0.5 CNY | Controlled experiment records | Online demo

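The base model is trained on nearly 50 hours of the open-source, high-quality VCTK dataset, so there are no copyright concerns. Feel free to use it.

Stay tuned for the RVCv3 base model: more parameters, a larger dataset, better results, roughly the same inference speed, and less training data required.

Hugging Face cannot be reached directly from some regions, and it is very slow even when access succeeds. We therefore provide a one-click downloader for models, bundled packages, and tools. Give it a try: RVC-Models-Downloader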

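| Training and inference UI | Real-time voice conversion UI |
| --- | --- |
| go-web.bat | go-realtime-gui.bat |
| Freely choose the operation you want to run. | We have achieved 170 ms end-to-end latency. With ASIO input/output devices, 90 ms end-to-end latency is achievable, but this depends heavily on hardware driver support. |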

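Introduction

This repository has the following features:

- Uses top-1 retrieval to replace input source features with training-set features, which prevents timbre leakage
- Trains quickly even on relatively weak GPUs
- Gives good results even with a small amount of training data (at least 10 minutes of low-noise speech is recommended)
- Can change timbre through model fusion (using ckpt-merge in the ckpt processing tab)
- Simple, easy-to-use web interface
- Can call UVR5 models to quickly separate vocals from accompaniment
- Uses the state-of-the-art vocal pitch extraction algorithm InterSpeech2023-RMVPE to eliminate the muted-sound problem, with better results, faster runs, and lower resource usage
- Acceleration support for AMD and Intel GPUs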
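
The top-1 retrieval idea from the feature list can be sketched as follows. This is a minimal brute-force illustration in plain Python, not the repository's implementation: the function names (`top1_retrieve`, `replace_with_top1`), the squared-L2 distance, and the toy 2-dimensional features are all assumptions made for the example.

```python
from typing import List, Sequence


def top1_retrieve(query: Sequence[float], bank: List[Sequence[float]]) -> Sequence[float]:
    """Return the vector in `bank` closest to `query` (squared L2 distance)."""
    if not bank:
        raise ValueError("feature bank is empty")

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(bank, key=lambda v: sq_dist(query, v))


def replace_with_top1(source_frames, bank):
    """Swap every source frame for its nearest training-set frame."""
    return [list(top1_retrieve(frame, bank)) for frame in source_frames]


# Toy data (hypothetical): three training-set features, two source frames.
training_bank = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
source = [[0.9, 1.2], [4.0, 4.5]]
converted = replace_with_top1(source, training_bank)
print(converted)  # [[1.0, 1.0], [5.0, 5.0]]
```

Because every output frame is taken from the training set, none of the source speaker's own feature vectors are passed on, which is the intuition behind preventing timbre leakage.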

Click here to watch our demo video!

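Environment setup

Python version requirements

We recommend managing the Python environment with conda

For the reason behind the version restriction, see this bug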

python --version # 3.8 <= Python < 3.11
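
The same constraint can be checked from inside Python before installing anything. The helper below is a small sketch; the name `is_supported` and the two constants are made up for this example and are not part of RVC.

```python
import sys

MIN_VERSION = (3, 8)   # inclusive lower bound
MAX_VERSION = (3, 11)  # exclusive upper bound


def is_supported(version_info=sys.version_info) -> bool:
    """True when the (major, minor) pair satisfies 3.8 <= Python < 3.11."""
    return MIN_VERSION <= tuple(version_info[:2]) < MAX_VERSION


print("Python version supported:", is_supported())
```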

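Linux/MacOS one-click dependency installation and launch script

Running run.sh in the project root sets up a venv virtual environment, installs the required dependencies automatically, and starts the main program.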

sh ./run.sh

Manual dependency installation

Install pytorch and its core dependencies; skip this step if they are already installed. Reference: https://pytorch.org/get-started/locally/
```
pip install torch torchvision torchaudio
```
On Windows with an Nvidia Ampere GPU (RTX30xx), experience from #21 shows that you need to specify the CUDA version that matches pytorch
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
```
Install the dependencies that match your GPU

NVIDIA GPUs
```
pip install -r requirements.txt
```
AMD/Intel GPUs
```
pip install -r requirements-dml.txt
```
AMD GPUs with ROCm (Linux)
```
pip install -r requirements-amd.txt
```
Intel GPUs with IPEX (Linux)
```
pip install -r requirements-ipex.txt
```

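Preparing other resources

1. assets

RVC needs several model resources in the assets folder for inference and training.

Automatic resource check/download (default)

By default, RVC checks the integrity of the required resources when the main program starts.

The program continues to start even if the resources are incomplete.

- To download all resources, add the --update flag
- To skip the resource integrity check at startup, add the --nocheck flag

Manual resource download

All resource files are hosted in a Hugging Face space

The scripts for downloading them are in the tools folder

You can also use the one-click downloader for models, bundled packages, and tools: RVC-Models-Downloader

Below is a list of the names of all pre-trained models and other files that RVC requires.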

./assets/hubert/hubert_base.pt

 rvcmd assets/hubert # RVC-Models-Downloader command

./assets/pretrained

 rvcmd assets/v1 # RVC-Models-Downloader command

./assets/uvr5_weights

 rvcmd assets/uvr5 # RVC-Models-Downloader command

To use v2 models, you also need to download

./assets/pretrained_v2

 rvcmd assets/v2 # RVC-Models-Downloader command
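
After a manual download, a quick script can confirm that the listed paths exist. The sketch below is not RVC's built-in integrity check, whose logic is not described here; `missing_assets` and the two list names are made up for this example, and the demo runs against a throwaway directory.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Paths taken from the list above, relative to the project root.
REQUIRED_ASSETS = [
    "assets/hubert/hubert_base.pt",
    "assets/pretrained",
    "assets/uvr5_weights",
]
V2_ASSETS = ["assets/pretrained_v2"]  # only needed for v2 models


def missing_assets(root, paths: Iterable[str]) -> List[str]:
    """Return the entries of `paths` that do not exist under `root`."""
    root = Path(root)
    return [p for p in paths if not (root / p).exists()]


# Demo on a temporary tree where only assets/pretrained exists.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "assets" / "pretrained").mkdir(parents=True)
    demo_missing = missing_assets(tmp, REQUIRED_ASSETS)

print(demo_missing)  # ['assets/hubert/hubert_base.pt', 'assets/uvr5_weights']
```

In a real checkout, calling `missing_assets(".", REQUIRED_ASSETS + V2_ASSETS)` from the project root would list whatever still needs to be downloaded.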

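2. Install ffmpeg

Skip this step if ffmpeg and ffprobe are already installed.

Ubuntu/Debian users

sudo apt install ffmpeg

MacOS users

brew install ffmpeg

Windows users

Download the files and place them in the project root directory.

rvcmd tools/ffmpeg # RVC-Models-Downloader command

Download ffmpeg.exe
Download ffprobe.exe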
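
To confirm that both tools can be found, one option is the standard-library `shutil.which`. This is a minimal sketch, and `find_missing_tools` is a made-up helper name; note that it inspects the executable search path, so copies placed only in the project root may not be detected this way on every platform.

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "ffprobe"), which=shutil.which):
    """Return the tool names that `which` cannot locate."""
    return [tool for tool in tools if which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("ffmpeg and ffprobe are available")
```

The `which` parameter exists only so that the lookup can be replaced in tests.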

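3. Download the files required by the RMVPE vocal pitch extraction algorithm

If you want to use the latest RMVPE vocal pitch extraction algorithm, download the pitch extraction model weights and place them in assets/rmvpe.

Download rmvpe.pt

 rvcmd assets/rmvpe # RVC-Models-Downloader command

Download RMVPE for the DML environment (optional, AMD/Intel GPU users)

Download rmvpe.onnx

 rvcmd assets/rmvpe # RVC-Models-Downloader command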

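4. AMD GPU ROCm (optional, Linux only)

If you want to run RVC on Linux using AMD's ROCm technology, first install the required drivers here.

If you use Arch Linux, you can install the required drivers with pacman:

pacman -S rocm-hip-sdk rocm-opencl-sdk

For some GPU models (e.g. RX6700XT), you may also need to set the following environment variables:

export ROCM_PATH=/opt/rocm
export HSA_OVERRIDE_GFX_VERSION=10.3.0

Also make sure your current user belongs to the render and video groups:

sudo usermod -aG render $USERNAME
sudo usermod -aG video $USERNAME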

Getting started

Direct launch

Start the WebUI with the following command

python infer-web.py

Linux/MacOS users

./run.sh

Intel GPU users who need IPEX (Linux only)

source /opt/intel/oneapi/setvars.sh
./run.sh

Using the bundled package (Windows users)

Download and extract RVC-beta.7z, then double-click go-web.bat to launch.

rvcmd packs/general/latest # RVC-Models-Downloader command

Reference projects

ContentVec
VITS
HIFIGAN
Gradio
FFmpeg
Ultimate Vocal Remover
audio-slicer
Vocal pitch extraction:RMVPE
- The pretrained model is trained and tested by yxlllc and RVC-Boss.

Thanks to all contributors for their efforts

retrieval-based-voice-conversion-webui's People

Contributors

Stargazers

Watchers

Forkers

sisanime misaka-mikoto-tech maxmax2016 deyituo fumiama mortyggt dingguoli oxforevero innnky beyondchenlin yzlltyyh wistone1141 kouyma 123000001212 aki894 kaisekierror coahuilite dreemurrdango cat-stack-boop w-okada great1001 miyuki-starmiya chenxvb dotneet yantaisa11 lovemachinglearning minearchive l4ph ms300 lafi2333 zunan-islands ppcfuns techthiyanes 3kanalpha tarepan toy64bit asmedeus998 mudssky li995vv kirito5201314 nowebyone ddpn08 ponta0 abdm357 kurisu-preston realerikk0 recklessfist fischer-pixel t-sumida ishine ss-nakano isgasho if-ai oreml entropyriser treksis stealthinu leftomelas aiimot arlindacor yunwan1x eltociear hhy5277 kazuya-ito-n wakusei-meron- ntamotsu yamgen michael811125 martjay mitzzzjp kira-pgr zhanghaotian01 tiger14n shadowcz007 tps-f luke-zm tkntkn chickenham100 syaofox qwe987299 afcppe lwd-temp zdj15534309450 sapium59 taigasugishita naturalclar andrey888888 autumnmotor markusbkk katakk crowpeter xcc202 kazuki-151 sophiefy brf0915 akifqc hongwen-sun itsraindi c00renut guochan005

retrieval-based-voice-conversion-webui's Issues

Error when running the one-click package

Expecting value: line 1 column 1 (char 0)
WebUI screenshot below:

The video says the speaker ID does not need to be changed for now:

Artefacting when speech has breath / Quality improvement ?

Hi, great work! I'm so excited about your future updates. I noticed that the outputs usually don't handle breaths well; they produce artifacts most of the time. I was wondering whether breaths should be removed or kept during training to get the best results.
Also, would it be possible to have a high-quality mode, such as training and generating at 48 kHz, 24-bit? Could going beyond 1000 epochs also give better and more natural results?
New generations of GPUs have more and more VRAM, so it would be great to be able to use it at full power (for example, 24 GB of VRAM).
Thanks for the great work!

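Feature retrieval index file not found after training: added_IVF677_Flat_nprobe_7.index

Only total_fea.npy was found; added_IVF677_Flat_nprobe_7.index is missing.

On Linux, with no model loaded, the error FileNotFoundError: [Errno 2] No such file or directory: 'weights/[]' is raised repeatedly

OS: linux

Running the webui with python works normally, but the following error is reported continuously: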

loading weights/[]
Traceback (most recent call last):
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/gradio/routes.py", line 384, in run_predict
output = await app.get_blocks().process_api(
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/gradio/blocks.py", line 1024, in process_api
result = await self.call_function(
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/gradio/blocks.py", line 836, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/mnt/E/AI/Mytts/Retrieval-based-Voice-Conversion-WebUI/infer-web.py", line 167, in get_vc
cpt = torch.load(person, map_location="cpu")
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/torch/serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/torch/serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/vior/miniconda3/envs/RVCtrain/lib/python3.9/site-packages/torch/serialization.py", line 252, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'weights/[]'

The errors stop once a model is loaded. Could this bug be fixed?
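
The path 'weights/[]' in the traceback looks like an empty selection that was turned into a string before being joined to the weights folder. A guard along the following lines would skip loading in that case. This is only a sketch: `resolve_weight_path` is a hypothetical helper, not a function from infer-web.py, and the claim that the UI passes an empty list is an inference from the traceback.

```python
import os
from typing import Optional


def resolve_weight_path(selection, weight_root: str = "weights") -> Optional[str]:
    """Return the weight file path, or None when nothing is selected."""
    # Assumption: the UI may pass an empty list (or its string form "[]").
    if isinstance(selection, (list, tuple)):
        selection = selection[0] if selection else ""
    if not selection or selection == "[]":
        return None
    return os.path.join(weight_root, selection)


print(resolve_weight_path([]))           # None, so the caller can skip loading
print(resolve_weight_path("voice.pth"))  # weights/voice.pth on POSIX systems
```

A caller would check for `None` before handing the path to `torch.load`.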

Audio does not load properly

When I ran step2a with this version, there should have been 1801 files in the folder, but according to the command prompt only 23 files were actually loaded.
It seems to work fine in other speakers' files.
However, only certain speakers do not seem to be working properly.
Is it just taking a long time?

How to continue the training

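One more question: why can't I open the local URL when using Colab? 🤔 Thanks!

Only 27 s of audio left after preprocessing 10 minutes

Before preprocessing:

After preprocessing:

Could someone advise: am I doing something wrong?

Audio file attached
Link: https://pan.baidu.com/s/1LkfqZbHu5FaazXm7jufh7Q
Extraction code: 819v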

Add an option to automatically extract and save weights when saving checkpoints

Just a suggestion that I think would help a lot: add an option that automatically extracts and saves a weight file whenever a checkpoint is saved, e.g. ./weights/$experimentName_$steps.pth, so that progress is easy to follow.

Error when checking for previous checkpoints during feature extraction

File "./Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 148, in run
utils.latest_checkpoint_path(hps.model_dir, "D_*.pth"), net_d, optim_d
File "./Retrieval-based-Voice-Conversion-WebUI/train/utils.py", line 206, in latest_checkpoint_path
x = f_list[-1]
IndexError: list index out of range
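
The IndexError is what `f_list[-1]` raises when no file matches the pattern, for example on a first run with no saved checkpoints. The variant below returns None instead. It is a sketch only: `latest_checkpoint_path_or_none` is a hypothetical name, and sorting by the digits in the file name is an assumption about how checkpoints are ordered.

```python
import glob
import os
import tempfile
from typing import Optional


def latest_checkpoint_path_or_none(dir_path: str, pattern: str = "D_*.pth") -> Optional[str]:
    """Return the highest-numbered checkpoint, or None when there is none."""
    f_list = glob.glob(os.path.join(dir_path, pattern))
    if not f_list:
        return None  # nothing saved yet, so there is nothing to index

    def step_number(path: str) -> int:
        digits = "".join(filter(str.isdigit, os.path.basename(path)))
        return int(digits or 0)

    return max(f_list, key=step_number)


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = latest_checkpoint_path_or_none(tmp)
    for name in ("D_100.pth", "D_2300.pth", "D_900.pth"):
        open(os.path.join(tmp, name), "wb").close()
    latest_name = os.path.basename(latest_checkpoint_path_or_none(tmp))

print(empty_result, latest_name)  # None D_2300.pth
```

A caller could then fall back to the pretrained weights whenever the result is None.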

Robotic / metallic noise on S letters

Hi, this is really a wonderful project, so far it is the one that has the best quality compared to its alternatives, incredible how well it works with small datasets (in my case 9 minutes of clean singing), and how well it recreates the voice timbre.
I only have one question, is it normal that in the S letters or breaths there is a metallic noise? is it produced by the vocoder? I have trained my model for a longer time thinking that the noise would go away, but it sounds exactly the same, either with 20 minutes of training or more than 1 hour. Do I need to do a longer training? How many epochs do you recommend?
Thanks and congratulations for the project!

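Error says the specified path cannot be found, but the directory path contains no spaces or special characters

Error during training
The training material is speech; the goal is to train a speech model
Parameter settings are as follows:

Log is as follows: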

train speaker id info

What role does the speaker ID play in training or model inference? How can I get the speaker info?

RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'

Error after training finishes

Training finished and the model file was generated, but an error was reported
Traceback (most recent call last):
  File "train_nsf_sim_cache_sid_load_pretrain.py", line 684, in <module>
    main()
  File "train_nsf_sim_cache_sid_load_pretrain.py", line 50, in main
    mp.spawn(
  File "E:\User\Voice\VoiceEnv\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "E:\User\Voice\VoiceEnv\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
    while not context.join():
  File "E:\User\Voice\VoiceEnv\lib\site-packages\torch\multiprocessing\spawn.py", line 149, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 2333333

Problem when deploying on Colab

I deployed on Colab using the atri model from the group chat
https://i0.hdslb.com/bfs/note/8c93f53e8b51d8345ca619b474d4037133097c3c.png@690w_!web-note.webp
Why does it report an error?
https://i0.hdslb.com/bfs/note/e2e96ad2beaf9a304c177836aa982317ef6303c5.png@690w_!web-note.webp

Readme had the wrong command for torch installation on windows

The command in the current README is:
pip install torch torchvision torchaudio
This works for the Linux platform only.
The correct command from the PyTorch website for CUDA 11.7 is:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

Doing this fixed my problem with the RVC WebUI not detecting my RTX 3070

Bug: `mel_spectrogram_torch` crash with argument error

Summary

The mel_spectrogram_torch function raises an argument error.
This is caused by the librosa version.
I opened fix PR #133.

Status

Running the mel_processing.mel_spectrogram_torch function raises an error.
The core part of the error is

--> mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
...
TypeError: mel() takes 0 positional arguments but 5 were given

Env

Google Colaboratory @ 2023-04-23

!pip show librosa
# Version: 0.10.0.post2

Cause

It is because of the librosa version.
Previously, librosa.filters.mel accepted positional arguments, but in the latest version they must be passed as named arguments.

How to Fix

Use named arguments instead of positional arguments.
I confirmed that this fix resolves the error with librosa==0.10.0.post2.

Proposal

I made the pull request (#133).
Could you please review it?
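
The failure mode can be reproduced without librosa installed. The sketch below uses a hypothetical stand-in named `mel` whose parameters are keyword-only; its parameter names follow the named arguments discussed above, but it is not librosa's implementation, and the numeric values are purely illustrative.

```python
def mel(*, sr, n_fft, n_mels=128, fmin=0.0, fmax=None):
    """Stand-in with a keyword-only signature (hypothetical, not librosa)."""
    return {"sr": sr, "n_fft": n_fft, "n_mels": n_mels, "fmin": fmin, "fmax": fmax}


def call_positionally():
    """Call `mel` the old way and return the resulting error message."""
    try:
        mel(40000, 2048, 125, 0.0, None)
    except TypeError as exc:
        return str(exc)
    return None


positional_error = call_positionally()
named_result = mel(sr=40000, n_fft=2048, n_mels=125, fmin=0.0, fmax=None)

print(positional_error)  # mel() takes 0 positional arguments but 5 were given
print(named_result["n_mels"])  # 125
```

The bare `*` in the signature is what forces every argument to be passed by name, so switching the call site to named arguments works on versions with either calling convention.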

Pitch wobble effect with speech generator

Hi, your software is pretty amazing; I can get good results in just 1000 epochs. However, I have an issue with some output files. I'm converting a man's voice to a woman's voice (trained model), and the results often show an unsteady pitch, a slight wobble effect. I was wondering if you could add a slider to smooth the output pitch (or the input). When I use Voice Generator GUI (svcg), there is no such problem. Maybe because they use pad and chunk? Or maybe because I use the Crepe prediction method?
However, your fork seems very promising, the sound quality is really nice, I just hope there is a way to make the pitch more natural for spoken voice.

No such file or directory: 'ffmpeg'

Traceback (most recent call last):
File "/home/iot1/zhongzhilai/Retrieval-based-Voice-Conversion-WebUI/trainset_preprocess_pipeline_print.py", line 73, in pipeline
audio = load_audio(path, self.sr)
File "/home/iot1/zhongzhilai/Retrieval-based-Voice-Conversion-WebUI/my_utils.py", line 19, in load_audio
raise RuntimeError(f"Failed to load audio: {e}")
RuntimeError: Failed to load audio: [Errno 2] No such file or directory: 'ffmpeg'

/home/iot1/zhongzhilai/so-vits-svc/dataset_raw/speaker0/8_86.wav->Traceback (most recent call last):
File "/home/iot1/zhongzhilai/Retrieval-based-Voice-Conversion-WebUI/my_utils.py", line 14, in load_audio
ffmpeg.input(file, threads=0)
File "/home/iot1/anaconda3/envs/retrieval/lib/python3.9/site-packages/ffmpeg/_run.py", line 313, in run
process = run_async(
File "/home/iot1/anaconda3/envs/retrieval/lib/python3.9/site-packages/ffmpeg/_run.py", line 284, in run_async
return subprocess.Popen(
File "/home/iot1/anaconda3/envs/retrieval/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/iot1/anaconda3/envs/retrieval/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'

I couldn't follow the ffmpeg installation instructions in the README.

Please add TensorBoard to the Colab notebook

%load_ext tensorboard
%tensorboard --logdir /content/Retrieval-based-Voice-Conversion-WebUI/logs

Put it in the WebUI cell, before the comment

Process 0 terminated with the following error:

I use RTX3060ti.

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "C:\RVC-beta\RVC-beta\runtime\lib\site-packages\torch\serialization.py", line 441, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "C:\RVC-beta\RVC-beta\runtime\lib\site-packages\torch\serialization.py", line 668, in _save
zip_file.write_record(name, storage.data_ptr(), num_bytes)
RuntimeError: [enforce fail at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\caffe2\serialize\inline_container.cc:476] . PytorchStreamWriter failed writing file data/2229: file write failed

RuntimeError: [enforce fail at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\caffe2\serialize\inline_container.cc:337] . unexpected pos 256653056 vs 256652948

I tried several times, and every attempt produced this kind of error (the numbers, e.g. 256653056 vs 256652948, change from case to case).

can't find my 3060

The WebUI can't find my RTX 3060 12G.

I also got this error when training.

If I add the path myself, it still fails with:
"ValueError: need at least one array to concatenate"

I installed the WebUI from the 7z package.

In Colab, the epochs progress strangely fast, and an unfinished voice synthesis is generated.

In Colab, when I run training with the current ipynb notebook, the epochs progress very quickly (about 1 epoch in 6 seconds with Tesla T4 and batch size 14), and even after training for about 300 epochs, a low-quality, grainy voice synthesis weight.pth file is generated. I have over 1000 training files totaling more than 50 minutes. Is everyone else experiencing the same issue?

Question: The details of Pretraining

Hello,

Thank you for providing such great code. I would like to know more about the pretraining process to better utilize this code. Specifically, I would like to know the following information:

・The dataset used for pretraining
・Techniques used during training

For the second point, I would like information such as "gradually increasing the number of speakers" that is necessary for reproducing the training.

Thank you.

pitch editor

Do you think you could add a pitch editor to fix imperfections, or a way to upload your own pitch?

RuntimeError: Given groups=1, weight of size [192, 513, 1], expected input[1, 1025, 368] to have 513 channels, but got 1025 channels instead

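The GPU is a 1050 Ti. I followed the tutorial video once through and uploaded about 12 minutes of audio in total, in WAV format. I tried both 32 kHz and 44100 Hz sample rates and got the error below.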

Traceback (most recent call last):
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
fn(i, *args)
File "E:\vits\RVC-beta\train_nsf_sim_cache_sid_load_pretrain.py", line 188, in run
train_and_evaluate(
File "E:\vits\RVC-beta\train_nsf_sim_cache_sid_load_pretrain.py", line 316, in train_and_evaluate
) = net_g(phone, phone_lengths, spec, spec_lengths, sid)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\parallel\distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\parallel\distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\vits\RVC-beta\runtime\lib\site-packages\infer_pack\models.py", line 644, in forward
z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\vits\RVC-beta\runtime\lib\site-packages\infer_pack\models.py", line 163, in forward
x = self.pre(x) * x_mask
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "E:\vits\RVC-beta\runtime\lib\site-packages\torch\nn\modules\conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [192, 513, 1], expected input[1, 1025, 368] to have 513 channels, but got 1025 channels instead
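
The two channel counts in the error match one-sided spectrogram sizes: an FFT of length n yields n/2 + 1 frequency bins. The sketch below only works through that arithmetic; `stft_bins` is a made-up helper. One possible reading, offered as an assumption rather than a diagnosis, is that the spectrograms were computed with a filter length of 2048 while the loaded model expects 1024.

```python
def stft_bins(n_fft: int) -> int:
    """Number of frequency bins in a one-sided STFT of size n_fft."""
    return n_fft // 2 + 1


print(stft_bins(1024))  # 513, the channel count the weight expects
print(stft_bins(2048))  # 1025, the channel count of the input
```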

MacOS M1

Can training be done on a MacOS M1?

Localization in English

Can anyone translate this into English? Thank you~~

Error during vocal/accompaniment separation: FileNotFoundError: [Errno 2] No such file or directory: './uvr5_pack/data.json'

The error message is as follows:

Traceback (most recent call last):
File "D:\AI\soundtrain\RVC\Retrieval-based-Voice-Conversion-WebUI\infer-web.py", line 224, in uvr
pre_fun = audio_pre(
File "D:\AI\soundtrain\RVC\Retrieval-based-Voice-Conversion-WebUI\infer_uvr5.py", line 47, in __init__
param_name, model_params_d = _get_name_params(model_path, model_hash)
File "D:\AI\soundtrain\RVC\Retrieval-based-Voice-Conversion-WebUI\uvr5_pack\utils.py", line 102, in _get_name_params
data = load_data()
File "D:\AI\soundtrain\RVC\Retrieval-based-Voice-Conversion-WebUI\uvr5_pack\utils.py", line 8, in load_data
with open(file_name, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: './uvr5_pack/data.json'

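Error when running on Colab

Environment: Google Colab
WebUI screenshot:

The dataset consists of audio clips of about 3 seconds each (mp3 format)

I'm not sure whether my dataset is causing the problem. If it is, please advise how I should adjust the dataset qwq

Error output: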

Traceback (most recent call last):
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 514, in <module>
    main()
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 42, in main
    mp.spawn(
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 150, in run
    train_and_evaluate(
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 195, in train_and_evaluate
    for batch_idx, info in enumerate(train_loader):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 306, in __getitem__
    return self.get_audio_text_pair(self.audiopaths_and_text[index])
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 248, in get_audio_text_pair
    phone = self.get_labels(phone)
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 266, in get_labels
    phone = phone[:n_num, :]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

Log:

/content/Retrieval-based-Voice-Conversion-WebUI
The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
Use Languane: en_US
2023-04-13 08:55:48.512021: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-13 08:55:49.817277: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-04-13 08:55:52 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://b2945b7454172d80c1.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
start preprocess
['trainset_preprocess_pipeline_print.py', '/content/drive/MyDrive/audiouploads', '40000', '2', '/content/Retrieval-based-Voice-Conversion-WebUI/logs/test', 'False']
/content/drive/MyDrive/audiouploads/dataset-1.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-11.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-14.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-16.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-18.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-2.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-21.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-23.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-25.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-27.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-29.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-30.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-32.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-35.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-37.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-39.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-40.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-42.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-44.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-46.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-48.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-6.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-8.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-10.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-13.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-15.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-17.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-19.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-20.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-22.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-24.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-26.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-28.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-3.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-31.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-34.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-36.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-38.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-4.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-41.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-43.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-45.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-47.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-5.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-7.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-9.mp3->Suc.
end preprocess

/content/drive/MyDrive/audiouploads/dataset-43.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-45.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-47.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-5.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-7.mp3->Suc.
/content/drive/MyDrive/audiouploads/dataset-9.mp3->Suc.
end preprocess

2023-04-13 08:57:22.578351: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-04-13 08:57:24 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
['extract_feature_print.py', 'cuda:0', '1', '0', '0', '/content/Retrieval-based-Voice-Conversion-WebUI/logs/test']
/content/Retrieval-based-Voice-Conversion-WebUI/logs/test
load model(s) from hubert_base.pt
2023-04-13 08:57:24 | INFO | fairseq.tasks.hubert_pretraining | current directory is /content/Retrieval-based-Voice-Conversion-WebUI
2023-04-13 08:57:24 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-04-13 08:57:24 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
move model to cuda:0
all-feature-47
now-47,all-0,0_0.wav,(136, 256)
now-47,all-4,13_0.wav,(49, 256)
now-47,all-8,16_0.wav,(98, 256)
now-47,all-12,1_0.wav,(66, 256)
now-47,all-16,23_0.wav,(90, 256)
now-47,all-20,27_0.wav,(104, 256)
now-47,all-24,30_0.wav,(154, 256)
now-47,all-28,34_0.wav,(107, 256)
now-47,all-32,38_0.wav,(149, 256)
now-47,all-36,41_0.wav,(89, 256)
now-47,all-40,45_0.wav,(63, 256)
now-47,all-44,7_0.wav,(88, 256)
all-feature-done
['extract_feature_print.py', 'cuda:0', '1', '0', '0', '/content/Retrieval-based-Voice-Conversion-WebUI/logs/test']
/content/Retrieval-based-Voice-Conversion-WebUI/logs/test
load model(s) from hubert_base.pt
move model to cuda:0
all-feature-47
now-47,all-0,0_0.wav,(136, 256)
now-47,all-4,13_0.wav,(49, 256)
now-47,all-8,16_0.wav,(98, 256)
now-47,all-12,1_0.wav,(66, 256)
now-47,all-16,23_0.wav,(90, 256)
now-47,all-20,27_0.wav,(104, 256)
now-47,all-24,30_0.wav,(154, 256)
now-47,all-28,34_0.wav,(107, 256)
now-47,all-32,38_0.wav,(149, 256)
now-47,all-36,41_0.wav,(89, 256)
now-47,all-40,45_0.wav,(63, 256)
now-47,all-44,7_0.wav,(88, 256)
all-feature-done
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-04-13 08:57:34.028941: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-04-13 08:57:38.841118: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:test:{'train': {'log_interval': 200, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 4, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs/test/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs/test', 'experiment_dir': './logs/test', 'save_every_epoch': 5, 'name': 'test', 'total_epoch': 20, 'pretrainG': 'pretrained/G40k.pth', 'pretrainD': 'pretrained/D40k.pth', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 0, 'if_latest': 0, 'if_cache_data_in_gpu': 0}
WARNING:test:/content/Retrieval-based-Voice-Conversion-WebUI/train is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
gin_channels: 256 self.spk_embed_dim: 109
Traceback (most recent call last):
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 121, in run
    _, _, _, epoch_str = utils.load_checkpoint(utils.latest_checkpoint_path(hps.model_dir, "D_*.pth"), net_d, optim_d)  # D多半加载没事
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/utils.py", line 163, in latest_checkpoint_path
    x = f_list[-1]
IndexError: list index out of range
INFO:test:loaded pretrained pretrained/G40k.pth pretrained/D40k.pth
<All keys matched successfully>
<All keys matched successfully>
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-04-13 08:57:54.293283: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-04-13 08:57:54.299536: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-04-13 08:57:54.309187: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
2023-04-13 08:57:54.576913: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.9/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
/usr/local/lib/python3.9/dist-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.9/dist-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.9/dist-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/usr/local/lib/python3.9/dist-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/content/Retrieval-based-Voice-Conversion-WebUI/train/mel_processing.py:93: FutureWarning: Pass sr=40000, n_fft=2048, n_mels=125, fmin=0.0, fmax=None as keyword args. From version 0.10 passing these as positional arguments will result in an error
  mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
/usr/local/lib/python3.9/dist-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
/usr/local/lib/python3.9/dist-packages/torch/autograd/__init__.py:200: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [1, 21, 96], strides() = [43296, 96, 1]
bucket_view.sizes() = [1, 21, 96], strides() = [2016, 96, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:323.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
INFO:test:Train Epoch: 1 [0%]
INFO:test:[0, 0.0001]
INFO:test:loss_disc=3.231, loss_gen=2.107, loss_fm=9.420,loss_mel=30.646, loss_kl=5.000
DEBUG:matplotlib:matplotlib data path: /usr/local/lib/python3.9/dist-packages/matplotlib/mpl-data
DEBUG:matplotlib:CONFIGDIR=/root/.config/matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is linux
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
Traceback (most recent call last):
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 514, in <module>
    main()
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 42, in main
    mp.spawn(
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 150, in run
    train_and_evaluate(
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train_nsf_sim_cache_sid_load_pretrain.py", line 195, in train_and_evaluate
    for batch_idx, info in enumerate(train_loader):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.9/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 306, in __getitem__
    return self.get_audio_text_pair(self.audiopaths_and_text[index])
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 248, in get_audio_text_pair
    phone = self.get_labels(phone)
  File "/content/Retrieval-based-Voice-Conversion-WebUI/train/data_utils.py", line 266, in get_labels
    phone = phone[:n_num, :]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
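
The final `IndexError` in the log above is what NumPy raises when a 1-dimensional array is sliced with two indices, as in `phone[:n_num, :]`. Below is a minimal sketch that reproduces the message and checks the shape first. `check_feature` is a hypothetical helper, not part of the repository, and the (frames, channels) layout is an assumption inferred from the indexing expression.

```python
import numpy as np

def check_feature(phone: np.ndarray) -> np.ndarray:
    """Return the array unchanged if it is 2-D, else raise a clearer error."""
    if phone.ndim != 2:
        raise ValueError(f"expected a 2-D feature array, got shape {phone.shape}")
    return phone

good = np.zeros((21, 256), dtype=np.float32)   # (frames, channels): assumed layout
bad = np.zeros(21, dtype=np.float32)           # 1-D array, like the one in the traceback

trimmed = check_feature(good)[:10, :]          # two-index slicing works on 2-D input

try:
    bad[:10, :]                                # reproduces the IndexError from the log
except IndexError as exc:
    message = str(exc)
```

A check like this only makes the failure easier to read; why a feature file ended up 1-dimensional would still need to be traced back to the feature extraction step.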

pip install -r requirements.txt fails with an error

Is this intentional or an accident? ←_← Why is googleads in the requirements?


Collecting googleads==3.8.0
  Using cached https://mirrors.aliyun.com/pypi/packages/fa/f8/f84ad483afaa29bfc807ab6e8a06b6712ee494a2aad7db545865655bdf99/googleads-3.8.0.tar.gz (23 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      error in googleads setup command: use_2to3 is invalid.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
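
The `use_2to3 is invalid` message comes from setuptools, which removed support for the `use_2to3` option in version 58. Old source-only packages such as googleads 3.8.0 therefore fail during `setup.py egg_info` on newer setuptools. A small sketch for checking which side of that boundary an environment is on; `accepts_use_2to3` and `installed_setuptools_version` are hypothetical helper names.

```python
from importlib import metadata

def accepts_use_2to3(setuptools_version: str) -> bool:
    """True if this setuptools release still accepts the use_2to3 option (< 58)."""
    major = int(setuptools_version.split(".")[0])
    return major < 58

def installed_setuptools_version():
    """Return the installed setuptools version string, or None if it is absent."""
    try:
        return metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    version = installed_setuptools_version()
    if version is None:
        print("setuptools is not installed")
    else:
        print(version, "accepts use_2to3:", accepts_use_2to3(version))
```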

Document request: Training from scratch

Summary

There seems to be no documentation on training from scratch (base model training).
Could you share the method?
Once I have reproduced the results with the shared information, I would be glad to write the documentation and open a PR.

Current Status

The current RVC repository contains enough information for fine-tuning.
However, there is very little information about training from scratch (base model training).

The README suggests that the base model can be trained with VCTK.
/train_nsf_sim_cache_sid_load_pretrain.py is the training code, but it appears to be for fine-tuning.
/train/ contains some command txt files, but they refer to a missing file (train_nsf_sim_cache_sid.py).

As a result, I cannot reproduce the base model.

Request

Could you share the method for training the base model?

Proposal

If you share the method, I would be glad to write documentation for future developers.

Are there step-by-step instructions for training and inference from the command line?

What does trainset_preprocess_pipeline_print.py do?

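Hello! I'm new to signal processing and would like to ask: what does this file do?
It seems to split the audio into many short segments and remove long silences.

Also, what does your model architecture look like? Are there any related materials I could study?
Thanks for your answer!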
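
As a rough illustration of the general idea described in the question (cutting audio wherever a long silence occurs), here is a minimal sketch. It is not the repository's implementation; the RMS threshold, window length and minimum silence duration are illustrative assumptions, and `split_on_silence` is a hypothetical name.

```python
import numpy as np

def split_on_silence(audio, sr, threshold=0.01, window_s=0.02, min_silence_s=0.3):
    """Return (start, end) sample ranges of non-silent segments.

    A window counts as silent when its RMS is below `threshold`; a run of
    silent windows at least `min_silence_s` long separates two segments.
    """
    win = max(1, round(sr * window_s))
    n_windows = len(audio) // win
    min_silent_windows = max(1, round(min_silence_s / window_s))

    segments, seg_start, silent_run = [], None, 0
    for i in range(n_windows):
        frame = audio[i * win:(i + 1) * win]
        rms = float(np.sqrt(np.mean(frame ** 2)))
        if rms >= threshold:
            if seg_start is None:
                seg_start = i * win
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silent_windows:
                # Close the segment at the point where the silence began.
                segments.append((seg_start, (i + 1 - silent_run) * win))
                seg_start, silent_run = None, 0
    if seg_start is not None:
        segments.append((seg_start, (n_windows - silent_run) * win))
    return segments

sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)   # 1 s of sound
gap = np.zeros(sr)                                           # 1 s of silence
segments = split_on_silence(np.concatenate([tone, gap, tone]), sr)
```

In this toy input the one-second gap is dropped and the two tones come back as separate segments. Short pauses below `min_silence_s` stay inside a segment.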

Preparation of tutorials in English

I would like to create a tutorial in /docs in markdown format. In the tutorial, I would like to write:

How to tune faiss for developers
Explanation of learning and inference parameters for beginners

I will start with the first item, the faiss tuning guide, and open a PR by around 4/19.

Flexible GPU selection

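During inference, if there are two or more GPUs, the program uses all of them by default.
If one of the GPUs is already in use, this leads to an out-of-memory error.
Please add an option to select the GPU during inference.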
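
Until such an option exists, one common workaround is to limit which GPUs the process can see through the `CUDA_VISIBLE_DEVICES` environment variable, which CUDA applications honor. The sketch below builds such an environment for a child process; `env_for_gpus` is a hypothetical helper and the launch command in the comment is only an illustration.

```python
import os

def env_for_gpus(gpu_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPU indices."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env

# Expose only GPU 1 to a child process, for example:
#   subprocess.run([sys.executable, "infer-web.py"], env=env_for_gpus([1]))
env = env_for_gpus([1], base_env={"PATH": "/usr/bin"})
```

The variable has to be set before the CUDA runtime is initialized, which is why the sketch passes it to a new process rather than changing it mid-run.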

probable bug in gui.py

I'm not at all familiar with Python, so it's possible this isn't a bug but an issue on my end.
I'm running this in a conda environment on Linux Mint 20. When I launch gui.py, it raises an IndexError on line 220. Looking at the code in gui.py, I found that when it runs default_value=input_devices[sd.default.device[0]], sd.default.device[0] is outside the range of input_devices. I replaced that line, and the similar line for the default output device, with default_value=input_devices[0] and default_value=output_devices[0] respectively, and it then launched successfully.
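
The workaround above can be generalized into a guarded lookup that keeps the preferred index when it is valid and falls back to the first device otherwise. This is only a sketch: `pick_default` is a hypothetical helper and the device names are placeholders.

```python
def pick_default(devices, preferred_index):
    """Return devices[preferred_index] if valid, else the first device, else None."""
    if not devices:
        return None
    if isinstance(preferred_index, int) and 0 <= preferred_index < len(devices):
        return devices[preferred_index]
    return devices[0]

input_devices = ["Mic A", "Mic B"]
default_input = pick_default(input_devices, 5)   # index out of range, falls back
```

With a helper like this, the GUI call would become `default_value=pick_default(input_devices, sd.default.device[0])`, and the same for the output list.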

Realtime Voice Conversion for RVC

A really really great job!!! I'm impressed with how short learning time was and how accurate your results are. I've developed a real-time voice conversion software using RVC, so if you don't mind I'd be really grateful if you could put a link on the Readme.

Step 0 cannot load the audio and reports File Not Found

Use Language: zh_CN
Running on local URL:  http://0.0.0.0:7865
start preprocess
['trainset_preprocess_pipeline_print.py', 'E:\\BaiduYunDownload\\ayakaVoice', '40000', '12', 'E:\\PycharmProjects\\Retrieval-based-Voice-Conversion-WebUI/logs/test', 'False']
E:\BaiduYunDownload\ayakaVoice/1.wav->Traceback (most recent call last):
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\my_utils.py", line 14, in load_audio
    ffmpeg.input(file, threads=0)
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\.venv\lib\site-packages\ffmpeg\_run.py", line 313, in run
    process = run_async(
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\.venv\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
    return subprocess.Popen(
  File "e:\anaconda3\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "e:\anaconda3\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] 系统找不到指定的文件。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\trainset_preprocess_pipeline_print.py", line 73, in pipeline
    audio = load_audio(path, self.sr)
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\my_utils.py", line 19, in load_audio
    raise RuntimeError(f"Failed to load audio: {e}")
RuntimeError: Failed to load audio: [WinError 2] 系统找不到指定的文件。

E:\BaiduYunDownload\ayakaVoice/20.wav->Traceback (most recent call last):
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\my_utils.py", line 14, in load_audio
    ffmpeg.input(file, threads=0)
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\.venv\lib\site-packages\ffmpeg\_run.py", line 313, in run
    process = run_async(
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\.venv\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
    return subprocess.Popen(
  File "e:\anaconda3\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "e:\anaconda3\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] 系统找不到指定的文件。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\trainset_preprocess_pipeline_print.py", line 73, in pipeline
    audio = load_audio(path, self.sr)
  File "E:\PycharmProjects\Retrieval-based-Voice-Conversion-WebUI\my_utils.py", line 19, in load_audio
    raise RuntimeError(f"Failed to load audio: {e}")
RuntimeError: Failed to load audio: [WinError 2] 系统找不到指定的文件。

......... (rest omitted)

At first I thought it was because the path contained Chinese characters, but renaming it doesn't seem to help either.
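
In the tracebacks above, `FileNotFoundError: [WinError 2]` is raised inside `subprocess.Popen`, that is, while starting the ffmpeg process. It therefore refers to the program being launched rather than to the .wav file. A likely cause is that the `ffmpeg` executable is not on `PATH`. A minimal check, with `find_executable` as a hypothetical helper name:

```python
import shutil

def find_executable(name="ffmpeg"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)

if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("ffmpeg was not found on PATH; install it or add its folder to PATH")
    else:
        print("ffmpeg found at", path)
```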

Quit after clicking "Start Audio Conversion"

As title

Screenshot:

Filename with three dots '...' is causing checkout failure.

What is torchgen used for?

Hi,

torchgen is in the project's dependencies, but I couldn't find any information on how to use it.
How is this useful?

No such file or directory: 'RVC-beta/preprocess.log'

I downloaded and unzipped RVC-Beta.7z, then downloaded the latest version from the releases page and put it inside.
However, when I tried to run step 2a of the training, this error came up.

runtime\python.exe trainset_preprocess_pipeline_print.py C:\Users\xxxxxx\Documents\VC 48000 16 D:\RVC-beta (1)\RVC-beta/logs/xxxxxxFalse
Traceback (most recent call last):
File "D:\RVC-beta (1)\RVC-beta\trainset_preprocess_pipeline_print.py", line 20, in <module>
f = open("%s/preprocess.log" % exp_dir, "a+")
FileNotFoundError: [Errno 2] No such file or directory: 'D:\RVC-beta/preprocess.log'
The message above is displayed.

What should I do?
Past versions were working.
(2023/04/10 version)
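
The path in the error ('D:\RVC-beta/preprocess.log') stops exactly at the space in "RVC-beta (1)", which suggests the experiment directory was split at that space when the command line was built as a plain string. This is an inference from the log, not a confirmed diagnosis. The sketch below shows the effect and how passing arguments as a list keeps the path intact; the script name and path are taken from the report.

```python
import subprocess

exp_dir = r"D:\RVC-beta (1)\RVC-beta/logs/xxxxxx"   # path taken from the report

# Naive string formatting: the space in "RVC-beta (1)" splits the argument.
naive = "python.exe trainset_preprocess_pipeline_print.py %s" % exp_dir
naive_args = naive.split(" ")

# Building the command as a list keeps the path as one argument; list2cmdline
# shows how subprocess would quote it when it builds a Windows command line.
args = ["python.exe", "trainset_preprocess_pipeline_print.py", exp_dir]
quoted = subprocess.list2cmdline(args)
```

Moving the installation to a folder without spaces, such as D:\RVC-beta, would be the quickest way to test this hypothesis.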

issue running, app.py file missing

Hello, it seems that the app.py file was not included in the download, so I am unable to run the program. Does the file perhaps go by a different name?

How was this solved?

Training doesn't run either:

start preprocess
['trainset_preprocess_pipeline_print.py', 'I:\VoiceConversionWebUI\traning\input', '40000', '16', 'I:\VoiceConversionWebUI/logs/gemikovoice', 'False']
Fail. Traceback (most recent call last):
File "I:\VoiceConversionWebUI\trainset_preprocess_pipeline_print.py", line 90, in pipeline_mp_inp_dir
p.start()
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object

end preprocess
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 107, in spawn_main
new_handle = reduction.duplicate(pipe_handle,
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\reduction.py", line 79, in duplicate
return _winapi.DuplicateHandle(
OSError: [WinError 6] 句柄无效。
start preprocess
['trainset_preprocess_pipeline_print.py', 'I:\VoiceConversionWebUI\traning\input', '40000', '16', 'I:\VoiceConversionWebUI/logs/gemikovoice', 'False']
Fail. Traceback (most recent call last):
File "I:\VoiceConversionWebUI\trainset_preprocess_pipeline_print.py", line 90, in pipeline_mp_inp_dir
p.start()
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\Naught\AppData\Local\Programs\Python\Python310\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object

end preprocess

['extract_feature_print.py', '1', '0', 'I:\VoiceConversionWebUI/logs/gemikovoice']
I:\VoiceConversionWebUI/logs/gemikovoice
load model(s) from hubert_base.pt
2023-04-06 16:52:05 | INFO | fairseq.tasks.hubert_pretraining | current directory is I:\VoiceConversionWebUI
2023-04-06 16:52:05 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-04-06 16:52:05 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
no-feature-todo
['extract_feature_print.py', '1', '0', 'I:\VoiceConversionWebUI/logs/gemikovoice']
I:\VoiceConversionWebUI/logs/gemikovoice
load model(s) from hubert_base.pt
no-feature-todo

INFO:gemikovoice:{'train': {'log_interval': 200, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 4, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs\gemikovoice/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs\gemikovoice', 'experiment_dir': './logs\gemikovoice', 'save_every_epoch': 5, 'name': 'gemikovoice', 'total_epoch': 10, 'pretrainG': 'pretrained/G40k.pth', 'pretrainD': 'pretrained/D40k.pth', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 0, 'if_latest': 0, 'if_cache_data_in_gpu': 0}
WARNING:gemikovoice:I:\VoiceConversionWebUI\train is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
gin_channels: 256 self.spk_embed_dim: 109
Traceback (most recent call last):
File "I:\VoiceConversionWebUI\train_nsf_sim_cache_sid_load_pretrain.py", line 121, in run
_, _, _, epoch_str = utils.load_checkpoint(utils.latest_checkpoint_path(hps.model_dir, "D_*.pth"), net_d, optim_d) # D多半加载没事
File "I:\VoiceConversionWebUI\train\utils.py", line 163, in latest_checkpoint_path
x = f_list[-1]
IndexError: list index out of range
INFO:gemikovoice:loaded pretrained pretrained/G40k.pth pretrained/D40k.pth

I:\VoiceConversionWebUI\venv\lib\site-packages\torch\cuda\amp\grad_scaler.py:120: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.
warnings.warn("torch.cuda.amp.GradScaler is enabled, but CUDA is not available. Disabling.")
INFO:gemikovoice:====> Epoch: 1
I:\VoiceConversionWebUI\venv\lib\site-packages\torch\optim\lr_scheduler.py:139: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
INFO:gemikovoice:====> Epoch: 2
INFO:gemikovoice:====> Epoch: 3
INFO:gemikovoice:====> Epoch: 4
INFO:gemikovoice:Saving model and optimizer state at iteration 5 to ./logs\gemikovoice\G_0.pth
INFO:gemikovoice:Saving model and optimizer state at iteration 5 to ./logs\gemikovoice\D_0.pth
INFO:gemikovoice:====> Epoch: 5
INFO:gemikovoice:====> Epoch: 6
INFO:gemikovoice:====> Epoch: 7
INFO:gemikovoice:====> Epoch: 8
INFO:gemikovoice:====> Epoch: 9
INFO:gemikovoice:Saving model and optimizer state at iteration 10 to ./logs\gemikovoice\G_0.pth
INFO:gemikovoice:Saving model and optimizer state at iteration 10 to ./logs\gemikovoice\D_0.pth
INFO:gemikovoice:====> Epoch: 10
INFO:gemikovoice:Training is done. The program is closed.
saving final ckpt: Success.
Traceback (most recent call last):
File "I:\VoiceConversionWebUI\train_nsf_sim_cache_sid_load_pretrain.py", line 515, in <module>
main()
File "I:\VoiceConversionWebUI\train_nsf_sim_cache_sid_load_pretrain.py", line 42, in main
mp.spawn(
File "I:\VoiceConversionWebUI\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "I:\VoiceConversionWebUI\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
while not context.join():
File "I:\VoiceConversionWebUI\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 149, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 2333333
Traceback (most recent call last):
File "I:\VoiceConversionWebUI\venv\lib\site-packages\gradio\routes.py", line 393, in run_predict
output = await app.get_blocks().process_api(
File "I:\VoiceConversionWebUI\venv\lib\site-packages\gradio\blocks.py", line 1108, in process_api
result = await self.call_function(
File "I:\VoiceConversionWebUI\venv\lib\site-packages\gradio\blocks.py", line 929, in call_function
prediction = await anyio.to_thread.run_sync(
File "I:\VoiceConversionWebUI\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "I:\VoiceConversionWebUI\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "I:\VoiceConversionWebUI\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "I:\VoiceConversionWebUI\venv\lib\site-packages\gradio\utils.py", line 490, in async_iteration
return next(iterator)
File "I:\VoiceConversionWebUI\infer-web.py", line 421, in train1key
big_npy = np.concatenate(npys, 0)
File "<array_function internals>", line 180, in concatenate
ValueError: need at least one array to concatenate

Originally posted by @NaughtDZ in #18 (comment)
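
The `TypeError: cannot pickle '_io.TextIOWrapper' object` in the log above is what Python raises when an open file handle has to be pickled. Spawn-based multiprocessing, the default on Windows, pickles the process object and its arguments. A minimal sketch of the failure and of one way around it, passing the log path and opening the file inside the worker; `write_log` is a hypothetical helper, not the repository's code.

```python
import os
import pickle
import tempfile

def write_log(log_path, text):
    """Worker-side logging: open the file by path inside the process that writes."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(text + "\n")

tmp_dir = tempfile.mkdtemp()
log_path = os.path.join(tmp_dir, "preprocess.log")

# An open file handle cannot be pickled, so it cannot be sent to a spawned process.
with open(log_path, "a+", encoding="utf-8") as handle:
    try:
        pickle.dumps(handle)
        handle_picklable = True
    except TypeError:
        handle_picklable = False

# A plain path string survives pickling, so a worker can reopen the file itself.
worker_args = pickle.loads(pickle.dumps((log_path, "ok")))
write_log(*worker_args)
```

The later `no-feature-todo` and `need at least one array to concatenate` messages are consistent with preprocessing having produced no files, so they are probably consequences of this first failure rather than separate bugs.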

Can't a 3070 be used for training either?

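I have a single 3070 GPU, but after launching, the console still prints: "没有发现支持的N卡, 使用CPU进行推理" (no supported NVIDIA GPU found, using CPU for inference)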
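
One possible cause of this message, offered as an assumption rather than a confirmed diagnosis, is that the installed PyTorch is a CPU-only build or cannot reach the driver. The sketch below reports which case applies, using only `torch.cuda.is_available()`, `torch.cuda.get_device_name()` and `torch.version.cuda`; `describe_torch_cuda` is a hypothetical helper.

```python
import importlib.util

def describe_torch_cuda():
    """Summarize whether the installed PyTorch build can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch
    if torch.cuda.is_available():
        return "CUDA available: " + torch.cuda.get_device_name(0)
    if torch.version.cuda is None:
        return "CPU-only PyTorch build (torch.version.cuda is None)"
    return "CUDA build " + torch.version.cuda + ", but no usable GPU/driver was detected"

if __name__ == "__main__":
    print(describe_torch_cuda())
```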

GUI version: support inference with models that have no pitch (f0)

Using the latest version.

This is my computer configuration:

Error when running `infer-web.py` locally

Stacktrace:

Use Language: en_US
Running on local URL:  http://0.0.0.0:7865
tcgetpgrp failed: Not a tty
2023-04-19 23:51:45 | INFO | fairseq.tasks.hubert_pretraining | current directory is /......./Retrieval-based-Voice-Conversion-WebUI
2023-04-19 23:51:45 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-04-19 23:51:45 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
Traceback (most recent call last):
  File "/......./infer-web.py", line 141, in vc_single
    if_f0 = cpt.get("f0", 1)
NameError: name 'cpt' is not defined

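I pulled the code from the master branch. Training works locally, but the error above appears when I try voice conversion after training. Looking at the code, `cpt` does indeed appear to be undefined inside that function.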

Accelerating Faiss retrieval using FastScan in Faiss

Thank you for the amazing software. I am particularly interested in its application of vector search. I am still in the process of setting up, but I plan to try running it soon.

While reading the source code, I noticed a potential concern in the Faiss part and opened this issue.

Currently, IVF512 is used in retrieval.
While I think this is simple and effective as a baseline on the GPU, I believe there are better index factory options when running on the CPU.
https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/6c7c1d933ffe2217edc74afadff7eec0078d6d16/infer/train-index.py#L19

This can be done using the FastScan method, by simply changing the index factory string from "IVF512,Flat" to "IVF512,PQ128x4fsr,RFlat" (512 is the original IVF parameter; PQ128 means 128 sub-quantizers, half of the 256 feature dimensions).
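To make the PQ128x4 part of the proposal concrete, here is a small worked calculation of how many bytes each vector occupies when scanned. It is only a sketch: the 256 dimensions come from the issue text, while float32 storage (4 bytes per value) is an assumption. Note that a flat refinement stage still keeps the full vectors for re-ranking, so the expected benefit is in scan speed rather than total memory.

```python
DIM = 256         # feature dimension mentioned in the issue
FLOAT_BYTES = 4   # assumption: vectors are stored as float32


def flat_bytes_per_vector(dim, float_bytes=FLOAT_BYTES):
    """Bytes scanned per vector when vectors are stored uncompressed."""
    return dim * float_bytes


def pq_bytes_per_vector(n_subquantizers, bits_per_code):
    """Bytes scanned per vector for product-quantization codes."""
    return n_subquantizers * bits_per_code // 8


flat = flat_bytes_per_vector(DIM)   # 256 * 4 bytes
pq = pq_bytes_per_vector(128, 4)    # 128 codes * 4 bits
dims_per_code = DIM // 128          # dimensions covered by each 4-bit code

print(flat, pq, flat // pq, dims_per_code)  # prints: 1024 64 16 2
```

Under these assumptions each 4-bit code summarizes 2 dimensions and the scanned data shrinks 16-fold, which is why a re-ranking step on the original vectors is paired with it to recover accuracy.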

Since I haven't been able to run RVC yet, I'm not sure if this parameter is effective, but in most cases, it works effectively on both the CPU and GPU.
Once I run it and find it effective, I will report back in this issue.

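The converted song's length differs slightly from the original; a possible solution is attached

I am using the 0416updated version. The length of the converted song differs slightly from that of the original. For example, for YOASOBI's “偶像” ("Idol"), the original is 3:33:228 while the converted output is 3:33:200. If a song is long and fast-paced, this accumulated drift may become audible and hurt the quality of the final product.

A splitting scheme I used previously reproduces the original length exactly, so I offer it for reference. The main idea is to split the audio with librosa.util.frame, pad the last part, and handle that last part specially: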
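import librosa
import numpy as np
from scipy.io.wavfile import write  # assumed source of write(); its (filename, rate, data) signature matches the call below

SR = 16000                 # assumed input sampling rate; the original snippet does not define SR
SAMPLE_RATE = 48000        # output sampling rate
frame_length = 30 * SR     # 30-second frames, per the comment in main()
hop_length = frame_length  # zero overlap between frames

def convert_frame(frame):
    # hypothetical placeholder for the per-frame conversion omitted from the original snippet;
    # it must return SAMPLE_RATE / SR output samples per input sample
    raise NotImplementedError

def main(args):
    audio, sr = librosa.load(args.wave, sr=SR)
    audio_length = len(audio)
    # calculate the padding length (always between 1 and frame_length samples)
    pad_length = frame_length - (audio_length - frame_length) % hop_length
    audio = np.pad(audio, (0, pad_length), mode='constant')  # pad the array with zeros
    # split the audio into frames of 30 seconds with zero overlap
    frames = librosa.util.frame(audio, frame_length=frame_length, hop_length=hop_length)
    frames = np.transpose(frames, (1, 0))
    # initialize an empty list to store the processed frames
    bwe_frames = []
    for idx, frame in enumerate(frames):
        bwe_frame = convert_frame(frame)
        # append the processed frame to the list, trimming the padding from the last one
        if idx == len(frames) - 1:
            bwe_frames.append(bwe_frame[:-pad_length * int(SAMPLE_RATE / SR)])
        else:
            bwe_frames.append(bwe_frame)
    # concatenate the processed frames into a single array
    bwe_audio = np.concatenate(bwe_frames)
    write("svc_out_48k.wav", SAMPLE_RATE, bwe_audio)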
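
The pad, frame, and trim idea can be checked without any audio libraries. The following is a minimal sketch using plain Python lists; `fake_convert` is a hypothetical stand-in for the real per-frame conversion (it only repeats each sample to imitate upsampling), and the frame length and ratio are illustrative values, not ones taken from the project.

```python
def pad_length_for(n_samples, frame_length):
    """Zeros needed so the padded signal is a whole number of frames (always >= 1)."""
    return frame_length - (n_samples - frame_length) % frame_length


def fake_convert(frame, ratio):
    """Hypothetical stand-in for the real conversion: repeat each sample `ratio` times."""
    return [s for s in frame for _ in range(ratio)]


def convert_in_frames(samples, frame_length, ratio):
    """Convert frame by frame, then drop the part of the last frame that came from padding."""
    pad = pad_length_for(len(samples), frame_length)
    padded = list(samples) + [0.0] * pad
    n_frames = len(padded) // frame_length
    out = []
    for i in range(n_frames):
        frame = padded[i * frame_length:(i + 1) * frame_length]
        converted = fake_convert(frame, ratio)
        if i == n_frames - 1:
            # the padding occupies pad * ratio samples at the output rate
            converted = converted[:len(converted) - pad * ratio]
        out.extend(converted)
    return out


samples = [float(i) for i in range(1000)]
result = convert_in_frames(samples, frame_length=300, ratio=3)
print(len(result) == 3 * len(samples))  # prints: True
```

Because the padding is removed at the output rate, the result has exactly `ratio` times as many samples as the input, whether or not the input length is a multiple of the frame length.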
