jarodmica / ai-voice-cloning

License: GNU General Public License v3.0

Dockerfile 0.84% Jupyter Notebook 7.02% Batchfile 4.21% Shell 4.13% Python 83.79%

ai-voice-cloning's Introduction

AI Voice Cloning

Note: I do not plan on actively working on improvements or enhancements for this project. This fork is mainly meant to keep the repo in a working state in case the original git.ecker goes down or necessary package changes need to be made.

That being said, some enhancements have been added compared to the original repo:

✔️ Possible to train in other languages

✔️ HiFi-GAN added, allowing for faster inference at the cost of quality.

✔️ whisper-v3 added as a choosable option for WhisperX

✔️ Output conversion using RVC

This is a fork of the repo originally located here: https://git.ecker.tech/mrq/ai-voice-cloning. All of the work that went into incorporating training with DLAS and inference with Tortoise belongs to mrq, the author of the original ai-voice-cloning repo.

Setup

This repo works on Windows with NVIDIA GPUs and Linux running Docker with NVIDIA GPUs.

Windows Package (Recommended)

Optional, but recommended: Install 7zip on your computer: https://www.7-zip.org/
- If you run into extraction issues, your 7zip is most likely out of date, or you're using a different extractor.
Head over to the releases tab and download the latest package on Hugging Face: https://github.com/JarodMica/ai-voice-cloning/releases/tag/v3.0
Extract the 7zip archive.
Open the extracted ai-voice-cloning folder and run start.bat

Alternative Manual Installation

If you are installing this manually, you will need:

Python 3.11: https://www.python.org/downloads/release/python-311/
Git: https://www.git-scm.com/downloads
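Before running the setup script, you can confirm that the interpreter on your PATH is the required 3.11. The check below is a minimal sketch. It assumes that `python` on your PATH is the interpreter setup-cuda.bat will pick up. `is_supported_python` is a hypothetical helper, not part of the repo.

```python
import sys

# Python version the setup scripts expect, per the requirements above.
REQUIRED = (3, 11)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter's major.minor matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {found}: OK")
    else:
        print(f"Python {found} found, but 3.11 is required")
```

Only the major and minor versions are compared, so any 3.11.x patch release passes.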

Clone the repository

git clone https://github.com/JarodMica/ai-voice-cloning.git

Run setup-cuda.bat. It will install all of the required Python packages.
- If you don't have Python 3.11, it won't work, and you'll need to download it first.
After it finishes, run start.bat. This will start downloading most of the models you'll need.
- Some models are downloaded the first time you use them, so expect additional downloads during generation and during training (for Whisper). Once downloaded, they are stored in the models folder at the repository root, and you won't need to download them again unless you delete them.
(Optional) You can install WhisperX for training by running setup-whipserx.bat
- Check out the WhisperX GitHub page for more details. It is much faster for longer audio files, but if you're processing an already-split dataset one file at a time, it doesn't improve speeds much.
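The download-once behaviour described above follows a common caching pattern: check whether the file already exists under models/, and fetch it only if it does not. The following is a minimal sketch of that pattern. `ensure_model` and its `download` callback are hypothetical names, not the repo's actual loader.

```python
from pathlib import Path
from typing import Callable


def ensure_model(path: Path, download: Callable[[Path], None]) -> bool:
    """Download a model only if it is not already on disk.

    Returns True if a download happened, False if the cached file was reused.
    """
    if path.exists():
        return False  # downloaded on an earlier run; reuse it
    path.parent.mkdir(parents=True, exist_ok=True)
    download(path)  # caller-supplied function that writes the file to `path`
    return True
```

Deleting a file from models/ just makes the next call download it again, which matches the note above.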

Docker for Linux (or WSL2)

Linux Specific Setup

Make sure the latest NVIDIA drivers are installed: sudo ubuntu-drivers install
Install Docker using your preferred method

Windows Specific Setup

Make sure your Nvidia drivers are up to date: https://www.nvidia.com/download/index.aspx

Install WSL2 in PowerShell with wsl --install and restart
Open PowerShell, type ubuntu and press Enter. This should load you into WSL2
Remove the outdated NVIDIA repository key: sudo apt-key del 7fa2af80
Download CUDA toolkit keyring: wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
Install keyring: sudo dpkg -i cuda-keyring_1.1-1_all.deb
Update package list: sudo apt-get update
Install CUDA toolkit: sudo apt-get -y install cuda-toolkit-12-4
Install Docker Desktop using WSL2 as the backend
Restart
If you wish to monitor the terminal remotely via SSH, follow this guide.
Open PowerShell, type ubuntu, then follow the steps below

Building and Running in Docker

Open a terminal (or Ubuntu WSL)
Clone the repository: git clone https://github.com/JarodMica/ai-voice-cloning.git && cd ai-voice-cloning
Build the image with ./setup-docker.sh
Start the container with ./start-docker.sh
Visit http://localhost:7860, or http://<ip>:7860 when connecting remotely
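If you script around the container, for example to wait for it to finish starting before opening a browser, you can poll the port. The following is a minimal sketch using only the standard library. It assumes the UI answers plain HTTP GET requests on port 7860. `web_ui_ready` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def web_ui_ready(url: str = "http://localhost:7860", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        # (HTTPError is a subclass of URLError).
        return False
```

For remote access, pass `http://<ip>:7860` with your machine's address in place of `<ip>`.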

Instructions

Check out the YouTube videos:

Watch First: https://youtu.be/WWhNqJEmF9M?si=RhUZhYersAvSZ4wf

Watch Second (RVC update): https://www.youtube.com/watch?v=7tpWH8_S8es&t=504s

If you've used this repository in the past, everything works much the same as before. There is, however, a new option to convert the TTS output using RVC. Before you can use it, you need a trained RVC model (a .pth file), either one you trained yourself with RVC or one downloaded online. Place it in models/rvc_models/. Both .pth and .index files can go there, and they will show up in their respective dropdown menus.
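To check that your files are where the dropdowns expect them, you can list the folder yourself. The following is a minimal sketch. It assumes the dropdowns are populated by file extension from files placed directly inside models/rvc_models/. `list_rvc_files` is a hypothetical helper, not the app's own loader.

```python
from pathlib import Path


def list_rvc_files(folder: str = "models/rvc_models") -> dict:
    """Group files in `folder` by extension, as the two dropdowns might list them."""
    root = Path(folder)
    if not root.is_dir():
        return {"pth": [], "index": []}
    # Only top-level files are considered; subfolders are not searched.
    names = sorted(p.name for p in root.iterdir() if p.is_file())
    return {
        "pth": [n for n in names if n.lower().endswith(".pth")],
        "index": [n for n in names if n.lower().endswith(".index")],
    }
```

If a .pth file is missing from the output, it is probably in the wrong folder or has a different extension.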

To enable RVC:

Check Show Experimental Settings to reveal more options.
Check Run the outputter audio through RVC. You can now adjust the RVC parameters for the RVC voice model you're using.

Updating Your Installation

Below is how you can update the package to get the latest changes.

Windows

NOTE: If there are major feature changes, check the latest release to see whether update_package.bat will work. If it won't, you will need to re-download and re-extract the package from Hugging Face.

Run the update_package.bat file
- It clones the repo and copies the src folder from the repo into the package.
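The copy step can be pictured as follows. This is a minimal Python sketch of the behaviour described above, not the contents of update_package.bat. The folder layout, with a `src` folder inside both the cloned repo and the package, is an assumption. `copy_src` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def copy_src(repo_dir: str, package_dir: str) -> Path:
    """Copy <repo_dir>/src over <package_dir>/src, overwriting changed files."""
    source = Path(repo_dir) / "src"
    target = Path(package_dir) / "src"
    if not source.is_dir():
        raise FileNotFoundError(f"no src folder in {repo_dir}")
    # dirs_exist_ok=True (Python 3.8+) merges into the existing folder
    # instead of failing because the target already exists.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

Copying only adds or overwrites files, so anything removed upstream stays behind in the package. That is one reason a fresh download is safer after major changes.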

Alternative Manual Installation

You should be able to navigate into the folder and then pull the repo to update it.

cd ai-voice-cloning
git pull

If large features have been added, you may need to delete the venv and then re-run the setup-cuda script to make sure there are no package issues.

Linux via Docker

You should be able to navigate into the folder and then pull the repo to update it, then rebuild your Docker image.

cd ai-voice-cloning
git pull
./setup-docker.sh

Documentation

Troubleshooting Manual Installation

The terminal is your friend. Any errors or issues will show up in the terminal when you try to run the app, and you can start debugging from there.

If torch breaks at some point in the process, you may have to reinstall it: activate the venv, uninstall torch, then reinstall it as shown below. Type Y when prompted to confirm the uninstall.

.\venv\Scripts\activate.bat
pip uninstall torch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
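After reinstalling, you can confirm from inside the activated venv that torch imports and can see your GPU. The following is a minimal sketch. `torch_status` is a hypothetical helper. `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
def torch_status() -> dict:
    """Report whether torch imports and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "version": None, "cuda": None, "gpu": False}
    return {
        "installed": True,
        "version": torch.__version__,      # CUDA wheels typically end in e.g. "+cu118"
        "cuda": torch.version.cuda,        # None for CPU-only builds
        "gpu": torch.cuda.is_available(),  # False if no usable GPU/driver is found
    }


if __name__ == "__main__":
    print(torch_status())
```

If `cuda` comes back as None, a CPU-only build was probably installed, and the `--index-url` part of the reinstall command is the first thing to check.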

Bug Reporting

If you run into any problems, please open up a new issue on the issues tab.

Tips for developers

setup-cuda.bat should contain everything you need to install the packages. The many different requirements files make the script fairly messy: each bundled repo has its own requirements installed in turn, and the requirements.txt in the root is installed last to change package versions back to ones compatible with this repo.
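The "installed last wins" effect described above can be modelled as a dictionary update. Each requirements file is applied in order, and a later pin for the same package replaces an earlier one, much as a later `pip install -r` replaces an installed package with the pinned version. This is a simplified sketch: `merge_pins` is hypothetical and ignores version ranges, extras and environment markers.

```python
def merge_pins(*requirement_files: list[str]) -> dict[str, str]:
    """Merge 'name==version' pins in install order; later files win."""
    pins: dict[str, str] = {}
    for lines in requirement_files:
        for raw in lines:
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if "==" not in line:
                continue  # this sketch skips unpinned or ranged requirements
            name, version = (part.strip() for part in line.split("==", 1))
            pins[name.lower()] = version  # a later pin overrides an earlier one
    return pins
```

Passing the bundled repos' requirement lines first and the root requirements.txt last reproduces the ordering the setup script relies on.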

ai-voice-cloning's Issues

IndexError: list index out of range

No matter what settings I try in training, I always get an "IndexError: list index out of range" error.

Detailed log:

E:\ai-voice-cloning>set PYTHONUTF8=1

E:\ai-voice-cloning>runtime\python.exe .\src\main.py
2024-03-09 10:33:30 | INFO | rvc.configs.config | Found GPU NVIDIA GeForce RTX 4080
Whisper detected
Traceback (most recent call last):
  File "E:\ai-voice-cloning\src\utils.py", line 98, in <module>
    from vall_e.emb.qnt import encode as valle_quantize
ModuleNotFoundError: No module named 'vall_e'

Traceback (most recent call last):
  File "E:\ai-voice-cloning\src\utils.py", line 118, in <module>
    import bark
ModuleNotFoundError: No module named 'bark'

[textbox, textbox, radio, textbox, dropdown, audio, number, slider, number, slider, slider, slider, radio, slider, slider, slider, slider, slider, slider, slider, checkboxgroup, checkbox, checkbox]
[dropdown, slider, dropdown, slider, slider, slider, slider, slider]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Loading TorToiSe... (AR: E:\ai-voice-cloning\models\tortoise\autoregressive.pth, diffusion: ./models/tortoise/diffusion_decoder.pth, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
use_deepspeed api_debug False
E:\ai-voice-cloning\runtime\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Loading tokenizer JSON: ./modules/tortoise-tts/tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: E:\ai-voice-cloning\models\tortoise\autoregressive.pth
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: bigvgan_24khz_100band
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
Loaded TTS, ready for generation.
Unloaded TTS
Spawning process:  train.bat ./training/neeko/train.yaml
[Training] [2024-03-09T10:35:22.642353]
2024-03-09 10:35:22 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:22.645354] E:\ai-voice-cloning>set PYTHONUTF8=1
2024-03-09 10:35:22 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:22.648355]
2024-03-09 10:35:22 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:22.650356] E:\ai-voice-cloning>.\runtime\python.exe .\src\train.py --yaml "./training/neeko/train.yaml"
2024-03-09 10:35:22 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-09 10:35:22 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-09 10:35:22 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:24.427039] [2024-03-09 10:35:24,427] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
2024-03-09 10:35:24 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-09 10:35:24 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-09 10:35:24 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.472498] 24-03-09 10:35:26.471 - INFO:   name: neeko
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.475499]   model: extensibletrainer
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.478500]   scale: 1
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.480500]   gpu_ids: [0]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.484501]   start_step: 0
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.486502]   checkpointing_enabled: True
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.489502]   fp16: False
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.491503]   bitsandbytes: True
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.493503]   gpus: 1
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.496504]   datasets:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.498505]     train:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.500504]       name: training
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.502505]       n_workers: 2
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.504506]       batch_size: 40
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.507506]       mode: paired_voice_audio
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.509520]       path: ./training/neeko/train.txt
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.512507]       fetcher_mode: ['lj']
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.514507]       phase: train
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.516508]       max_wav_length: 255995
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.518509]       max_text_length: 200
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.520509]       sample_rate: 22050
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.522510]       load_conditioning: True
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.525510]       num_conditioning_candidates: 2
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.527511]       conditioning_length: 44000
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.529511]       use_bpe_tokenizer: True
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.532512]       tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.534513]       load_aligned_codes: False
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.537513]       data_type: img
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.539526]     ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.541514]     val:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.543514]       name: validation
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.545515]       n_workers: 2
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.548515]       batch_size: 4
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.550516]       mode: paired_voice_audio
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.553517]       path: ./training/neeko/validation.txt
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.556517]       fetcher_mode: ['lj']
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.558517]       phase: val
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.560518]       max_wav_length: 255995
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.562519]       max_text_length: 200
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.565519]       sample_rate: 22050
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.567519]       load_conditioning: True
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.569533]       num_conditioning_candidates: 2
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.572521]       conditioning_length: 44000
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.574521]       use_bpe_tokenizer: True
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.577522]       tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.579523]       load_aligned_codes: False
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.581523]       data_type: img
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.583523]     ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.585524]   ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.587524]   steps:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.589546]     gpt_train:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.592525]       training: gpt
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.594527]       loss_log_buffer: 500
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.596526]       optimizer: adamw
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.598526]       optimizer_params:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.601528]         lr: 1e-05
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.604528]         weight_decay: 0.01
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.606528]         beta1: 0.9
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.608529]         beta2: 0.96
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.610529]       ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.612530]       clip_grad_eps: 4
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.615530]       injectors:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.617531]         paired_to_mel:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.619532]           type: torch_mel_spectrogram
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.622532]           mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.624532]           in: wav
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.626533]           out: paired_mel
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.628533]         ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.630534]         paired_cond_to_mel:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.632534]           type: for_each
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.635535]           subtype: torch_mel_spectrogram
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.637535]           mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.639535]           in: conditioning
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.642537]           out: paired_conditioning_mel
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.646538]         ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.648538]         to_codes:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.650538]           type: discrete_token
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.652538]           in: paired_mel
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.655539]           out: paired_mel_codes
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.657540]           dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.659540]         ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.662541]         paired_fwd_text:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.664541]           type: generator
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.666542]           generator: gpt
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.668543]           in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.671543]           out: ['loss_text_ce', 'loss_mel_ce', 'logits']
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.673556]         ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.675544]       ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.678545]       losses:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.680545]         text_ce:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.682545]           type: direct
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.684545]           weight: 0.02
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.686546]           key: loss_text_ce
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.689559]         ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.691548]         mel_ce:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.693548]           type: direct
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.695548]           weight: 1
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.697549]           key: loss_mel_ce
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.700550]         ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.702550]       ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.704551]     ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.706551]   ]
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.709552]   networks:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.711552]     gpt:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.713553]       type: generator
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.715553]       which_model_G: unified_voice2
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.718553]       kwargs:[
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.720554]         layers: 30
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.722554]         model_dim: 1024
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.724555]         heads: 16
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.727556]         max_text_tokens: 402
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.729556]         max_mel_tokens: 604
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.731557]         max_conditioning_inputs: 2
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.733557]         mel_length_compression: 1024
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.736558]         number_text_tokens: 256
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.738558]         number_mel_codes: 8194
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.740559]         start_mel_token: 8192
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.743559]         stop_mel_token: 8193
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.745559]         start_text_token: 255
2024-03-09 10:35:26 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-03-09T10:35:26.747561]         train_solo_embeddings: False
[Training] [2024-03-09T10:35:26.750561]         use_mel_codes_as_input: True
[Training] [2024-03-09T10:35:26.752561]         checkpointing: True
[Training] [2024-03-09T10:35:26.755562]         tortoise_compat: True
[Training] [2024-03-09T10:35:26.757562]       ]
[Training] [2024-03-09T10:35:26.760563]     ]
[Training] [2024-03-09T10:35:26.763564]   ]
[Training] [2024-03-09T10:35:26.765564]   path:[
[Training] [2024-03-09T10:35:26.767564]     strict_load: True
[Training] [2024-03-09T10:35:26.769565]     pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2024-03-09T10:35:26.772565]     root: ./
[Training] [2024-03-09T10:35:26.774566]     experiments_root: ./training\neeko\finetune
[Training] [2024-03-09T10:35:26.776566]     models: ./training\neeko\finetune\models
[Training] [2024-03-09T10:35:26.778567]     training_state: ./training\neeko\finetune\training_state
[Training] [2024-03-09T10:35:26.781567]     log: ./training\neeko\finetune
[Training] [2024-03-09T10:35:26.783568]     val_images: ./training\neeko\finetune\val_images
[Training] [2024-03-09T10:35:26.785569]   ]
[Training] [2024-03-09T10:35:26.787570]   train:[
[Training] [2024-03-09T10:35:26.790570]     niter: 1600
[Training] [2024-03-09T10:35:26.792570]     warmup_iter: -1
[Training] [2024-03-09T10:35:26.794570]     mega_batch_factor: 10
[Training] [2024-03-09T10:35:26.796571]     val_freq: 800
[Training] [2024-03-09T10:35:26.799572]     ema_enabled: False
[Training] [2024-03-09T10:35:26.801572]     default_lr_scheme: CosineAnnealingLR_Restart
[Training] [2024-03-09T10:35:26.803573]     T_period: [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 
200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
[Training] [2024-03-09T10:35:26.806573]     warmup: 0
[Training] [2024-03-09T10:35:26.809586]     eta_min: 1e-08
[Training] [2024-03-09T10:35:26.811574]     restarts: [200, 400, 600, 800, 1000, 1200, 1400, 1600]
[Training] [2024-03-09T10:35:26.813575]     restart_weights: [0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125, 0.0625]
[Training] [2024-03-09T10:35:26.815575]   ]
[Training] [2024-03-09T10:35:26.818576]   eval:[
[Training] [2024-03-09T10:35:26.820576]     pure: False
[Training] [2024-03-09T10:35:26.822577]     output_state: gen
[Training] [2024-03-09T10:35:26.825578]   ]
[Training] [2024-03-09T10:35:26.827578]   logger:[
[Training] [2024-03-09T10:35:26.829578]     save_checkpoint_freq: 800
[Training] [2024-03-09T10:35:26.832579]     visuals: ['gen', 'mel']
[Training] [2024-03-09T10:35:26.834580]     visual_debug_rate: 800
[Training] [2024-03-09T10:35:26.836580]     is_mel_spectrogram: True
[Training] [2024-03-09T10:35:26.839594]   ]
[Training] [2024-03-09T10:35:26.841581]   is_train: True
[Training] [2024-03-09T10:35:26.843581]   dist: False
[Training] [2024-03-09T10:35:26.845582]
[Training] [2024-03-09T10:35:26.847583] 24-03-09 10:35:26.472 - INFO: Random seed: 5538
[Training] [2024-03-09T10:35:27.673678] 24-03-09 10:35:27.673 - INFO: Number of training data elements: 144, iters: 4
[Training] [2024-03-09T10:35:27.676667] 24-03-09 10:35:27.673 - INFO: Total epochs needed: 400 for iters 1,600
[Training] [2024-03-09T10:35:28.574874] E:\ai-voice-cloning\runtime\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2024-03-09T10:35:28.578875]   warnings.warn(
[Training] [2024-03-09T10:35:35.076722] 24-03-09 10:35:35.075 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2024-03-09T10:35:35.562853] 24-03-09 10:35:35.556 - INFO: Start training from epoch: 0, iter: 0
[Training] [2024-03-09T10:35:37.342177] [2024-03-09 10:35:37,342] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2024-03-09T10:35:39.207932] [2024-03-09 10:35:39,207] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2024-03-09T10:35:40.294527] Disabled distributed training.
[Training] [2024-03-09T10:35:40.295527] Path already exists. Rename it to [./training\neeko\finetune_archived_240309-103526]
[Training] [2024-03-09T10:35:40.295527] Loading from ./models/tortoise/dvae.pth
[Training] [2024-03-09T10:35:40.296527] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.296527]   File "E:\ai-voice-cloning\src\train.py", line 72, in <module>
[Training] [2024-03-09T10:35:40.296527]     train(config_path, args.launcher)
[Training] [2024-03-09T10:35:40.297527]   File "E:\ai-voice-cloning\src\train.py", line 39, in train
[Training] [2024-03-09T10:35:40.297527]     trainer.do_training()
[Training] [2024-03-09T10:35:40.297527]   File "E:\ai-voice-cloning\src\dlas\train.py", line 406, in do_training
[Training] [2024-03-09T10:35:40.298528]     for train_data in tq_ldr:
[Training] [2024-03-09T10:35:40.298528]   File "E:\ai-voice-cloning\runtime\lib\site-packages\torch\utils\data\dataloader.py", line 630, in __next__
[Training] [2024-03-09T10:35:40.298528]     data = self._next_data()
[Training] [2024-03-09T10:35:40.299528]   File "E:\ai-voice-cloning\runtime\lib\site-packages\torch\utils\data\dataloader.py", line 1345, in _next_data
[Training] [2024-03-09T10:35:40.299528]     return self._process_data(data)
[Training] [2024-03-09T10:35:40.299528]   File "E:\ai-voice-cloning\runtime\lib\site-packages\torch\utils\data\dataloader.py", line 1371, in _process_data
[Training] [2024-03-09T10:35:40.300528]     data.reraise()
[Training] [2024-03-09T10:35:40.300528]   File "E:\ai-voice-cloning\runtime\lib\site-packages\torch\_utils.py", line 694, in reraise
[Training] [2024-03-09T10:35:40.300528]     raise exception
[Training] [2024-03-09T10:35:40.300528] IndexError: Caught IndexError in DataLoader worker process 0.
[Training] [2024-03-09T10:35:40.301528] Original Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.301528]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.301528]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.302528]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.302528]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.302528] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.303529]
[Training] [2024-03-09T10:35:40.303529] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.303529]
[Training] [2024-03-09T10:35:40.304529] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.305530]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.306531]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.306531]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.307530]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.307530] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.307530]
[Training] [2024-03-09T10:35:40.307530] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.308531]
[Training] [2024-03-09T10:35:40.308531] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.308531]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.309541]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.309541]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.309541]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.310530] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.310530]
[Training] [2024-03-09T10:35:40.310530] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.311531]
[Training] [2024-03-09T10:35:40.311531] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.311531]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.312531]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.312531]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.312531]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.312531] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.313532]
[Training] [2024-03-09T10:35:40.584592] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.585593]
[Training] [2024-03-09T10:35:40.585593] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.586638]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.586638]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.586638]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.587593]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.587593] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.587593]
[Training] [2024-03-09T10:35:40.588593] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.588593]
[Training] [2024-03-09T10:35:40.588593] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.588593]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.589593]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.589593]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.590593]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.590593] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.590593]
[Training] [2024-03-09T10:35:40.591594] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.591594]
[Training] [2024-03-09T10:35:40.591594] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.591594]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.592594]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.592594]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.592594]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.593594] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.593594]
[Training] [2024-03-09T10:35:40.593594] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.594595]
[Training] [2024-03-09T10:35:40.595594] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.595594]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.596595]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.596595]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.596595]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.597596] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.597596]
[Training] [2024-03-09T10:35:40.597596] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.598595]
[Training] [2024-03-09T10:35:40.598595] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.598595]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.598595]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.599595]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.599595]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.599595] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.600595]
[Training] [2024-03-09T10:35:40.600595] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.600595]
[Training] [2024-03-09T10:35:40.601596] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.601596]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.601596]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.602597]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.602597]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.602597] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.603597]
[Training] [2024-03-09T10:35:40.603597] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.603597]
[Training] [2024-03-09T10:35:40.603597] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.604642]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.605598]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.605598]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.606597]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.606597] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.606597]
[Training] [2024-03-09T10:35:40.607643] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.607643]
[Training] [2024-03-09T10:35:40.607643] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.608598]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.608598]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.608598]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.609599]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.609609] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.609609]
[Training] [2024-03-09T10:35:40.610597] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.610597]
[Training] [2024-03-09T10:35:40.610597] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.611598]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.611598]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.611598]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.612599]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.612599] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.612599]
[Training] [2024-03-09T10:35:40.613599] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.613599]
[Training] [2024-03-09T10:35:40.613599] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.614599]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.615600]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.615600]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.616600]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.616600] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.616600]
[Training] [2024-03-09T10:35:40.617599] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.617599]
[Training] [2024-03-09T10:35:40.617599] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.618599]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.618599]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.618599]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.619600]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.619600] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.619600]
[Training] [2024-03-09T10:35:40.620600] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.620600]
[Training] [2024-03-09T10:35:40.620600] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.621601]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.621601]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.621601]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.621601]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.622601] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.622601]
[Training] [2024-03-09T10:35:40.622601] During handling of the above exception, another exception occurred:
[Training] [2024-03-09T10:35:40.623617]
[Training] [2024-03-09T10:35:40.623617] Traceback (most recent call last):
[Training] [2024-03-09T10:35:40.623617]   File "E:\ai-voice-cloning\runtime\lib\site-packages\torch\utils\data\_utils\worker.py", line 308, in _worker_loop
[Training] [2024-03-09T10:35:40.624601]     data = fetcher.fetch(index)
[Training] [2024-03-09T10:35:40.625601]   File "E:\ai-voice-cloning\runtime\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
[Training] [2024-03-09T10:35:40.625601]     data = [self.dataset[idx] for idx in possibly_batched_index]
[Training] [2024-03-09T10:35:40.626601]   File "E:\ai-voice-cloning\runtime\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
[Training] [2024-03-09T10:35:40.626601]     data = [self.dataset[idx] for idx in possibly_batched_index]
[Training] [2024-03-09T10:35:40.626601]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 233, in __getitem__
[Training] [2024-03-09T10:35:40.627602]     return self[(index+1) % len(self)]
[Training] [2024-03-09T10:35:40.627602]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 233, in __getitem__
[Training] [2024-03-09T10:35:40.627602]     return self[(index+1) % len(self)]
[Training] [2024-03-09T10:35:40.628602]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 233, in __getitem__
[Training] [2024-03-09T10:35:40.628602]     return self[(index+1) % len(self)]
[Training] [2024-03-09T10:35:40.628602]   [Previous line repeated 97 more times]
[Training] [2024-03-09T10:35:40.629603]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 218, in __getitem__
[Training] [2024-03-09T10:35:40.629603]     tseq, wav, text, path, type = self.get_wav_text_pair(
[Training] [2024-03-09T10:35:40.629603]   File "E:\ai-voice-cloning\src\dlas\data\audio\paired_voice_audio_dataset.py", line 200, in get_wav_text_pair
[Training] [2024-03-09T10:35:40.630602]     audiopath, text, type = audiopath_and_text[0], audiopath_and_text[1], audiopath_and_text[2]
[Training] [2024-03-09T10:35:40.630602] IndexError: list index out of range
[Training] [2024-03-09T10:35:40.630602]
[Training] [2024-03-09T10:35:51.183741]
[Training] [2024-03-09T10:35:51.183741] E:\ai-voice-cloning>pause
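The numbers in the config dump agree with the startup messages: with 144 training elements and 4 iterations per epoch, `niter: 1600` takes 1600 / 4 = 400 epochs, which is the `Total epochs needed: 400` line. The `restarts` and `restart_weights` entries describe a cosine learning-rate schedule that restarts every 200 iterations (`T_period`) from a progressively lower peak and decays toward `eta_min`. The sketch below reproduces both calculations. It is an interpretation, not DLAS code: the closed-form SGDR-style cosine is an assumed reading of `CosineAnnealingLR_Restart`, and `BASE_LR` is a hypothetical value because the configured learning rate does not appear in this excerpt.

```python
import math

# Values copied from the config dump above; BASE_LR is hypothetical.
NITER = 1600
ITERS_PER_EPOCH = 4
PERIOD = 200
RESTARTS = [200, 400, 600, 800, 1000, 1200, 1400, 1600]
RESTART_WEIGHTS = [0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125, 0.0625]
ETA_MIN = 1e-8
BASE_LR = 1e-5  # hypothetical: the real learning rate is not shown in this log excerpt


def epochs_needed(niter, iters_per_epoch):
    # Round up so a final partial epoch still covers the remaining iterations.
    return math.ceil(niter / iters_per_epoch)


def cosine_restart_lr(step, base_lr=BASE_LR, period=PERIOD,
                      restarts=RESTARTS, weights=RESTART_WEIGHTS, eta_min=ETA_MIN):
    """Learning rate at `step` for cosine annealing with weighted warm restarts (assumed form)."""
    peak, last_restart = base_lr, 0
    for restart, weight in zip(restarts, weights):
        if step >= restart:
            # After each restart the cycle begins again from a scaled-down peak.
            peak, last_restart = base_lr * weight, restart
    progress = (step - last_restart) / period
    return eta_min + (peak - eta_min) * (1 + math.cos(math.pi * progress)) / 2


if __name__ == "__main__":
    print(epochs_needed(NITER, ITERS_PER_EPOCH))
    for step in (0, 100, 199, 200, 400, 1599):
        print(step, f"{cosine_restart_lr(step):.3e}")
```

Under these assumptions the last restart reached during the run (iteration 1400) starts at 12.5% of the base rate; the restart listed at 1600 coincides with the end of training.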
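The failure itself is in `get_wav_text_pair`, which unpacks each dataset entry as `audiopath_and_text[0]`, `[1]` and `[2]`; `IndexError: list index out of range` there means an entry had fewer fields than that line expects. The repeated `return self[(index+1) % len(self)]` frames show `__getitem__` moving on to the next index after each failure, and since the same error recurs on every retry, the problem likely affects many entries rather than one (for example a wrong separator throughout the file). A minimal sketch for spotting such lines, assuming the training list is a UTF-8 text file with one `audio_path|transcription` entry per line and that the loader appends the third (`type`) field itself; the `|` separator, the two-field minimum and the script name `check_dataset.py` are assumptions, not taken from the log:

```python
import sys


def find_malformed_lines(lines, sep="|", min_fields=2):
    """Return (line_number, text) for entries that do not look like `path|transcription`.

    Assumption: each entry should split on `sep` into at least `min_fields`
    non-empty fields (audio path first, transcription second).
    """
    malformed = []
    for number, raw in enumerate(lines, start=1):
        fields = raw.strip().split(sep)
        # Flag too few fields (a blank line or a missing separator) and
        # entries whose path or transcription is empty.
        if len(fields) < min_fields or not all(f.strip() for f in fields[:min_fields]):
            malformed.append((number, raw.rstrip("\r\n")))
    return malformed


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_dataset.py <dataset list file>")
    else:
        with open(sys.argv[1], encoding="utf-8") as handle:
            problems = find_malformed_lines(handle)
        for number, text in problems:
            print(f"line {number}: {text!r}")
        print(f"{len(problems)} malformed line(s) found")
```

Blank lines and lines without a `|` are the kind of entry that, under the assumption above, would leave the unpacked list too short.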

32/64-bit issue when running start.bat

C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning>set PYTHONUTF8=1

C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning>runtime\python.exe .\src\main.py
Traceback (most recent call last):
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\src\main.py", line 20, in <module>
    from utils import *
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\src\utils.py", line 45, in <module>
    from rvc_pipe.rvc_infer import rvc_convert
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\runtime\lib\site-packages\rvc_pipe\rvc_infer.py", line 10, in <module>
    from rvc.infer.modules.vc.modules import VC
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\rvc\infer\modules\vc\modules.py", line 20, in <module>
    from rvc.infer.modules.vc.utils import *
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\rvc\infer\modules\vc\utils.py", line 3, in <module>
    from fairseq import checkpoint_utils
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\runtime\lib\site-packages\fairseq\__init__.py", line 40, in <module>
    import fairseq.scoring # noqa
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\runtime\lib\site-packages\fairseq\scoring\__init__.py", line 55, in <module>
    importlib.import_module("fairseq.scoring." + module)
  File "importlib\__init__.py", line 127, in import_module
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\runtime\lib\site-packages\fairseq\scoring\bleu.py", line 14, in <module>
    from fairseq.scoring.tokenizer import EvaluationTokenizer
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\runtime\lib\site-packages\fairseq\scoring\tokenizer.py", line 8, in <module>
    import sacrebleu as sb
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\runtime\lib\site-packages\sacrebleu\__init__.py", line 21, in <module>
    from .utils import smart_open, SACREBLEU_DIR, download_test_set # noqa: F401
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\runtime\lib\site-packages\sacrebleu\utils.py", line 9, in <module>
    import portalocker
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\runtime\lib\site-packages\portalocker\__init__.py", line 1, in <module>
    from . import __about__, constants, exceptions, portalocker, utils
  File "C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning\runtime\lib\site-packages\portalocker\portalocker.py", line 15, in <module>
    import pywintypes
  File "C:\Users\liede\AppData\Roaming\Python\Python39\site-packages\win32\lib\pywintypes.py", line 129, in <module>
    import_pywin32_system_module("pywintypes", globals())
  File "C:\Users\liede\AppData\Roaming\Python\Python39\site-packages\win32\lib\pywintypes.py", line 49, in import_pywin32_system_module
    import _win32sysloader
ImportError: DLL load failed while importing _win32sysloader: %1 is not a valid Win32 application.

C:\Users\liede\Desktop\Voice Changer Stuff\RVC Tools\ai-voice-cloning>pause
Press any key to continue . . .

I can't figure out what is causing the ImportError.
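In the traceback above, `pywintypes` is loaded from the per-user site-packages folder (`AppData\Roaming\Python\Python39`), not from the bundled `runtime` folder. If the pywin32 copy in that folder was built for a different architecture, that would explain "%1 is not a valid Win32 application". This is an inference from the paths, not a confirmed diagnosis. The sketch below reports the interpreter's bitness, whether the user site is enabled, and which file a module would be imported from. It uses only the standard library, and the function name is hypothetical.

```python
import importlib.util
import site
import struct
import sys
from typing import Optional


def describe_interpreter(module: str = "pywintypes") -> dict:
    """Collect facts that help explain 'not a valid Win32 application' import errors."""
    spec = importlib.util.find_spec(module)  # locates the module without importing it
    origin: Optional[str] = spec.origin if spec is not None else None
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "bits": struct.calcsize("P") * 8,  # pointer size: 32 or 64
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "user_site_dir": site.getusersitepackages(),
        "module_origin": origin,  # file the module would be loaded from, or None
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Run it with `runtime\python.exe`. If `module_origin` points into `AppData\Roaming\Python` rather than the ai-voice-cloning folder, user-site packages are shadowing the runtime's packages. One possible remedy is adding `set PYTHONNOUSERSITE=1` to start.bat before the Python line. That environment variable tells Python not to add the user site-packages directory to `sys.path`.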

whisper large-v3 support?

Full model

There is also a faster-whisper large-v3 model.

ai-voice-cloning>pause

[Training] [2024-03-01T21:18:16.405114] warnings.warn(
[Training] [2024-03-01T21:19:46.370136] Disabled distributed training.
[Training] [2024-03-01T21:19:46.370136] Loading from ./models/tortoise/dvae.pth
[Training] [2024-03-01T21:19:46.370136] Traceback (most recent call last):
[Training] [2024-03-01T21:19:46.370136] File "d:\RVC\AI VOICE CLONING\ai-voice-cloning\src\train.py", line 72, in <module>
[Training] [2024-03-01T21:19:46.370136] train(config_path, args.launcher)
[Training] [2024-03-01T21:19:46.370136] File "d:\RVC\AI VOICE CLONING\ai-voice-cloning\src\train.py", line 39, in train
[Training] [2024-03-01T21:19:46.370136] trainer.do_training()
[Training] [2024-03-01T21:19:46.370136] File "d:\RVC\AI VOICE CLONING\ai-voice-cloning\src\dlas\train.py", line 408, in do_training
[Training] [2024-03-01T21:19:46.370136] metric = self.do_step(train_data)
[Training] [2024-03-01T21:19:46.370136] File "d:\RVC\AI VOICE CLONING\ai-voice-cloning\src\dlas\train.py", line 271, in do_step
[Training] [2024-03-01T21:19:46.370136] gradient_norms_dict = self.model.optimize_parameters(
[Training] [2024-03-01T21:19:46.370136] File "d:\RVC\AI VOICE CLONING\ai-voice-cloning\src\dlas\trainer\ExtensibleTrainer.py", line 321, in optimize_parameters
[Training] [2024-03-01T21:19:46.386620] ns = step.do_forward_backward(
[Training] [2024-03-01T21:19:46.386620] File "d:\RVC\AI VOICE CLONING\ai-voice-cloning\src\dlas\trainer\steps.py", line 242, in do_forward_backward
[Training] [2024-03-01T21:19:46.386620] local_state[k] = v[grad_accum_step]
[Training] [2024-03-01T21:19:46.386620] IndexError: list index out of range
[Training] [2024-03-01T21:19:57.203746]
[Training] [2024-03-01T21:19:57.203746] d:\RVC\AI VOICE CLONING\ai-voice-cloning>pause
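The failing line indexes a per-step list with `grad_accum_step`. One plausible cause is a batch size that is not a multiple of the gradient accumulation size. This assumes DLAS splits each batch the way `torch.chunk` does: the chunk size is rounded up, so fewer chunks than requested can come back. That assumption is not confirmed from the DLAS source. The pure-Python sketch below reproduces that arithmetic so you can check a batch size / accumulation pair before training. The helper names are hypothetical.

```python
import math


def chunk_count(batch_size: int, grad_accum: int) -> int:
    """Number of chunks torch.chunk(t, grad_accum) returns for a dimension of size batch_size."""
    if batch_size < 1 or grad_accum < 1:
        raise ValueError("batch_size and grad_accum must be positive")
    chunk_size = math.ceil(batch_size / grad_accum)
    return math.ceil(batch_size / chunk_size)


def batch_problem(batch_size: int, grad_accum: int) -> str:
    """Return a description of the problem if some accumulation step would get no chunk, else ''."""
    chunks = chunk_count(batch_size, grad_accum)
    if chunks < grad_accum:
        return (f"batch_size={batch_size} splits into {chunks} chunks, but "
                f"grad_accum={grad_accum} steps expect one each; use a multiple of {grad_accum}")
    return ""


if __name__ == "__main__":
    for bs, ga in [(8, 4), (6, 4), (2, 4)]:
        print(bs, ga, chunk_count(bs, ga))
    # 8 4 4
    # 6 4 3
    # 2 4 2
```

With a batch size that is an exact multiple of the accumulation size, the count always equals `grad_accum`. It is probably also worth checking that the dataset has at least `batch_size` lines.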

How many epochs for RVC & Tortoise TTS? + Need out of memory tips

Hi, I'm new to AI training. I'd like to know how many epochs I need and what batch size I should use for both Tortoise TTS and RVC (I have both installed on my secondary NVIDIA laptop, including your repo), depending on the length of the dataset. My shortest dataset is 1 minute 11 seconds long, and my longest is 43 minutes 57 seconds. I don't want to risk overtraining, and I really want the models to sound accurate.

For Tortoise, I was going to cut the audio into three to five ~10-second segments, as the GitHub page suggests.

I also don't want to run into OOM (CUDA out of memory) errors, since my secondary laptop has only 8 GB of RAM (and possibly 4 GB of VRAM). I really want to use that laptop for this: my primary one has an AMD GPU and can't run NVIDIA-only (CUDA) applications, and although my desktop runs smoothly with 12 GB of RAM, I'd rather be portable.

I've searched Google and Reddit a lot but found different answers each time, so that didn't help much. Any advice would be very helpful, thanks. :)
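Advice is sometimes given in epochs and sometimes in iterations (steps). The two are related by simple arithmetic, which makes it easier to compare answers. Below is a sketch that assumes one optimizer step per batch and counts a partial final batch as a step. Some trainers drop the partial batch instead, which would make the count slightly lower. This does not recommend a particular epoch count.

```python
import math


def total_iterations(num_lines: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps for `epochs` passes over a dataset of `num_lines` clips."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    steps_per_epoch = math.ceil(num_lines / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # 300 clips at batch size 64 -> 5 steps per epoch -> 250 steps for 50 epochs
    print(total_iterations(num_lines=300, batch_size=64, epochs=50))  # 250
```

A smaller batch size lowers the memory needed per step, which helps on a 4 GB card. The trade-off is that each epoch takes more steps.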

Error: No module named 'rvc_pipe'

The error says:

/content/ai-voice-cloning
Traceback (most recent call last):
  File "/content/ai-voice-cloning/./src/main.py", line 20, in <module>
    from utils import *
  File "/content/ai-voice-cloning/src/utils.py", line 45, in <module>
    from rvc_pipe.rvc_infer import rvc_convert
ModuleNotFoundError: No module named 'rvc_pipe'
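`src/utils.py` imports `rvc_pipe` at startup. On this Colab setup (paths under `/content`), that package apparently isn't installed in the Python environment that runs `main.py`. Before launching, you can list which top-level imports are missing. The sketch below uses `importlib.util.find_spec`, which looks a module up without importing it. The module names in the example are illustrative; only `rvc_pipe` comes from the traceback.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(missing_modules(["torch", "torchaudio", "rvc_pipe"]))
```

Run it in the same notebook or with the same interpreter that launches `src/main.py`. An empty list means every listed package is importable.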

Training throws error: "Something went wrong 'utf-8' codec can't decode byte 0x81 in position 51: invalid start byte"

Hey, I started training after preparing my dataset via the UI, but it only throws errors. I'll attach my console output.

console.txt
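The message means that some file read during training contains a byte (0x81) that is not valid UTF-8. This often happens when a file was saved in a legacy code page. The error doesn't show which file the trainer was reading. Assuming it is one of the dataset text files (train.txt, validation.txt) or the training YAML, the sketch below locates the offending lines so they can be fixed or the file re-saved as UTF-8. The helper name is hypothetical.

```python
from typing import List, Tuple


def find_non_utf8_lines(data: bytes) -> List[Tuple[int, int, str]]:
    """Return (1-based line, byte offset within the line, offending byte as hex) for invalid UTF-8 lines."""
    problems = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            problems.append((number, err.start, f"0x{line[err.start]:02x}"))
    return problems


if __name__ == "__main__":
    sample = "audio/1.wav|hello\n".encode("utf-8") + b"audio/2.wav|caf\x81\n"
    print(find_non_utf8_lines(sample))  # [(2, 15, '0x81')]
```

For a real file, call `find_non_utf8_lines(Path("train.txt").read_bytes())` after `from pathlib import Path`.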

Error message

I trained a model and get the following error message when trying to generate TTS: "Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: 'tuple' object has no attribute 'device'". I have clicked the (Re)Compute Voice Latents button, but the error keeps repeating and I can't generate anything. Generation works with the "random" voice when I switch back to the original autoregressive model in the settings, but not with the new voice. Any help would be appreciated.

Question: Capability of using RVC models

Is there a way to use RVC .pth models with this?

Include pyfastmp3decoder in the site-packages

Hello,

It would be great to bundle the pyfastmp3decoder that the project uses, so that mp3 files are handled natively!

Right now, it crashes when training with mp3 files.

Thanks for your work,
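Until an mp3 decoder is bundled, one possible workaround (not part of this repo) is to convert the mp3 files to WAV before training. The sketch below assumes ffmpeg is installed and on PATH. It builds one `ffmpeg -y -i input.mp3 output.wav` command per file and, when `dry_run` is off, runs them. The function names and folder choice are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_command(src: Path) -> List[str]:
    """Build an ffmpeg command that writes a .wav next to the .mp3 source (-y overwrites)."""
    return ["ffmpeg", "-y", "-i", str(src), str(src.with_suffix(".wav"))]


def convert_folder(folder: Path, dry_run: bool = True) -> List[List[str]]:
    """Convert every *.mp3 in `folder` to .wav; with dry_run=True only return the commands."""
    # glob is case-sensitive on Linux, so files named *.MP3 would be skipped there.
    commands = [ffmpeg_command(p) for p in sorted(folder.glob("*.mp3"))]
    if not dry_run:
        if shutil.which("ffmpeg") is None:
            raise RuntimeError("ffmpeg was not found on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

After converting, move the original mp3 files out of the voice folder so that only the WAV copies are picked up.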

Fine-tuning Hifigan Decoder

I've heard that fine-tuning the Hifigan decoder can improve model quality, especially pronunciation.
Is there a recipe for fine-tuning it?

Improved generation of non-English texts

Hi there! Hi Jarod!

I'm a beginner in voice generation, and I'm trying to train a model for a Slavic language. Could you give any advice on improving the quality of the training dataset?

The first time, I trained a model on 20 minutes of voice. I used the base Whisper model with openai/whisper as the Whisper backend, and ran Transcribe and Process with Slice Segments enabled. I neglected to check the dataset, and the resulting generations did not differ much from those of the stock autoregressive.pth, except that they had strange sounds at the end, resembling sighs.

The second time, having watched all your YouTube videos on voice generation, I followed this procedure:

I used 2 hours of the original voice, with the large-v2 Whisper model and openai/whisper as the Whisper backend. I ran Transcribe and Process with Slice Segments enabled.
I listened to the audio and corrected the start and end times of the segments in whisper.json. (Quite often, sounds from a neighboring segment end up in a cut segment, or two words at the junction of segments sound almost like one word and are hard to separate even manually. Sometimes you can hear sounds, such as inhaling, at the edges of a track.)
I corrected the text in train.txt, validation.txt and whisper.json so that it matches the audio. (In some cases it misrecognizes similar-sounding words, or misrecognizes or skips words at the beginning or end of a segment.)
To be safe, I also replaced all numbers, and symbols such as "%", with words.
In the settings, I selected the autoregressive model from my first attempt as the starting point (most likely a big mistake, but I thought that a lot of correct data in later training would cover that model's shortcomings) and started training with the settings given below:

I trained the model for a week, until the progress charts in the web UI stopped updating. I stopped because I was tired of waiting: the web UI said training needed two more weeks, meaning I had completed only about 33%. I tried generating with the intermediate model. The voice did sound better, but it still had a strong English accent, and the strange sounds at the end were still there, maybe even more than before.

The third time, I started training from scratch on autoregressive.pth, but to avoid a long wait I used only 30 minutes of audio. It trained quite quickly, in about half a day. However, as I understood from the library's description, a graph like this could not lead to a good result (loss_text went lower than loss_mel):

The result was the scariest generation I've ever heard! Strange sounds could make up more than half of the audio file (sometimes they sounded like duplicates of the words I generated, and sometimes like random fragments of English words).

Now I'm looking for a way to avoid manually editing the text and the segment start and end times in whisper.json. I tried the large-v3 Whisper model for transcription, and in my experience the results were worse than with large-v2.
With openai/whisper as the backend, I still get quite a few inaccuracies at the start and end of segments, and for the first time I saw something like hallucinations. One segment ended with the beginning of a word from the next segment, but several extra words were added to its transcript that did not match the next segment's text (only the first word matched). The next segment itself was recognized correctly, except that its audio was cut off at the beginning. I also found that one segment is missing the conjunction "and".
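Part of the manual clean-up described above can at least be located automatically: finding the numbers and symbols such as "%" that need spelling out. The sketch below assumes train.txt uses a pipe-delimited `path|text` layout and checks only the text after the first `|`, so digits in file names are ignored. It only flags candidates. Spelling them out still has to be done for the target language. The symbol set is an arbitrary starting point.

```python
import re
from typing import List, Tuple

# Characters that usually need to be spelled out in training text; extend as needed.
NEEDS_SPELLING = re.compile(r"[0-9%&$+=@#]")


def lines_needing_spelling(lines: List[str]) -> List[Tuple[int, List[str]]]:
    """Return (1-based line number, sorted unique matches) for transcripts with digits or symbols."""
    report = []
    for number, line in enumerate(lines, start=1):
        text = line.split("|", 1)[-1]  # skip the audio path before the first '|'
        found = sorted(set(NEEDS_SPELLING.findall(text)))
        if found:
            report.append((number, found))
    return report


if __name__ == "__main__":
    sample = ["audio/1.wav|one hundred percent", "audio/2.wav|5% off"]
    print(lines_needing_spelling(sample))  # [(2, ['%', '5'])]
```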

With m-bain/whisperx as the backend, I also found one segment missing the conjunction "and", in the same place as with openai/whisper. There was also a segment with two words separated by a long silence, and only the first of the two words ended up in whisper.json.

I was also surprised to see that openai/whisper and m-bain/whisperx write whisper.json in different formats.

All of this has left me even more confused. I have a few open questions that I couldn't resolve on my own:

Is there any way to speed up training? Do I understand correctly that speed depends only on the amount of VRAM, or can something else speed it up without compromising quality?
Which Whisper backend is better for getting a good model? So far whisperx looks better, because every word gets its own timing. The question is how accurate that timing is, since it can also be wrong: as the screenshot above shows, one word was stretched over more than 4 seconds.
If I use m-bain/whisperx, how do I insert missing words correctly, and what should I put in the "score" parameter (which, as I understand it, is how confident whisperx is in the recognition)? 1? Does it even need to be specified?
If I use openai/whisper, do I also need to edit "tokens" when I change a segment's text? And if so, how?
Can the built-in tokenizer.json be used at all, or do I need to find or create a tokenizer for a Slavic language?
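Cases like the word stretched over 4 seconds can be found by scanning word timings for implausible durations. The sketch below assumes each whisperx segment carries a `"words"` list whose entries have `"word"`, `"start"`, `"end"` and optionally `"score"` keys, matching the markup described above. It also assumes you have already pulled the segment list out of whisper.json. Entries without timings are skipped, and the 2-second threshold is arbitrary.

```python
from typing import Dict, List


def suspicious_words(segments: List[Dict], max_seconds: float = 2.0) -> List[Dict]:
    """Return word entries whose aligned duration exceeds max_seconds."""
    flagged = []
    for segment in segments:
        for word in segment.get("words", []):
            # Skip words that have no timing information.
            if "start" not in word or "end" not in word:
                continue
            duration = word["end"] - word["start"]
            if duration > max_seconds:
                flagged.append({"word": word["word"], "start": word["start"], "duration": round(duration, 2)})
    return flagged


if __name__ == "__main__":
    segments = [{"words": [
        {"word": "and", "start": 1.0, "end": 1.2, "score": 0.9},
        {"word": "silence", "start": 1.3, "end": 5.6, "score": 0.4},
    ]}]
    print(suspicious_words(segments))  # [{'word': 'silence', 'start': 1.3, 'duration': 4.3}]
```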

My PC Settings:
Windows 10 pro
Ryzen 5 3600
Nvidia RTX 3060 Ti
32 GB RAM

P.S. Thank you for your time!

Unsupported audio format provided: -100 when using Voice Microphone input

Open up the interface, set a prompt, Emotion=None, Voice=Microphone.
Record a sample from microphone.
The sample is recorded successfully; playback works as expected.

Click Generate -> Generation halts with an error: Something went wrong Unsupported audio format provided: -100

Loading voice: microphone with model d1f79232
Traceback (most recent call last):
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
    response = fn(*args)
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\webui.py", line 129, in generate_proxy
    raise e
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\webui.py", line 123, in generate_proxy
    sample, outputs, stats = generate(**kwargs)
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 364, in generate
    return generate_tortoise(**kwargs)
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 1229, in generate_tortoise
    settings = get_settings( override=override )
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 1086, in get_settings
    settings['voice_samples'], settings['conditioning_latents'], _ = fetch_voice(voice=selected_voice)
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 1011, in fetch_voice
    voice_samples, conditioning_latents = [load_audio(parameters['mic_audio'], tts.input_sample_rate)], None
  File "D:\TortoiseTTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\tortoise\utils\audio.py", line 32, in load_audio
    assert False, f"Unsupported audio format provided: {audiopath[-4:]}"
AssertionError: Unsupported audio format provided: -100
2024-02-06 12:41:25 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 500 Internal Server Error"
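The assertion prints the last four characters of the path. So "-100" suggests that the recorded microphone file reached `load_audio` under a name without a recognized extension such as `.wav`. That is an inference from the message, not confirmed. One way around an extension check is to identify the format from the file's header bytes. Below is a sketch of that idea covering a few common containers. The function names are hypothetical and not part of the repo.

```python
from typing import Optional


def detect_audio_format(header: bytes) -> Optional[str]:
    """Guess an audio container from its first bytes (WAV needs at least 12)."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    # ID3 tag, or an MPEG audio frame sync (11 set bits); the sync check can also match other MPEG audio.
    if header[:3] == b"ID3" or (len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0):
        return "mp3"
    return None


def sniff_audio_file(path: str) -> Optional[str]:
    """Read the start of a file and detect its format, ignoring the file name."""
    with open(path, "rb") as handle:
        return detect_audio_format(handle.read(12))
```

Alternatively, saving the microphone recording with a `.wav` suffix before it is passed on would sidestep the extension check.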

Please update DeepSpeed to the latest version 0.13.1

Thanks for the great work.

Please update DeepSpeed to the latest version using the build procedure here and, if possible, also update to the latest torch build for CUDA 12.1 and to Python 3.10.

https://github.com/S95Sedan/Deepspeed-Windows

start.bat issue

Traceback (most recent call last):
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\src\main.py", line 20, in <module>
    from utils import *
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\src\utils.py", line 40, in <module>
    from tortoise.api import TextToSpeech as TorToise_TTS, MODELS, get_model_path, pad_or_truncate
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 21, in <module>
    from tortoise.models.clvp import CLVP
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\modules\tortoise-tts\tortoise\models\clvp.py", line 7, in <module>
    from tortoise.models.transformer import Transformer
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\modules\tortoise-tts\tortoise\models\transformer.py", line 6, in <module>
    from rotary_embedding_torch import RotaryEmbedding
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\rotary_embedding_torch\__init__.py", line 1, in <module>
    from rotary_embedding_torch.rotary_embedding_torch import apply_rotary_emb, RotaryEmbedding, broadcat, apply_learned_rotations
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\rotary_embedding_torch\rotary_embedding_torch.py", line 10, in <module>
    from beartype import beartype
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\beartype\__init__.py", line 57, in <module>
    from beartype._decor.decormain import (
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\beartype\_decor\decormain.py", line 23, in <module>
    from beartype._conf.confcls import (
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\beartype\_conf\confcls.py", line 25, in <module>
    from beartype._cave._cavemap import NoneTypeOr
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\beartype\_cave\_cavemap.py", line 33, in <module>
    from beartype._util.hint.nonpep.utilnonpeptest import (
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\beartype\_util\hint\nonpep\utilnonpeptest.py", line 22, in <module>
    from beartype._util.cache.utilcachecall import callable_cached
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\beartype\_util\cache\utilcachecall.py", line 32, in <module>
    from beartype._util.func.arg.utilfuncargtest import (
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\beartype\_util\func\arg\utilfuncargtest.py", line 16, in <module>
    from beartype._util.func.arg.utilfuncargiter import (
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\beartype\_util\func\arg\utilfuncargiter.py", line 22, in <module>
    from beartype._data.hint.datahinttyping import (
  File "C:\Users\USUARIO\Documents\Proyectos\AIAUDIO\ai-voice-cloning\venv\lib\site-packages\beartype\_data\hint\datahinttyping.py", line 142, in <module>
    BeartypeReturn = Union[BeartypeableT, BeartypeConfedDecorator]
  File "C:\Users\USUARIO\AppData\Local\Programs\Python\Python39\lib\typing.py", line 243, in inner
    return func(*args, **kwds)
  File "C:\Users\USUARIO\AppData\Local\Programs\Python\Python39\lib\typing.py", line 316, in __getitem__
    return self._getitem(self, parameters)
  File "C:\Users\USUARIO\AppData\Local\Programs\Python\Python39\lib\typing.py", line 421, in Union
    parameters = _remove_dups_flatten(parameters)
  File "C:\Users\USUARIO\AppData\Local\Programs\Python\Python39\lib\typing.py", line 215, in _remove_dups_flatten
    all_params = set(params)
TypeError: unhashable type: 'list'
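The standard-library frames above come from a Python 3.9 install (`Python39` in the `typing.py` path), while the manual-install instructions call for Python 3.11. A TypeError raised inside `typing` while beartype is being imported is consistent with a library that no longer supports the running interpreter. That is an inference, not confirmed here. Below is a sketch of a guard that could run before the heavy imports. The function name is hypothetical, and the required version is taken from the setup instructions.

```python
import sys

REQUIRED = (3, 11)  # version named in the manual-install instructions


def check_python_version(version_info=sys.version_info, required=REQUIRED) -> str:
    """Return an error message if the interpreter is not the required major.minor version, else ''."""
    current = tuple(version_info[:2])
    if current != required:
        return (f"Python {current[0]}.{current[1]} detected at {sys.executable}; "
                f"this setup expects Python {required[0]}.{required[1]}")
    return ""


if __name__ == "__main__":
    message = check_python_version()
    print(message or "Python version OK")
```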

RuntimeError: Error(s) in loading state_dict for UnifiedVoice:

Checking CUDA availability and Torch installation...
CUDA available: True
Torch version: 2.1.2+cu118
Running main.py...
Whisper detected
Traceback (most recent call last):
File "C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\src\utils.py", line 98, in <module>
from vall_e.emb.qnt import encode as valle_quantize
ModuleNotFoundError: No module named 'vall_e'

Bark detected
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Loading TorToiSe... (AR: C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\models\tortoise\autoregressive.pth, diffusion: ./models/tortoise/diffusion_decoder.pth, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
use_deepspeed api_debug False
Some weights of the model checkpoint at jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.weight_v', 'wav2vec2.encoder.pos_conv_embed.conv.weight_g']

This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading tokenizer JSON: ./modules/tortoise-tts/tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\models\tortoise\autoregressive.pth
Traceback (most recent call last):
File "C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\src\main.py", line 34, in <module>
tts = load_tts()
File "C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\src\utils.py", line 3692, in load_tts
tts = TorToise_TTS(minor_optimizations=not args.low_vram,
File "C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\src\tortoise\api.py", line 308, in __init__
self.load_autoregressive_model(autoregressive_model_path)
File "C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\src\tortoise\api.py", line 391, in load_autoregressive_model
self.autoregressive.load_state_dict(torch.load(self.autoregressive_model_path))
File "C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UnifiedVoice:
Unexpected key(s) in state_dict: "gpt.h.0.attn.bias", "gpt.h.0.attn.masked_bias", "gpt.h.1.attn.bias", "gpt.h.1.attn.masked_bias", "gpt.h.2.attn.bias", "gpt.h.2.attn.masked_bias", "gpt.h.3.attn.bias", "gpt.h.3.attn.masked_bias", "gpt.h.4.attn.bias", "gpt.h.4.attn.masked_bias", "gpt.h.5.attn.bias", "gpt.h.5.attn.masked_bias", "gpt.h.6.attn.bias", "gpt.h.6.attn.masked_bias", "gpt.h.7.attn.bias", "gpt.h.7.attn.masked_bias", "gpt.h.8.attn.bias", "gpt.h.8.attn.masked_bias", "gpt.h.9.attn.bias", "gpt.h.9.attn.masked_bias", "gpt.h.10.attn.bias", "gpt.h.10.attn.masked_bias", "gpt.h.11.attn.bias", "gpt.h.11.attn.masked_bias", "gpt.h.12.attn.bias", "gpt.h.12.attn.masked_bias", "gpt.h.13.attn.bias", "gpt.h.13.attn.masked_bias", "gpt.h.14.attn.bias", "gpt.h.14.attn.masked_bias", "gpt.h.15.attn.bias", "gpt.h.15.attn.masked_bias", "gpt.h.16.attn.bias", "gpt.h.16.attn.masked_bias", "gpt.h.17.attn.bias", "gpt.h.17.attn.masked_bias", "gpt.h.18.attn.bias", "gpt.h.18.attn.masked_bias", "gpt.h.19.attn.bias", "gpt.h.19.attn.masked_bias", "gpt.h.20.attn.bias", "gpt.h.20.attn.masked_bias", "gpt.h.21.attn.bias", "gpt.h.21.attn.masked_bias", "gpt.h.22.attn.bias", "gpt.h.22.attn.masked_bias", "gpt.h.23.attn.bias", "gpt.h.23.attn.masked_bias", "gpt.h.24.attn.bias", "gpt.h.24.attn.masked_bias", "gpt.h.25.attn.bias", "gpt.h.25.attn.masked_bias", "gpt.h.26.attn.bias", "gpt.h.26.attn.masked_bias", "gpt.h.27.attn.bias", "gpt.h.27.attn.masked_bias", "gpt.h.28.attn.bias", "gpt.h.28.attn.masked_bias", "gpt.h.29.attn.bias", "gpt.h.29.attn.masked_bias".
Press any key to continue . . .

Correct way to use the pretrained models without the API.

Hello, I'm writing a script that loads the models directly into the TTS class instead of going through the API server, and I'd like to know whether this approach is correct.

import sys

import torch
import torchaudio

# Make the repo's tortoise package importable (path relative to the working directory; run from the repo root).
sys.path.append('./src')
from tortoise.api import TextToSpeech as TorToise_TTS, MODELS, get_model_path, pad_or_truncate

# Fine-tuned autoregressive checkpoint from training, plus the stock diffusion decoder.
autoregressive_model_path = './training/tina/finetune/models/4020_gpt.pth'
diffusion_model_path = './models/tortoise/diffusion_decoder.pth'
vocoder_model_path = 'bigvgan_24khz_100band.pth'  # Ensure this path is correct
tokenizer_json_path = './modules/tortoise-tts/tortoise/data/tokenizer.json'

tts = TorToise_TTS(
    autoregressive_model_path=autoregressive_model_path,
    diffusion_model_path=diffusion_model_path,
    vocoder_model=vocoder_model_path,
    tokenizer_json=tokenizer_json_path
)

input_text = "Hello, this is a test of the text-to-speech system."
audio_output = tts.tts(text=input_text,
                       num_autoregressive_samples=16,
                       temperature=0.2,
                       length_penalty=1,
                       repetition_penalty=2.0,
                       top_p=0.8,
                       max_mel_tokens=500,
                       cond_free=True,
                       cond_free_k=2,
                       diffusion_temperature=1.0,
                       diffusion_sampler="DDIM",
                       half_p=False)

# torchaudio.save expects a 2-D (channels, samples) tensor, so drop a leading batch dimension.
if audio_output.ndim == 3:
    audio_output = audio_output.squeeze(0)

torchaudio.save("output_audio.wav", audio_output.cpu(), sample_rate=24000)  # Adjust the sample rate if necessary

I had to add strict=False to the call in src\tortoise\api.py, making it self.autoregressive.load_state_dict(torch.load(self.autoregressive_model_path), strict=False)

because of the error shown below:

Loading autoregressive model: ./training/me/finetune/models/4020_gpt.pth
Traceback (most recent call last):
  File "D:\Personal\Workspace\ai-voice-cloning\hg.py", line 21, in <module>
    tts = TorToise_TTS(
  File "D:\Personal\Workspace\ai-voice-cloning\./modules/tortoise-tts\tortoise\api.py", line 308, in __init__
    self.load_autoregressive_model(autoregressive_model_path)
  File "D:\Personal\Workspace\ai-voice-cloning\./modules/tortoise-tts\tortoise\api.py", line 391, in load_autoregressive_model
    self.autoregressive.load_state_dict(torch.load(self.autoregressive_model_path))
  File "C:\Users\administrator\anaconda3\envs\voice\lib\site-packages\torch\nn\modules\module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UnifiedVoice:
        Unexpected key(s) in state_dict: "gpt.h.0.attn.bias", "gpt.h.0.attn.masked_bias", "gpt.h.1.attn.bias", "gpt.h.1.attn.masked_bias", "gpt.h.2.attn.bias", "gpt.h.2.attn.masked_bias", "gpt.h.3.attn.bias", "gpt.h.3.attn.masked_bias", "gpt.h.4.attn.bias", "gpt.h.4.attn.masked_bias", "gpt.h.5.attn.bias", "gpt.h.5.attn.masked_bias", "gpt.h.6.attn.bias", "gpt.h.6.attn.masked_bias", "gpt.h.7.attn.bias", "gpt.h.7.attn.masked_bias", "gpt.h.8.attn.bias", "gpt.h.8.attn.masked_bias", "gpt.h.9.attn.bias", "gpt.h.9.attn.masked_bias", "gpt.h.10.attn.bias", "gpt.h.10.attn.masked_bias", "gpt.h.11.attn.bias", "gpt.h.11.attn.masked_bias", "gpt.h.12.attn.bias", "gpt.h.12.attn.masked_bias", "gpt.h.13.attn.bias", "gpt.h.13.attn.masked_bias", "gpt.h.14.attn.bias", "gpt.h.14.attn.masked_bias", "gpt.h.15.attn.bias", "gpt.h.15.attn.masked_bias", "gpt.h.16.attn.bias", "gpt.h.16.attn.masked_bias", "gpt.h.17.attn.bias", "gpt.h.17.attn.masked_bias", "gpt.h.18.attn.bias", "gpt.h.18.attn.masked_bias", "gpt.h.19.attn.bias", "gpt.h.19.attn.masked_bias", "gpt.h.20.attn.bias", "gpt.h.20.attn.masked_bias", "gpt.h.21.attn.bias", "gpt.h.21.attn.masked_bias", "gpt.h.22.attn.bias", "gpt.h.22.attn.masked_bias", "gpt.h.23.attn.bias", "gpt.h.23.attn.masked_bias", "gpt.h.24.attn.bias", "gpt.h.24.attn.masked_bias", "gpt.h.25.attn.bias", "gpt.h.25.attn.masked_bias", "gpt.h.26.attn.bias", "gpt.h.26.attn.masked_bias", "gpt.h.27.attn.bias", "gpt.h.27.attn.masked_bias", "gpt.h.28.attn.bias", "gpt.h.28.attn.masked_bias", "gpt.h.29.attn.bias", "gpt.h.29.attn.masked_bias".
(voice) PS D:\Personal\Workspace\ai-voice-cloning>

Is there any way to do this without strict=False?
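The unexpected keys are all `gpt.h.N.attn.bias` and `gpt.h.N.attn.masked_bias`. These look like the causal-mask buffers that older versions of Hugging Face transformers stored with GPT-2 attention layers; newer versions appear not to expect them in a checkpoint. That points to a transformers version mismatch, though it is not confirmed here. `strict=False` would also hide genuinely missing or misnamed weights. An alternative is to drop only those buffer keys and keep strict loading. Below is a sketch; the helper name is hypothetical, and the suffix list is based solely on the keys in this error.

```python
from typing import Dict, Iterable, Tuple

# Buffer names reported as unexpected above; assumed to be stale attention-mask buffers.
STALE_SUFFIXES: Tuple[str, ...] = (".attn.bias", ".attn.masked_bias")


def drop_stale_keys(state_dict: Dict, suffixes: Iterable[str] = STALE_SUFFIXES) -> Dict:
    """Return a copy of state_dict without keys ending in any of the given suffixes.

    Real weights such as 'gpt.h.0.attn.c_attn.bias' are kept, because their
    suffix is 'c_attn.bias', not '.attn.bias'.
    """
    suffixes = tuple(suffixes)
    return {k: v for k, v in state_dict.items() if not k.endswith(suffixes)}

# Hypothetical use inside load_autoregressive_model, keeping strict loading:
#   state = torch.load(self.autoregressive_model_path)
#   self.autoregressive.load_state_dict(drop_stale_keys(state), strict=True)
```

With strict loading kept, any other mismatch still raises an error instead of being silently ignored.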

Dataset preparation not working due to "file not found" error.

Hi there.

When I try to prepare my training dataset, I get an error after the Whisper model is downloaded:

Failed to transcribe: ./voices///speakeraudiosample.wav [WinError 2] File not found
Missing dataset: ./training///whisper.json

Maybe it's caused by the repeated slashes in the path.
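Repeated slashes on their own are normally harmless, because path normalization collapses them. The empty components may, however, mean that a voice or dataset name was left blank. `[WinError 2]` means Windows could not find a file it tried to open or run. openai/whisper launches ffmpeg to decode audio, so an ffmpeg missing from PATH is one plausible cause. This is an assumption, not confirmed in this report. The quick check below covers both points. The function name is hypothetical.

```python
import posixpath
import shutil


def diagnose_transcribe_path(path: str) -> dict:
    """Normalize a dataset path and report whether ffmpeg can be found on PATH."""
    return {
        "normalized_path": posixpath.normpath(path),  # collapses './' and repeated '/'
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    print(diagnose_transcribe_path("./voices///speakeraudiosample.wav"))
```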

Start.bat - ModuleNotFoundError: No module named 'torchaudio'

https://youtu.be/p31Ax_A5VKA?si=IzAZN9iog-ylIrAh&t=154
At this point in the video, I get the following
(Windows 10, RTX 3060):

C:\ai-voice-cloning>set PYTHONUTF8=1

C:\ai-voice-cloning>python.exe .\src\main.py
Traceback (most recent call last):
  File "C:\ai-voice-cloning\src\main.py", line 20, in <module>
    from utils import *
  File "C:\ai-voice-cloning\src\utils.py", line 29, in <module>
    import torchaudio
ModuleNotFoundError: No module named 'torchaudio'

C:\ai-voice-cloning>pause
Press any key to continue . . .
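In this report, start.bat runs a bare `python.exe`, so whichever Python comes first on PATH is used. In the earlier report above, start.bat launches `runtime\python.exe` from the package folder instead. If the Python that actually runs isn't the one torch and torchaudio were installed into (for example, because a virtual environment was not activated), this import fails. A sketch to confirm which interpreter is running; the function name is hypothetical.

```python
import importlib.util
import sys


def interpreter_report(module: str = "torchaudio") -> dict:
    """Show which Python is running and whether it can see `module`."""
    return {
        "executable": sys.executable,
        "in_virtualenv": sys.prefix != sys.base_prefix,  # True inside venv/virtualenv (not conda)
        "has_module": importlib.util.find_spec(module) is not None,
    }


if __name__ == "__main__":
    print(interpreter_report())
```

Run it with the exact command start.bat uses. If `has_module` is False, install the packages into that interpreter or point start.bat at the intended one.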

Question: Running on Python env

Hello! Appreciate your great work!
I was wondering whether I can run Tortoise inside an Anaconda Python environment instead. If so, how?
Thank you!

[Training] [2024-02-19T23:34:13.936984] [2024-02-19 23:34:13,936] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.

I did everything as described in the video https://www.youtube.com/watch?v=p31Ax_A5VKA, but when I press the "Train" button, nothing happens; it just shows me this:

[Training] [2024-02-19T23:34:13.936984] [2024-02-19 23:34:13,936] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.

There is also some other output before that in the console.

Also, I got these on startup:

Traceback (most recent call last):
File "H:\AI\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 98, in <module>
from vall_e.emb.qnt import encode as valle_quantize
ModuleNotFoundError: No module named 'vall_e'

Traceback (most recent call last):
File "H:\AI\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 118, in <module>
import bark
ModuleNotFoundError: No module named 'bark'

Installation for Ubuntu?

It seems this fork added HifiGAN in place of the diffusion model, plus some RVC modifications. I'd like to use it on Ubuntu; if that isn't supported, I'd be happy to open a PR. Is there any plan to support it?

Validation loss not showing and the plot takes it as a training loss

The spikes in this graph are actually validation losses.

Just add
data["mode"] = mode
in load_statistics in src/utils.py,

and it's fixed.
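Tagging each parsed record with its mode is what lets the plot keep the two series apart. Without the tag, validation points fall into the training-loss line and show up as spikes. Below is a sketch of the plotting side. It assumes records are dicts and that the mode values are `"training"` and `"validation"`; the exact strings `load_statistics` uses are an assumption. Untagged records default to training, which mirrors the behavior before the fix.

```python
from typing import Dict, List, Tuple


def split_by_mode(entries: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate parsed statistics into training and validation series by their 'mode' tag."""
    training = [e for e in entries if e.get("mode", "training") == "training"]
    validation = [e for e in entries if e.get("mode") == "validation"]
    return training, validation


if __name__ == "__main__":
    stats = [
        {"it": 1, "loss": 0.50, "mode": "training"},
        {"it": 1, "loss": 2.00, "mode": "validation"},
        {"it": 2, "loss": 0.40},
    ]
    train, val = split_by_mode(stats)
    print(len(train), len(val))  # 2 1
```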

Iteration/Sample setting does not affect output quality when Hifigan is ON

It appears that when Hifigan is turned ON, increasing the "diffusion iterations" and the "number of autoregressive samples" has no noticeable effect on either output quality or generation time.

I compared the output quality at different sample/iteration settings with Hifigan both ON and OFF, using the same model and voice samples
(with the libraries preinstalled in the 7z package).

With Hifigan OFF:
iterations=2, samples=2: the output is garbled, with no clear words; iteration time = 18 s
iterations=512, samples=512: the output quality is very good; iteration time = very, very long

With Hifigan ON:
iterations=2, samples=2: the output is acceptable; iteration time = 2.96 s
iterations=512, samples=512: the output is acceptable, with no noticeable improvement; iteration time is unchanged at 2.96 s

Also, the output with Hifigan ON sounds choppy:
https://github.com/JarodMica/ai-voice-cloning/assets/51301116/ec3e4ca8-2359-4a9b-a5b3-1be1a05af10f

Below is the output with Hifigan OFF:
https://github.com/JarodMica/ai-voice-cloning/assets/51301116/d08612e8-3291-4d26-bab6-d038bf11233b

Is this normal, or is it just me?

Thank you very much for your time.

'NoneType' object has no attribute 'dtype'

I keep getting this error every time I enable and use RVC on longer videos.
Can RVC only be used on short videos in Tortoise TTS? It works just fine with the audiobook maker.

'NoneType' object has no attribute 'dtype'

How does this compare to 11Labs? I don't see any previews or content guides

Question/issue/concern in title

Unknown method

Whenever I unzip the archive, I get these errors:
ai-voice-cloning.7z: Unknown method in ai-voice-cloning\runtime\Lib\site-packages\distlib\t64-arm.exe
ai-voice-cloning.7z: Unknown method in ai-voice-cloning\runtime\Lib\site-packages\distlib\w64-arm.exe
ai-voice-cloning.7z: Unknown method in ai-voice-cloning\runtime\Lib\site-packages\installer_scripts\t64-arm.exe
ai-voice-cloning.7z: Unknown method in ai-voice-cloning\runtime\Lib\site-packages\installer_scripts\w64-arm.exe
ai-voice-cloning.7z: Unknown method in ai-voice-cloning\runtime\Lib\site-packages\pip\_vendor\distlib\t64-arm.exe
ai-voice-cloning.7z: Unknown method in ai-voice-cloning\runtime\Lib\site-packages\pip\_vendor\distlib\w64-arm.exe
ai-voice-cloning.7z: Unknown method in ai-voice-cloning\runtime\Lib\site-packages\setuptools\cli-arm64.exe
ai-voice-cloning.7z: Unknown method in ai-voice-cloning\runtime\Lib\site-packages\setuptools\gui-arm64.exe

Language change

Hi,

Is there any chance of adding a "language change" option for text-to-voice generation?
Or is there a file where I can change it for now?
Thanks in advance!

Hanging with longer prompts

I know this probably isn't really being maintained anymore, but I thought I'd give it a shot and ask, since I haven't seen a similar issue here or on the original repo.

I can't get it to generate more than one line: it hangs after finishing the first. I'll put in a long prompt, it splits it into separate lines and gets through the first one just fine, but afterwards it gets stuck on "Generating autoregressive samples - 0.0%" forever. As far as I can tell, there are no logs indicating any issue; the last log lines are just "Generating line: ..." followed by what appears to be JSON containing parameters such as temperature. I'm using a trained model.

I'm pretty new to this sort of thing; any help is appreciated.

Hugging Face download: ai-voice-cloning 7zip archive fails its integrity check

I know this isn't really the place to post, but I don't see anywhere on Hugging Face to report a problem.

The current ai-voice-cloning 7zip file never downloads cleanly: it fails the integrity check every time once the download finishes.
I always run an integrity check on huge files like this before I even try to extract.
If I go ahead and extract anyway, it does complete, but with errors. I have tried 5 times now.
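One way to confirm whether a download is actually corrupted is to compare its SHA-256 against a published checksum, assuming the download page lists one (Hugging Face typically shows a SHA256 value for large LFS-stored files). The helper names below are illustrative; only hashlib and pathlib from the standard library are used.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a multi-gigabyte archive never has
    to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare against a published checksum, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, print(sha256_of_file(Path("ai-voice-cloning.7z"))) prints the hash to compare. On Windows, certutil -hashfile ai-voice-cloning.7z SHA256 gives the same value without Python. If repeated downloads produce different hashes, the transfer itself is the problem rather than the extractor.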

Generate audio error

Traceback (most recent call last):
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\venv\lib\site-packages\gradio\routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 1075, in process_api
result = await self.call_function(
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\venv\lib\site-packages\gradio\blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
return await future
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
result = context.run(func, *args)
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\venv\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
response = fn(*args)
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\src\webui.py", line 129, in generate_proxy
raise e
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\src\webui.py", line 123, in generate_proxy
sample, outputs, stats = generate(**kwargs)
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\src\utils.py", line 364, in generate
return generate_tortoise(**kwargs)
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\src\utils.py", line 1229, in generate_tortoise
settings = get_settings( override=override )
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\src\utils.py", line 1086, in get_settings
settings['voice_samples'], settings['conditioning_latents'], _ = fetch_voice(voice=selected_voice)
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\src\utils.py", line 1013, in fetch_voice
voice_samples, conditioning_latents = None, tts.get_random_conditioning_latents()
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 599, in get_random_conditioning_latents
return self.rlg_auto(torch.tensor([0.0])), self.rlg_diffusion(torch.tensor([0.0]))
TypeError: 'NoneType' object is not callable
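The last frame shows self.rlg_auto being called while it is still None. One common pattern for optional models is to load them on first use before calling them. The sketch below is hypothetical (the class, the loader callable and the float stand-ins for tensors are all assumptions); it is not TorToiSe's actual code, only an illustration of the guard.

```python
from typing import Callable, Optional, Tuple

# Stand-in for a loaded model: takes an input, returns an output.
Model = Callable[[float], float]


class RandomLatentSource:
    """Hypothetical holder for two optional models that are loaded on first use."""

    def __init__(self, loader: Callable[[str], Model]) -> None:
        self._loader = loader
        self.rlg_auto: Optional[Model] = None
        self.rlg_diffusion: Optional[Model] = None

    def get_random_conditioning_latents(self) -> Tuple[float, float]:
        # Without these guards, calling a None attribute raises
        # "TypeError: 'NoneType' object is not callable", as in the traceback.
        if self.rlg_auto is None:
            self.rlg_auto = self._loader("rlg_auto")
        if self.rlg_diffusion is None:
            self.rlg_diffusion = self._loader("rlg_diffusion")
        return self.rlg_auto(0.0), self.rlg_diffusion(0.0)
```

The loader runs at most once per model; later calls reuse the cached objects.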

Training my model isn't working

When trying to train my AI voice, I finish the first two steps of the process, but when I get to the Run Training step it freezes and never gets to the graph.

dist: False
24-01-19 12:00:52.232 - INFO: Random seed: 7040
24-01-19 12:00:54.259 - INFO: Number of training data elements: 36, iters: 1
24-01-19 12:00:54.259 - INFO: Total epochs needed: 500 for iters 500
C:\Users\sirpo\Downloads\ai-voice-cloning\runtime\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing gradient_checkpointing to a config initialization is deprecated and will be removed in v5 Transformers. Using model.gradient_checkpointing_enable() instead, or if you are using the Trainer API, pass gradient_checkpointing=True in your TrainingArguments.

After this text is shown, nothing else happens. I tried reconnecting and stopping the training, but that breaks the site: everything loads forever, forcing me to close the tab and the command prompt and start over.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 3: invalid start byte

I'm getting this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 3: invalid start byte

I'm trying to get it working with a foreign language (Hebrew), using WhisperX with the 'large' model.
No matter how I change the settings in the Generate Configuration window, I always get this error.

Here is the rest of the error message:

[Training] [2024-01-20T13:15:38.035813] Traceback (most recent call last):
File "C:\עידן\ai-voice-cloning\ai-voice-cloning\runtime\lib\site-packages\gradio\routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "C:\עידן\ai-voice-cloning\ai-voice-cloning\runtime\lib\site-packages\gradio\blocks.py", line 1075, in process_api
result = await self.call_function(
File "C:\עידן\ai-voice-cloning\ai-voice-cloning\runtime\lib\site-packages\gradio\blocks.py", line 898, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\עידן\ai-voice-cloning\ai-voice-cloning\runtime\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\עידן\ai-voice-cloning\ai-voice-cloning\runtime\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\עידן\ai-voice-cloning\ai-voice-cloning\runtime\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\עידן\ai-voice-cloning\ai-voice-cloning\runtime\lib\site-packages\gradio\utils.py", line 549, in async_iteration
return next(iterator)
File "C:\עידן\ai-voice-cloning\ai-voice-cloning\src\utils.py", line 2021, in run_training
for line in iter(training_state.process.stdout.readline, ""):
File "codecs.py", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 3: invalid start byte
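Byte 0x92 is not a valid UTF-8 start byte; in Windows-1252 it is the right single quotation mark. The traceback fails inside the loop that reads the training process's stdout line by line, so strict UTF-8 decoding of that pipe would trip on it. I haven't checked how src/utils.py spawns the process. The sketch below assumes a subprocess.Popen text-mode pipe (the iter(...stdout.readline, "") loop suggests one) and shows how to make decoding tolerant. stream_lines is a hypothetical helper name.

```python
import subprocess
from typing import Iterator, List


def stream_lines(cmd: List[str]) -> Iterator[str]:
    """Yield a child process's combined stdout/stderr line by line.

    errors="replace" turns any byte that is not valid UTF-8 (such as 0x92)
    into U+FFFD instead of raising UnicodeDecodeError in the reading loop.
    """
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        encoding="utf-8",
        errors="replace",
    )
    assert process.stdout is not None
    try:
        # readline() returns "" only at end of stream, which ends the loop.
        for line in iter(process.stdout.readline, ""):
            yield line.rstrip("\r\n")
    finally:
        process.stdout.close()
        process.wait()
```

If the output really is Windows-1252 text, decoding with encoding="cp1252" instead would keep the apostrophe: b"it\x92s".decode("cp1252") gives "it’s".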

Q: Generating autoregressive samples: how long should it take?

Hey JJ,

I have a 4090 like you, and I was wondering how long the "Generating autoregressive samples" step is supposed to take during the last stage of voice cloning. In my case it takes 30 minutes for those two small clips, and I have a feeling it's not working the way it's supposed to (training itself took 5 minutes, by comparison).

Thanks for any answer.

On Linux: close, but not quite

I know this is not meant for Linux, but I wanted to throw this out there.

Debian 12 / Bookworm

EDIT: I installed the "required version" of Rust, 1.65.0, via rustup and still get the 'does not require mutable bit to be set' error in three different places when trying to build all the tokenizers. I'm not smart enough to work out what the actual code problem is.

I can come very close to getting this built on Linux / Debian 12, but the stock version of Rust (1.63) is slightly too old (1.65 is required), and the wheel build fails. If I update Rust with rustup, it gets very close to compiling all the wheels but fails just before completing, and I can't really determine why. I believe the updated version of Rust that gets pulled down (1.75) is too new and behaves slightly differently from the required 1.65.
I spent quite a bit of time trying to get it to work.

Where is rvc_pipe?

I tried very hard to find it, but I couldn't. I don't know why this error occurred. Help!

Error

Please help me
Traceback (most recent call last):
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\src\main.py", line 20, in <module>
from utils import *
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\src\utils.py", line 45, in <module>
from rvc_pipe.rvc_infer import rvc_convert
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\venv\lib\site-packages\rvc_pipe\rvc_infer.py", line 10, in <module>
from rvc.infer.modules.vc.modules import VC
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\rvc\infer\modules\vc\modules.py", line 20, in <module>
from rvc.infer.modules.vc.utils import *
File "D:\Work\Voice_Changer_OpenSorce\ai-voice-cloning\rvc\infer\modules\vc\utils.py", line 3, in <module>
from fairseq import checkpoint_utils
ModuleNotFoundError: No module named 'fairseq'

NVIDIA GeForce RTX 3060 - UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling

C:\Users\aiwinsor\Documents\dev\ai-voice-cloning>set PYTHONUTF8=1

C:\Users\aiwinsor\Documents\dev\ai-voice-cloning>runtime\python.exe .\src\main.py
C:\Users\aiwinsor\AppData\Roaming\Python\Python39\site-packages\torch\autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn('User provided device_type of 'cuda', but CUDA is not available. Disabling')
Traceback (most recent call last):
File "C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\src\main.py", line 18, in <module>
from utils import *
File "C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\src\utils.py", line 41, in <module>
from tortoise.api_fast import TextToSpeech as Toroise_TTS_Hifi
File "C:\Users\aiwinsor\Documents\dev\ai-voice-cloning\src\tortoise\api_fast.py", line 114, in <module>
def format_conditioning(clip, cond_length=132300, device="cuda" if not torch.backends.mps.is_available() else 'mps'):
AttributeError: module 'torch.backends' has no attribute 'mps'

C:\Users\aiwinsor\Documents\dev\ai-voice-cloning>pause
Press any key to continue . . .
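The warning path shows torch being imported from C:\Users\aiwinsor\AppData\Roaming\Python\Python39\site-packages, which is Python's per-user site-packages folder, not the bundled runtime folder. An older torch there that predates torch.backends.mps would explain the AttributeError. The diagnostic sketch below reports where a module would be imported from; the helper names are hypothetical, and only standard-library calls are used.

```python
import importlib.util
import os
import site
import sys
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def from_user_site(origin: Optional[str]) -> bool:
    """True when `origin` is inside this interpreter's per-user site-packages
    (on Windows, a folder under AppData\\Roaming\\Python)."""
    if origin is None or not site.ENABLE_USER_SITE:
        return False
    user_site = os.path.normcase(os.path.abspath(site.getusersitepackages()))
    return os.path.normcase(os.path.abspath(origin)).startswith(user_site + os.sep)


if __name__ == "__main__":
    torch_origin = module_origin("torch")
    print("interpreter:", sys.executable)
    print("torch would load from:", torch_origin)
    print("inside user site-packages:", from_user_site(torch_origin))
```

Run it with runtime\python.exe. If torch turns out to come from the user folder, one option is to keep that folder off sys.path. You can do that by launching with python -s, or by adding set PYTHONNOUSERSITE=1 next to the existing set PYTHONUTF8=1.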

Installed tts 2.0 but got this error

Windows - NVIDIA 3050

"Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Workspace can't be allocated, no enough memory."

Here's the whole thing.

C:\ai-voice-cloning-v2_0\ai-voice-cloning>set PYTHONUTF8=1

C:\ai-voice-cloning-v2_0\ai-voice-cloning>runtime\python.exe .\src\main.py
2024-01-24 12:40:24 | INFO | rvc.configs.config | Found GPU NVIDIA GeForce RTX 3050 Ti Laptop GPU
Whisper detected
Traceback (most recent call last):
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 98, in <module>
from vall_e.emb.qnt import encode as valle_quantize
ModuleNotFoundError: No module named 'vall_e'

Traceback (most recent call last):
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 118, in <module>
import bark
ModuleNotFoundError: No module named 'bark'

[textbox, textbox, radio, textbox, dropdown, audio, number, slider, number, slider, slider, slider, radio, slider, slider, slider, slider, slider, slider, slider, checkboxgroup, checkbox, checkbox]
[dropdown, slider, dropdown, slider, slider, slider, slider, slider]
Running on local URL: http://127.0.0.1:7861

To create a public link, set share=True in launch().
Loading TorToiSe... (AR: ./training/CoolGuy/finetune/models/26_gpt.pth, diffusion: ./models/tortoise/diffusion_decoder.pth, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
use_deepspeed api_debug True
C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Loading tokenizer JSON: ./modules/tortoise-tts/tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: ./training/CoolGuy/finetune/models/26_gpt.pth
[2024-01-24 12:50:02,448] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.8.3+6eca037c, git-hash=6eca037c, git-branch=HEAD
[2024-01-24 12:50:02,448] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-01-24 12:50:02,448] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
[2024-01-24 12:50:02,476] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'q_int8': False, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False}
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: bigvgan_24khz_100band
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
Loaded TTS, ready for generation.
2024-01-24 13:40:01 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7861/api/predict "HTTP/1.1 200 OK"
2024-01-24 13:40:01 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7861/api/predict "HTTP/1.1 200 OK"
2024-01-24 13:40:01 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7861/reset "HTTP/1.1 200 OK"
2024-01-24 13:40:01 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7861/reset "HTTP/1.1 200 OK"
2024-01-24 13:40:23 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7861/api/predict "HTTP/1.1 200 OK"
2024-01-24 13:40:23 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7861/reset "HTTP/1.1 200 OK"
2024-01-24 13:40:51 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7861/api/predict "HTTP/1.1 200 OK"
[1/1] Generating line: This is a test to see if my voice is back.
Loading voice: CoolGuy with model c3c14d84
Loading voice: CoolGuy
2024-01-24 13:40:51 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7861/reset "HTTP/1.1 200 OK"
Reading from latent: ./voices/CoolGuy//cond_latents_c3c14d84.pth
{'temperature': 0.2, 'top_p': 0.8, 'diffusion_temperature': 1.0, 'length_penalty': 1.0, 'repetition_penalty': 2.0, 'cond_free_k': 2.0, 'num_autoregressive_samples': 2, 'sample_batch_size': 1, 'diffusion_iterations': 30, 'voice_samples': None, 'conditioning_latents': (tensor([[-1.7025, 0.4967, 0.6810, ..., -3.8491, -1.0170, 0.4766]]), tensor([[-0.9459, -1.1040, -0.7465, ..., -0.0278, -0.0548, 0.2184]])), 'use_deterministic_seed': None, 'return_deterministic_state': True, 'k': 1, 'diffusion_sampler': 'DDIM', 'breathing_room': 8, 'half_p': False, 'cond_free': True, 'cvvp_amount': 0, 'autoregressive_model': './training/CoolGuy/finetune/models/26_gpt.pth', 'diffusion_model': './models/tortoise/diffusion_decoder.pth', 'tokenizer_json': './modules/tortoise-tts/tortoise/data/tokenizer.json'}
Requested: 104890368
Free: 0
Total: 4294443008
Traceback (most recent call last):
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 1235, in generate_tortoise
gen, additionals = tts.tts(cut_text, **settings )
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\src\tortoise\api.py", line 746, in tts
codes = self.autoregressive.inference_speech(auto_conditioning, text_tokens,
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\src\tortoise\models\autoregressive.py", line 560, in inference_speech
gen = self.inference_model.generate(inputs, bos_token_id=self.start_mel_token, pad_token_id=self.stop_mel_token, eos_token_id=self.stop_mel_token,
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\transformers\generation_utils.py", line 1310, in generate
return self.sample(
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\transformers\generation_utils.py", line 1926, in sample
outputs = self(
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\src\tortoise\models\autoregressive.py", line 150, in forward
transformer_outputs = self.transformer(
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 889, in forward
outputs = block(
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\deepspeed\model_implementations\transformers\ds_transformer.py", line 114, in forward
self.allocate_workspace(self.config.hidden_size, self.config.heads,
RuntimeError: Workspace can't be allocated, no enough memory.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\routes.py", line 394, in run_predict
output = await app.get_blocks().process_api(
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\blocks.py", line 1075, in process_api
result = await self.call_function(
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\blocks.py", line 884, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
response = fn(*args)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\src\webui.py", line 129, in generate_proxy
raise e
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\src\webui.py", line 123, in generate_proxy
sample, outputs, stats = generate(**kwargs)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 364, in generate
return generate_tortoise(**kwargs)
File "C:\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 1238, in generate_tortoise
raise RuntimeError(f'Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: {e}')
RuntimeError: Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Workspace can't be allocated, no enough memory.
2024-01-24 13:40:51 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7861/api/predict "HTTP/1.1 500 Internal Server Error"
2024-01-24 13:40:51 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7861/reset "HTTP/1.1 200 OK"
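The raw numbers printed just before the traceback are easier to read in binary units. Requested: 104890368 bytes is about 100 MiB, and Free: 0 out of Total: 4294443008 bytes (about 4 GiB) means nothing was left for the allocation that failed, which was DeepSpeed's inference workspace according to the traceback. The sketch below formats those values. It also optionally queries free VRAM via PyTorch's torch.cuda.mem_get_info, assuming PyTorch with CUDA is installed. The helper names are hypothetical.

```python
from typing import Optional, Tuple


def human_bytes(num_bytes: float) -> str:
    """Format a byte count using binary units (KiB, MiB, GiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    unit = units[0]
    for unit in units:
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    return f"{value:.1f} {unit}"


def report(requested: int, free: int, total: int) -> str:
    """Summarize whether an allocation request fits in the free memory."""
    verdict = "fits" if requested <= free else "does NOT fit"
    return (f"requested {human_bytes(requested)}, free {human_bytes(free)} "
            f"of {human_bytes(total)}: {verdict}")


def cuda_memory() -> Optional[Tuple[int, int]]:
    """Return (free, total) bytes for the current CUDA device, or None when
    PyTorch or a CUDA device is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.mem_get_info()
```

With the values from this log, report(104890368, 0, 4294443008) returns "requested 100.0 MiB, free 0.0 B of 4.0 GiB: does NOT fit".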

Start Errors and Issue

Hi, this is my third time downloading the file from scratch, and each time it returns errors. HELP!
I have a 12th-gen i5, an Nvidia 3050 Ti, 16 GB of RAM and a 1 TB NVMe drive.

D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning>runtime\python.exe .\src\main.py
2024-01-25 02:28:39 | INFO | rvc.configs.config | No supported Nvidia GPU found
2024-01-25 02:28:39 | INFO | rvc.configs.config | Use cpu instead
Whisper detected
Traceback (most recent call last):
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 98, in <module>
from vall_e.emb.qnt import encode as valle_quantize
ModuleNotFoundError: No module named 'vall_e'

Traceback (most recent call last):
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 118, in <module>
import bark
ModuleNotFoundError: No module named 'bark'

!WARNING! Automatically deduced sample batch size returned 1.
!WARNING! Automatically deduced sample batch size returned 1.
[textbox, textbox, radio, textbox, dropdown, audio, number, slider, number, slider, slider, slider, radio, slider, slider, slider, slider, slider, slider, slider, checkboxgroup, checkbox, checkbox]
[dropdown, slider, dropdown, slider, slider, slider, slider, slider]
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
!!!! WARNING !!!! No GPU available in PyTorch. You may need to reinstall PyTorch.
Loading TorToiSe... (AR: D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\models\tortoise\autoregressive.pth, diffusion: ./models/tortoise/diffusion_decoder.pth, vocoder: bigvgan_24khz_100band)
No hardware acceleration is available, falling back to CPU...
use_deepspeed api_debug False
Traceback (most recent call last):
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\urllib3\connectionpool.py", line 700, in urlopen
self._prepare_proxy(conn)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\urllib3\connectionpool.py", line 996, in _prepare_proxy
conn.connect()
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\urllib3\connection.py", line 364, in connect
self.sock = conn = self._connect_tls_proxy(hostname, conn)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\urllib3\connection.py", line 499, in _connect_tls_proxy
socket = ssl_wrap_socket(
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\urllib3\util\ssl_.py", line 453, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\urllib3\util\ssl_.py", line 495, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock)
File "ssl.py", line 500, in wrap_socket
File "ssl.py", line 1040, in _create
File "ssl.py", line 1309, in do_handshake
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1129)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\requests\adapters.py", line 489, in send
resp = conn.urlopen(
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\urllib3\util\retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli/resolve/main/config.json (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\transformers\configuration_utils.py", line 601, in _get_config_dict
resolved_config_file = cached_path(
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\transformers\utils\hub.py", line 282, in cached_path
output_path = get_from_cache(
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\transformers\utils\hub.py", line 485, in get_from_cache
r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\requests\api.py", line 100, in head
return request("head", url, **kwargs)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\requests\api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\requests\sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\requests\sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\requests\adapters.py", line 563, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli/resolve/main/config.json (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\src\main.py", line 36, in <module>
tts = load_tts()
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 3739, in load_tts
tts = TorToise_TTS(minor_optimizations=not args.low_vram,
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\src\tortoise\api.py", line 298, in __init__
self.aligner = Wav2VecAlignment(device='cpu' if get_device_name() == "dml" else self.device)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\src\tortoise\utils\wav2vec_alignment.py", line 58, in __init__
self.model = Wav2Vec2ForCTC.from_pretrained("jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli").cpu()
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\transformers\modeling_utils.py", line 1764, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\transformers\configuration_utils.py", line 526, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\transformers\configuration_utils.py", line 553, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\transformers\configuration_utils.py", line 641, in _get_config_dict
raise EnvironmentError(
OSError: Can't load config for 'jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli' is the correct path to a directory containing a config.json file

D:\PROGRAMS\Ai voice cloning\ai-voice-cloning-v2_0\ai-voice-cloning>pause
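The first traceback passes through urllib3's _prepare_proxy and _connect_tls_proxy. This means the HTTPS request to huggingface.co was routed through a proxy, and the TLS handshake with it was cut off (SSLEOFError). A reasonable first check is which proxy Python actually picks up. requests consults the same standard-library lookup when no proxies are passed explicitly. The sketch below relies only on urllib.request.getproxies; the helper name is hypothetical.

```python
import urllib.request
from typing import Dict


def active_proxies() -> Dict[str, str]:
    """Return the proxy settings the standard library detects: environment
    variables such as HTTPS_PROXY, and on Windows the system proxy from the
    registry."""
    return urllib.request.getproxies()


if __name__ == "__main__":
    proxies = active_proxies()
    if proxies:
        for scheme, url in sorted(proxies.items()):
            print(f"{scheme}: {url}")
    else:
        print("No proxy detected by urllib.")
```

If an unexpected proxy shows up (for example, left over from a VPN or a system proxy setting), that is a likely place to look before reinstalling anything.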

[Errno 2] No such file or directory: after updating to Windows 11

Hi, I hope you can help me correct this issue. I am getting an Errno 2 error after updating to Windows 11; everything was working prior to the update. Here is a video link showing the error:
https://www.youtube.com/watch?v=xGAEB60rrf4
Thanks

ModuleNotFoundError: No module named 'rvc_pipe'

I downloaded the v2 zip version, created a venv and ran pip install -r requirements.txt.

But when I try to run the program, I get:
Traceback (most recent call last):
File "/home/anzestrela/tts/ai-voice-cloning/./src/main.py", line 20, in <module>
from utils import *
File "/home/anzestrela/tts/ai-voice-cloning/src/utils.py", line 45, in <module>
from rvc_pipe.rvc_infer import rvc_convert
ModuleNotFoundError: No module named 'rvc_pipe'

OS: Ubuntu Server
BTW: it is a PVE VM with GPU passthrough, but llama.cpp works, so I don't think that is the problem.
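A ModuleNotFoundError means the interpreter running src/main.py cannot find the package on its path. One common cause is installing into a different environment than the one that launches the program. The sketch below checks, using the same interpreter, which of the modules named in the tracebacks above are missing. The helper name is hypothetical, and the module list is only an example.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names this interpreter cannot find.
    Run it with the same python/venv that launches src/main.py."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    # Module names taken from the tracebacks above (rvc_pipe, fairseq).
    for name in missing_modules(["rvc_pipe", "fairseq"]):
        print("missing:", name)
```

If something is reported missing, installing with python -m pip from inside the activated venv ensures pip targets that same interpreter.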

Unable to train TTS models

I'm trying to train a voice, but it isn't working. I only have 6 GB of video memory, so I have set batch size and gradient accumulation as low as it'll let me, but it's still not going.

Voice cloning paused due to a requested directory that doesn't exist.

Hello

I'm not really good at code, but I've spent hours trying to understand why I can't voice clone when using RVC.
I made sure that compute and CUDA were compatible.

I believe it has to do with a directory it's requesting.

2024-02-12 03:40:24 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-02-12T03:40:37.154493] 24-02-12 03:40:37.154 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
2024-02-12 03:40:37 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[Training] [2024-02-12T03:40:38.940063] 24-02-12 03:40:38.930 - INFO: Start training from epoch: 0, iter: 0
[Training] [2024-02-12T03:40:41.625444] [2024-02-12 03:40:41,625] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2024-02-12T03:40:44.176698] [2024-02-12 03:40:44,176] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2024-02-12T03:40:46.535774] D:\AI_2023\ai-voice-cloning\runtime\lib\site-packages\torch\optim\lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2024-02-12T03:40:46.535774] warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
[Training] [2024-02-12T03:40:47.640751] D:\AI_2023\ai-voice-cloning\runtime\lib\site-packages\torch\utils\checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
[Training] [2024-02-12T03:40:47.640751] warnings.warn(

-----------------------------This part here-----------------------------------------------------------------------------------

[Training] [2024-02-12T03:40:54.738021] Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu

I'm not sure I understand what to do here.

This causes the cloning process to pause.

[Training] [2024-02-12T03:40:55.127367]
[Training] [2024-02-12T03:40:55.127367] D:\AI_2023\ai-voice-cloning>pause

Thanks for the great work.
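"no kernel image is available for execution on the device" is a CUDA error that generally means the compiled binary contains no code the GPU can run. Under NVIDIA's compatibility rules, a cubin built for `sm_XY` runs only on GPUs with the same major version X and a minor version of at least Y, while PTX built for `compute_XY` can be JIT-compiled for any GPU of capability X.Y or newer. The sketch below encodes those two rules. `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` are real PyTorch calls, but they describe the PyTorch build, not the separately compiled bitsandbytes library named in the log, so this is a way to reason about the error rather than a definitive check.

```python
import importlib.util


def can_run(capability, arch_list):
    """Return True if a binary built for `arch_list` can run on a GPU of
    compute capability `capability`, given as a (major, minor) tuple."""
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if len(digits) < 2 or not digits.isdigit():
            continue  # skip entries such as "sm_90a" that this sketch does not handle
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # cubin: same major version, equal or lower minor
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX: can be JIT-compiled for this or a newer GPU
    return False


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed in this interpreter")
    else:
        import torch

        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("GPU compute capability:", cap)
            print("PyTorch built for:", archs)
            print("PyTorch kernels usable:", can_run(cap, archs))
        else:
            print("CUDA is not available to PyTorch")
```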

Bug: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}

Hi,
Sadly, I've ended up in a situation where I can't train anymore. Yesterday everything worked as in the tutorial, but today I get the following errors (I re-downloaded everything to check whether that would fix the issue). Nothing has changed since yesterday in terms of installed packages; the only change was a git pull of the AllTalk ooba extension, which did not touch requirements.txt.

[Training] [2023-12-27T01:42:30.715504] 23-12-27 01:42:30.272 - INFO: Random seed: 1357
[Training] [2023-12-27T01:42:31.223594] 23-12-27 01:42:31.223 - INFO: Number of training data elements: 35, iters: 1
[Training] [2023-12-27T01:42:31.227594] 23-12-27 01:42:31.223 - INFO: Total epochs needed: 200 for iters 200
[Training] [2023-12-27T01:42:33.702266] E:\Magazyn\Grafika\AI\Voice2Voice\ai-voice-cloning\runtime\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-12-27T01:42:33.706267]   warnings.warn(
[Training] [2023-12-27T01:42:40.969203] 23-12-27 01:42:40.969 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-12-27T01:42:41.496972] 23-12-27 01:42:41.495 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-12-27T01:42:42.987090] [2023-12-27 01:42:42,987] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-12-27T01:42:43.018090] [2023-12-27 01:42:43,018] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-12-27T01:42:45.144538] E:\Magazyn\Grafika\AI\Voice2Voice\ai-voice-cloning\runtime\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
[Training] [2023-12-27T01:42:45.144538]   warn(
[Training] [2023-12-27T01:42:45.145538] E:\Magazyn\Grafika\AI\Voice2Voice\ai-voice-cloning\runtime\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
[Training] [2023-12-27T01:42:45.145538]   warn(
[Training] [2023-12-27T01:42:47.291812] 23-12-27 01:42:47.290 - INFO: Saving models and training states.
[Training] [2023-12-27T01:42:47.291812] 23-12-27 01:42:47.290 - INFO: Finished training!
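The figures in this log can be reproduced with the epoch arithmetic common to BasicSR-style training scripts: iterations per epoch are `ceil(dataset_size / batch_size)`, and epochs are `ceil(total_iters / iters_per_epoch)`. Assuming DLAS computes them the same way, 35 samples with any batch size of 35 or more gives 1 iteration per epoch, hence 200 epochs for 200 iterations. The sketch only reproduces the logged numbers; it does not explain why training then finished immediately.

```python
import math


def iters_per_epoch(num_samples: int, batch_size: int) -> int:
    """Batches needed to see every sample once (a last partial batch still counts)."""
    return math.ceil(num_samples / batch_size)


def epochs_needed(total_iters: int, iters_each_epoch: int) -> int:
    """Epochs required to reach `total_iters` optimizer steps."""
    return math.ceil(total_iters / iters_each_epoch)


if __name__ == "__main__":
    # Figures from the log above: 35 samples, 200 total iterations.
    # The batch size is not in the log; 35 is one value that yields 1 iteration per epoch.
    per_epoch = iters_per_epoch(35, 35)
    print(f"iters per epoch: {per_epoch}")
    print(f"epochs needed: {epochs_needed(200, per_epoch)}")
```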

Error when generating configuration after preparing dataset

I'm encountering an "Empty dataset" exception in the voice cloning application after following the steps outlined in the installation video. The problem comes up when I switch to the "Generate Configuration" section and click "Validate Training Configuration".

Traceback (most recent call last):
  File "E:\AI_Apps\ai-voice-cloning\runtime\lib\site-packages\gradio\routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "E:\AI_Apps\ai-voice-cloning\runtime\lib\site-packages\gradio\blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "E:\AI_Apps\ai-voice-cloning\runtime\lib\site-packages\gradio\blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\Ky\AppData\Roaming\Python\Python39\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\Ky\AppData\Roaming\Python\Python39\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\Users\Ky\AppData\Roaming\Python\Python39\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "E:\AI_Apps\ai-voice-cloning\src\webui.py", line 272, in optimize_training_settings_proxy
    settings, messages = optimize_training_settings(**kwargs)
  File "E:\AI_Apps\ai-voice-cloning\src\utils.py", line 2809, in optimize_training_settings
    raise Exception("Empty dataset.")
Exception: Empty dataset.

Environment:
Operating System: Windows
Hardware: NVIDIA GeForce RTX 4090, Intel Core i7-12700KF, 64 GB DDR4 RAM

Let me know if any other details on this bug would be helpful! Thanks for looking into resolving this.
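Since the exception is raised inside `optimize_training_settings` in `src/utils.py`, a reasonable first check is whether the dataset file produced by the transcription step contains any entries. The sketch below assumes the dataset is a text file with one `audio_path|transcription` line per clip; the default path `./training/myvoice/train.txt` is a hypothetical placeholder, not something the traceback confirms.

```python
import sys
from pathlib import Path


def count_dataset_entries(path) -> int:
    """Count non-blank lines in an `audio|text` style dataset file (assumed format)."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"dataset file not found: {p}")
    with p.open("r", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


if __name__ == "__main__":
    # Hypothetical default location; pass your own path as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "./training/myvoice/train.txt"
    try:
        n = count_dataset_entries(target)
        print(f"{target}: {n} entries" + (" (empty!)" if n == 0 else ""))
    except FileNotFoundError as exc:
        print(exc)
```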

Suddenly I can only create 'Ultra Fast'-level output, even when set to 'High Quality'

I downloaded this yesterday and followed the video tutorial. I trained a model on janky audio I had clipped quickly, just to practice. On the High Quality setting, I was very impressed: there were artifacts, but it sounded very accurate, although it took an hour to generate one sentence. So I figured I'd train a new model with better audio to try for perfect results.

After doing so, TTS generation with any model, voice, or quality preset takes only a few seconds and sounds awful. I also noticed it's using much less VRAM. I made sure to restart TTS with the correct model selected in the settings. What could be the issue here?

Windows
RTX 4080
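For reference, upstream tortoise-tts defines its quality presets in `api.py` with the values below; this fork's UI may map presets differently, so treat them as upstream defaults. Generation that finishes in seconds with low VRAM usage is consistent with the small `ultra_fast` values being in effect. The `matching_preset` helper is hypothetical and simply names the preset a given pair of settings corresponds to.

```python
# Upstream tortoise-tts preset values.
PRESETS = {
    "ultra_fast": {"num_autoregressive_samples": 16, "diffusion_iterations": 30, "cond_free": False},
    "fast": {"num_autoregressive_samples": 96, "diffusion_iterations": 80},
    "standard": {"num_autoregressive_samples": 256, "diffusion_iterations": 200},
    "high_quality": {"num_autoregressive_samples": 256, "diffusion_iterations": 400},
}


def matching_preset(samples, iterations):
    """Return the preset whose sample and iteration counts match exactly, or None."""
    for name, cfg in PRESETS.items():
        if (cfg["num_autoregressive_samples"], cfg["diffusion_iterations"]) == (samples, iterations):
            return name
    return None


if __name__ == "__main__":
    # Compare these against the sample/iteration values the UI actually shows.
    print(matching_preset(16, 30))
    print(matching_preset(256, 400))
```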

API integration

I'd love to be able to connect the new features to other apps through a simple API, and I imagine that's the case for others too. I'd be happy to help write it if you need an extra pair of hands. Super work on the project!
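The training logs elsewhere on this page show the Gradio web UI answering `POST http://127.0.0.1:7860/api/predict`, so an HTTP endpoint already exists. Gradio 3.x's `/api/predict` route accepts a JSON body with a `data` list of inputs and an `fn_index` that selects the event handler. Which `fn_index` corresponds to which action in this UI, and what inputs it expects, are not documented here, so the values below are placeholders; the sketch only shows the request shape using the standard library.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7860"  # default Gradio address seen in the logs


def build_predict_request(fn_index, inputs, base_url=BASE_URL):
    """Build a POST request for Gradio 3.x's /api/predict route.

    `fn_index` and `inputs` are placeholders: look up the real values for
    the UI action you want before using this.
    """
    body = json.dumps({"fn_index": fn_index, "data": list(inputs)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict(fn_index, inputs, base_url=BASE_URL, timeout=600):
    """Send the request and return the decoded `data` field of the response."""
    req = build_predict_request(fn_index, inputs, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8")).get("data")


if __name__ == "__main__":
    # Hypothetical call: fn_index 0 with one text input. Requires the web UI to be running.
    print(call_predict(0, ["Hello there."]))
```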

Questions...

Dear JM,

I have several questions about the files created during "transcribe and process". Is this forum the best place to post them, or should they go elsewhere? Thanks

start.bat jumps to pause and doesn't go any further

Hey man, thank you for putting this all together. I've played around with TortoiseTTS in the past and wanted to get back into it. I found this approach interesting and wanted to give it a try. For some reason, when I launch start.bat, it partially loads but jumps to the pause statement after printing the UI link.

I know a bit of Python, so I played around and eventually traced the problem to the load_tts function, which took me to the utils module. That file has a few warnings, but I'd guess the problem comes from the oddly indented lines 146 through 157. Any idea whether that's where the problem comes from, or whether I have something set up wrong?

Also, it's kind of weird that it exits without an error and goes straight to the pause statement.
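When a Python process dies without printing a traceback, it is often a crash in native code, which Python's built-in `faulthandler` can at least report. The hypothetical launcher below first parses each file with the built-in `compile()`: if lines 146 to 157 were mis-indented in a way Python rejects, importing the module would raise an `IndentationError` before the UI link was ever printed, so a clean parse suggests the indentation is at least syntactically valid. It then runs the script with `-X faulthandler` and prints the exit code. The `src/main.py` and `src/utils.py` paths come from tracebacks on this page; any arguments that start.bat passes are not reproduced.

```python
import subprocess
import sys


def syntax_error(path):
    """Return None if the file parses, else a short description of the error."""
    with open(path, "r", encoding="utf-8") as f:
        source = f.read()
    try:
        compile(source, path, "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{type(exc).__name__} at line {exc.lineno}: {exc.msg}"
    return None


def run_with_faulthandler(script, *args):
    """Run `script` in a child interpreter with faulthandler enabled and
    return its exit code (a native crash prints a low-level traceback)."""
    cmd = [sys.executable, "-X", "faulthandler", script, *args]
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    for source in ("src/utils.py", "src/main.py"):
        err = syntax_error(source)
        print(f"{source}: {'parses OK' if err is None else err}")
    print("exit code:", run_with_faulthandler("src/main.py"))
```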


