
shigabeev / q-vits2-voice-cloning


This project forked from fenrlr/mb-istft-vits2


WIP: VITS 2 with quantized output of text-encoder and voice cloning

License: MIT License

Python 96.09% Jupyter Notebook 3.53% Cython 0.37%

q-vits2-voice-cloning's Introduction

The ultimate VITS2


The goal of this repo is to implement the most comprehensive VITS2 out there.

Changelist

  • Bump librosa and Python to the latest versions
  • Implement d-vector instead of speaker id for external speaker encoder, as in YourTTS.
  • Implement a YourTTS-style d-vector-free text encoder, with the d-vector as an input to the vocoder (currently only HiFiGAN does that)
  • Implement a dataloader that loads d-vectors
  • Add quantized Text Encoder. BERT -> bottleneck -> text features.
  • VCTK audio loader
  • Implement a better vocoder with support for d-vector
  • Remove boilerplate code in attentions.py and replace it with the native torch.nn.TransformerEncoder
  • Adan optimizer
  • PyTorch Lightning support
  • Add Bidirectional Flow Loss

Prerequisites

  1. Python >= 3.8

  2. CUDA

  3. PyTorch version 1.13.1 (+cu117)

  4. Clone this repository

  5. Install python requirements.

    pip install -r requirements.txt
    

    If you want to use the cleaned texts already provided in filelists, you need to install espeak.

    apt-get install espeak
    
  6. Prepare datasets & configuration

    1. Prepare wav files (22050 Hz, mono, PCM-16).

    2. Prepare text files: one for training and one for validation. Split your dataset between the two files. The validation file should contain fewer entries than the training file, and none of its entries should also appear in the training file.

      • Single speaker
      wavfile_path|transcript
      
      • Multi speaker
      wavfile_path|speaker_id|transcript
      
    3. Run preprocessing with a cleaner of your interest. You may change the symbols as well.

      • Single speaker
      python preprocess.py --text_index 1 --filelists PATH_TO_train.txt --text_cleaners CLEANER_NAME
      python preprocess.py --text_index 1 --filelists PATH_TO_val.txt --text_cleaners CLEANER_NAME
      
      • Multi speaker
      python preprocess.py --text_index 2 --filelists PATH_TO_train.txt --text_cleaners CLEANER_NAME
      python preprocess.py --text_index 2 --filelists PATH_TO_val.txt --text_cleaners CLEANER_NAME
      

      The repository includes examples of the resulting cleaned text for both the single-speaker and the multi-speaker case.

  7. Build Monotonic Alignment Search.

    # Cython-version Monotonic Alignment Search
    cd monotonic_align
    mkdir monotonic_align
    python setup.py build_ext --inplace

  8. Edit configurations based on files and cleaners you used.
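The dataset requirements from step 6 can be checked before preprocessing. The following is a minimal sketch, not part of this repository: the helper names `wav_problems`, `parse_line` and `split_filelist` are hypothetical, and the 5% validation fraction is only an assumed default. It uses the standard-library `wave` module, which reads PCM wav files only.

```python
import random
import wave


def wav_problems(path, rate=22050):
    """Return a list of deviations from 22050 Hz, mono, 16-bit PCM."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()}, expected {rate}")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16 bit
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems


def parse_line(line, multi_speaker=False):
    """Split one filelist line into its '|'-separated fields.

    Single speaker: wavfile_path|transcript
    Multi speaker:  wavfile_path|speaker_id|transcript
    """
    expected = 3 if multi_speaker else 2
    # maxsplit keeps any further '|' characters inside the transcript
    parts = line.rstrip("\n").split("|", expected - 1)
    if len(parts) != expected or not all(p.strip() for p in parts):
        raise ValueError(f"malformed filelist line: {line!r}")
    return parts


def split_filelist(lines, val_fraction=0.05, seed=0):
    """Deduplicate, shuffle and split lines into disjoint (train, val) lists."""
    unique = list(dict.fromkeys(l.strip() for l in lines if l.strip()))
    random.Random(seed).shuffle(unique)
    n_val = max(1, int(len(unique) * val_fraction))
    return unique[n_val:], unique[:n_val]
```

With this layout the transcript is field 1 for single-speaker lines and field 2 for multi-speaker lines, which matches the `--text_index` values used in the preprocessing commands of step 6.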

Setting json file in configs

Model                 How to set up json file in configs    Sample of json file configuration
iSTFT-VITS2           "istft_vits": true,                   istft_vits2_base.json
                      "upsample_rates": [8,8],
MB-iSTFT-VITS2        "subbands": 4,                        mb_istft_vits2_base.json
                      "mb_istft_vits": true,
                      "upsample_rates": [4,4],
MS-iSTFT-VITS2        "subbands": 4,                        ms_istft_vits2_base.json
                      "ms_istft_vits": true,
                      "upsample_rates": [4,4],
Mini-iSTFT-VITS2      "istft_vits": true,                   mini_istft_vits2_base.json
                      "upsample_rates": [8,8],
                      "hidden_channels": 96,
                      "n_layers": 3,
Mini-MB-iSTFT-VITS2   "subbands": 4,                        mini_mb_istft_vits2_base.json
                      "mb_istft_vits": true,
                      "upsample_rates": [4,4],
                      "hidden_channels": 96,
                      "n_layers": 3,
                      "upsample_initial_channel": 256,
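The settings in the table above can be compared against a config before training. This is a minimal sketch, not code from this repository: `VARIANT_SETTINGS` copies the values from the table, while `find_mismatches` is a hypothetical helper that takes the settings as a plain dict. Which section of the json file holds these keys is not stated here, so the caller has to pass the right section.

```python
# Values copied from the table above; the helper itself is hypothetical.
VARIANT_SETTINGS = {
    "iSTFT-VITS2": {"istft_vits": True, "upsample_rates": [8, 8]},
    "MB-iSTFT-VITS2": {
        "subbands": 4, "mb_istft_vits": True, "upsample_rates": [4, 4],
    },
    "MS-iSTFT-VITS2": {
        "subbands": 4, "ms_istft_vits": True, "upsample_rates": [4, 4],
    },
    "Mini-iSTFT-VITS2": {
        "istft_vits": True, "upsample_rates": [8, 8],
        "hidden_channels": 96, "n_layers": 3,
    },
    "Mini-MB-iSTFT-VITS2": {
        "subbands": 4, "mb_istft_vits": True, "upsample_rates": [4, 4],
        "hidden_channels": 96, "n_layers": 3, "upsample_initial_channel": 256,
    },
}


def find_mismatches(settings, variant):
    """Map each differing key to an (expected, found) pair.

    A key that is missing from `settings` is reported with found=None.
    """
    expected = VARIANT_SETTINGS[variant]
    return {
        key: (value, settings.get(key))
        for key, value in expected.items()
        if settings.get(key) != value
    }
```

A typical use would be to read the file with `json.load` and pass the section that contains these keys, for example `find_mismatches(cfg["model"], "MB-iSTFT-VITS2")`, where the `"model"` section name is an assumption to check against the sample configs.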

Training Example

# train_ms.py for multi speaker
# train_l.py to use Lightning
python train_ms.py -c configs/shergin_d_vector_hfg.json -m models/test

Contact

If you have any questions about how to run it, contact us on Telegram:

https://t.me/voice_stuff_chat

Credits

Contributors

fenrlr, juliakorovsky, nshmyrev, shigabeev
