vits_korean_multispeaker's Introduction

Korean multi-speaker VITS

This project is built on the official PyTorch implementation of VITS by "jaywalnut310". The model was trained for 10 epochs (batch size 32, 460k steps), and inference results for two randomly selected male and female speakers are available in inference_samples.

Getting Started

Setting up the development environment for this project can be challenging due to version conflicts with various libraries.

Therefore, we managed the development environment of this project using a Docker container.

The Docker image, which can be used to create the Docker container, can be downloaded from Docker Hub.

Dataset

The data used for model training can be downloaded from the following link.

Data Preprocessing

After downloading the dataset, you need to preprocess it so that it can be used for training.

  1. If file paths contain Korean or special characters, change them to English or numbers.
  2. Convert .wav files for training to a 22kHz sampling rate.
  3. If there are stereo files, convert them to mono.
  4. The filelists downloaded from Google Drive map each wav file to its label. (The .cleaned files contain the sentences after conversion with g2pk.)
     4-1. If you use a different phoneme conversion module, convert the few English words in the text to their Korean pronunciation and remove the '\xa0' special character.
  5. Place make_mels.py in the top-level directory of the dataset and run it to generate mel spectrograms. (This pre-creates the files required for training.)
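
The checks in steps 1 to 4-1 can be sketched with the Python standard library alone. This is a minimal sketch, not the project's own tooling: it only reports which files still need work and leaves the actual resampling or downmixing to an audio tool of your choice. It assumes that "22kHz" means 22,050 Hz (the value used in the upstream VITS configs) and that a safe path consists only of ASCII letters, digits, and the characters / . _ -. The helper names has_unsafe_path, needs_conversion, and clean_text are hypothetical.

```python
import re
import wave

TARGET_RATE = 22050  # assumption: "22kHz" means 22,050 Hz, as in the upstream VITS configs
SAFE_PATH = re.compile(r"[A-Za-z0-9/._-]+")  # assumption: what counts as a "safe" path


def has_unsafe_path(path):
    """Return True if the path contains Korean, spaces, or other special characters."""
    return SAFE_PATH.fullmatch(str(path)) is None


def needs_conversion(wav_path, target_rate=TARGET_RATE):
    """List the reasons a PCM .wav file still needs resampling or downmixing."""
    reasons = []
    with wave.open(str(wav_path), "rb") as wav:
        if wav.getframerate() != target_rate:
            reasons.append(f"sample rate {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            reasons.append(f"{wav.getnchannels()} channels")
    return reasons


def clean_text(text):
    """Replace non-breaking spaces (U+00A0) with plain spaces and collapse whitespace."""
    return " ".join(text.replace("\xa0", " ").split())
```

Running has_unsafe_path and needs_conversion over every .wav file in the dataset gives a list of files to rename or convert before make_mels.py is run. Note that clean_text replaces '\xa0' with a regular space instead of deleting it, so that adjacent words are not merged.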

Installing

You can clone this GitHub repository and use it.

git clone https://github.com/0913ktg/vits_korean_multispeaker

You can download the model checkpoints and filelists from the Google Drive link.

Train

Once you have the 22kHz audio files and the training and validation filelists, and have completed data preprocessing, you can start training by running train_ms.py. Training on multiple GPUs has been confirmed to work.
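
Before launching a long training run, it can help to check that the filelists are consistent with the audio on disk. The sketch below assumes each filelist line has the form path|sid|text, as in the upstream VITS multi-speaker filelists; this project's filelists may differ. validate_filelist is a hypothetical helper, and the speaker count of 185 is taken from the sid range 0 to 184 given in the Synthesis section.

```python
from pathlib import Path

N_SPEAKERS = 185  # the Synthesis section lists speaker ids 0 to 184


def validate_filelist(filelist_path, n_speakers=N_SPEAKERS, check_audio=True):
    """Return (line_number, problem) tuples for a filelist in path|sid|text form."""
    problems = []
    with open(filelist_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            parts = line.split("|")
            if len(parts) != 3:
                problems.append((lineno, f"expected 3 fields, got {len(parts)}"))
                continue
            wav_path, sid, text = parts
            try:
                sid_ok = 0 <= int(sid) < n_speakers
            except ValueError:
                sid_ok = False
            if not sid_ok:
                problems.append((lineno, f"speaker id out of range: {sid!r}"))
            if not text.strip():
                problems.append((lineno, "empty text"))
            if check_audio and not Path(wav_path).is_file():
                problems.append((lineno, f"missing audio file: {wav_path}"))
    return problems
```

An empty return value for both the training and validation filelists means every line parsed, every speaker id is in range, and every referenced audio file exists.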

Synthesis

  1. In inference.py, modify the path for the Generator checkpoint accordingly.
  2. Enter the desired Korean sentences in texts. (Separate multiple sentences with a comma.)
  3. Enter the speaker number in sid. (from 0 to 184)
  4. Running inference.py will create a file named test{i}.wav.

vits_korean_multispeaker's Issues

Bert-VITS2 Korean version

Hello DaegyeomKim,

I am also trying to train a Korean version of Bert-VITS2. Have you trained the Korean models successfully?

Thank you for sharing your excellent code. Have a great day.

Best regards.

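Question about converting to a 22kHz sampling rate

Hello, my name is Mingyu Kim, and I am doing research related to speech synthesis.

The data preprocessing instructions say to convert the .wav files to a 22kHz sampling rate.
Could you tell me how you converted the sampling rate of the AI Hub data?

Thank you.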

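Question about speaker information for each sid

Hello, my name is Mingyu Kim, and I am doing research in artificial intelligence.

The pre-trained model accepts 185 different sid values, and looking at the training data,
it seems to include various speaking styles, such as storytelling and broadcast commentary, as well as emotions such as joy and sadness.

Is there any information that identifies which style or emotion each index corresponds to?
If so, I would appreciate it if you could share it.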
