CoMoSVC: One-Step Consistency Model Based Singing Voice Conversion & Singing Voice Clone

License: MIT


CoMoSVC: Consistency Model Based Singing Voice Conversion

Chinese documentation (中文文档)

CoMoSVC is a consistency-model-based singing voice conversion system, inspired by CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model.

This is an implementation of the paper CoMoSVC.

Improvements

The subjective evaluation results are shown in the table below.

Environment

The code has been tested on Python 3.8, so you can set up your Conda environment with the following command:

conda create -n Your_Conda_Environment_Name python=3.8

After activating the Conda environment, install the required packages with:

pip install -r requirements.txt

Download the Checkpoints

1. m4singer_hifigan

You should first download m4singer_hifigan and then unzip the archive:

unzip m4singer_hifigan.zip

The vocoder checkpoints will then be in the m4singer_hifigan directory.

2. ContentVec

You should download the ContentVec checkpoint and then put it in the Content directory; it is used to extract the content features.

3. m4singer_pe

You should download the m4singer_pe pitch-extractor checkpoint and then unzip the archive:

unzip m4singer_pe.zip

Dataset Preparation

You should first create the folders by

mkdir dataset_raw
mkdir dataset

You can refer to different preparation methods based on your needs.

Preparation With Slicing can help you remove the silent parts and slice the audio for stable training.

0. Preparation With Slicing

Please place your original dataset in the dataset_slice directory.

The original audio files can be in any format, which you specify on the command line. You can also set the length of the slices; the unit of slice_size is milliseconds. The defaults for wavformat and slice_size are mp3 and 10000, respectively.

python preparation_slice.py -w your_wavformat -s slice_size
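For reference, slicing itself just cuts the audio into fixed-length chunks (the repo script also removes silent parts, which is omitted here). A minimal illustrative sketch in pure Python, not the repo's implementation:

```python
def slice_samples(samples, sample_rate, slice_size_ms=10000):
    """Cut a 1-D sample sequence into consecutive chunks of slice_size_ms."""
    chunk_len = sample_rate * slice_size_ms // 1000  # samples per slice
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]

# 30 s of silence at 24 kHz -> three 10 s slices of 240000 samples each
audio = [0.0] * (24000 * 30)
chunks = slice_samples(audio, 24000, 10000)
print(len(chunks), len(chunks[0]))  # 3 240000
```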

1. Preparation Without Slicing

You can just place the dataset in the dataset_raw directory with the following file structure:

dataset_raw
├───speaker0
│   ├───xxx1-xxx1.wav
│   ├───...
│   └───Lxx-0xx8.wav
└───speaker1
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
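To sanity-check this layout, you can walk dataset_raw and collect (speaker, wav) pairs. A small standard-library sketch (the helper name is ours, not part of the repo):

```python
from pathlib import Path

def list_dataset(root="dataset_raw"):
    """Return (speaker, wav_path) pairs for every .wav under each speaker dir."""
    pairs = []
    for speaker_dir in sorted(Path(root).iterdir()):
        if speaker_dir.is_dir():
            for wav in sorted(speaker_dir.glob("*.wav")):
                pairs.append((speaker_dir.name, str(wav)))
    return pairs
```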

Preprocessing

1. Resample to 24000Hz and mono

python preprocessing1_resample.py -n num_process

num_process is the number of processes; the default is 5.
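Conceptually, this step converts multi-channel audio to mono and resamples it to 24000 Hz. A hedged pure-Python sketch of the idea (the actual script relies on an audio library, not this code):

```python
def to_mono(frames):
    """Average the channels of each multi-channel frame."""
    return [sum(f) / len(f) for f in frames]

def resample_linear(samples, src_rate, dst_rate=24000):
    """Naive linear-interpolation resampler, for illustration only."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position on the source grid
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```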

2. Split the Training and Validation Datasets, and Generate Configuration Files.

python preprocessing2_flist.py
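This step partitions the file list into training and validation subsets. A minimal illustrative sketch (the held-out count and the seed are assumptions, not the repo's defaults):

```python
import random

def split_filelist(files, n_val=2, seed=1234):
    """Shuffle and hold out n_val files for validation; return (train, val)."""
    files = list(files)
    random.Random(seed).shuffle(files)
    return files[n_val:], files[:n_val]

train, val = split_filelist([f"dataset/spk0/{i}.wav" for i in range(10)])
print(len(train), len(val))  # 8 2
```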

3. Generate Features

python preprocessing3_feature.py -c your_config_file -n num_processes 

Training

1. Train the Teacher Model

python train.py

The checkpoints will be saved in the logs/teacher directory

2. Train the Consistency Model

If you want to adjust the config file, duplicate it and modify the parameters you need.

python train.py -t -c Your_new_configfile_path -p The_teacher_model_checkpoint_path 

Inference

You should first put the audio files you want to convert in the raw directory.

Inference by the Teacher Model

python inference_main.py -tm "logs/teacher/model_800000.pt" -tc "logs/teacher/config.yaml" -n "src.wav" -k 0 -s "target_singer"

-tm refers to the teacher_model_path

-tc refers to the teacher_config_path

-n refers to the source audio

-k refers to the pitch shift in semitones; it can be positive or negative

-s refers to the target singer
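A shift of k semitones corresponds to scaling the fundamental frequency by 2^(k/12). For example:

```python
def shift_f0(f0_hz, k_semitones):
    """Scale a fundamental frequency by k semitones (positive or negative)."""
    return f0_hz * 2 ** (k_semitones / 12)

print(shift_f0(440.0, 12))   # 880.0 (one octave up)
print(shift_f0(440.0, -12))  # 220.0 (one octave down)
```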

Inference by the Consistency Model

python inference_main.py -cm "logs/como/model_800000.pt" -cc "logs/como/config.yaml" -n "src.wav" -k 0 -s "target_singer" -t

-cm refers to the como_model_path

-cc refers to the como_config_path

-t indicates inference with the consistency model rather than the teacher model; it takes no argument
