reviseuc73 / rvc_cli

This project forked from blaise-tk/rvc_cli

RVC CLI enables seamless interaction with Retrieval-based Voice Conversion through commands or HTTP requests.

License: Other

Shell 0.02% Python 92.21% Batchfile 0.64% Jupyter Notebook 7.12%

RVC_CLI: Retrieval-based Voice Conversion Command Line Interface

Table of Contents

  1. Installation
  2. Getting Started
  3. API
  4. Credits

Installation

Ensure that you have the necessary Python packages installed by following these steps (Python 3.9 is recommended):

Windows

Execute the install.bat file to activate a Conda environment. Afterward, launch the application using env/python.exe main.py instead of the conventional python main.py command.
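As a small illustration of the Windows note above, a wrapper script could pick the interpreter by platform. This is only a sketch: the helper name `rvc_python` is hypothetical, the `env/python.exe` path is the one named in the Windows step, and it assumes the script is run from the repository root.

```python
import os


def rvc_python(os_name: str = os.name) -> str:
    """Return the interpreter to use for main.py (hypothetical helper)."""
    # On Windows, use the python.exe inside the local env folder, as the
    # Windows step describes. Elsewhere, assume `python` on PATH is correct.
    return os.path.join("env", "python.exe") if os_name == "nt" else "python"


print([rvc_python(), "main.py", "-h"])
```

The resulting list can be passed to `subprocess.run` in place of typing the command by hand.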

Linux

chmod +x install.sh
./install.sh

Getting Started

Download the necessary models and executables by running the following command:

python main.py prerequisites

More information about the prerequisites command is available in the Prerequisites Download section below.

For detailed information and command-line options, refer to the help command:

python main.py -h

This command provides a clear overview of the available modes and their corresponding parameters, facilitating effective utilization of the RVC CLI.

Inference

Single Inference

python main.py infer --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --input_path "input_path" --output_path "output_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name | Required | Default | Valid Options | Description
f0up_key | No | 0 | -24 to +24 | Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius | No | 3 | 0 to 10 | If the value is greater than or equal to three, median filtering is applied to the extracted pitch results, which can reduce breathiness.
index_rate | No | 0.3 | 0.0 to 1.0 | Influence exerted by the index file; a higher value corresponds to greater influence. Lower values can help mitigate artifacts present in the audio.
hop_length | No | 128 | 1 to 512 | Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate | No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect | No | 0.33 | 0 to 0.5 | Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. The maximum value of 0.5 offers comprehensive protection. Reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune | No | False | True or False | Apply a soft autotune to your inferences; recommended for singing conversions.
f0method | No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] | Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
input_path | Yes | None | Full path to the input audio file | Full path to the input audio file
output_path | Yes | None | Full path to the output audio file | Full path to the output audio file
pth_path | Yes | None | Full path to the pth file | Full path to the pth file
index_path | Yes | None | Full path to the index file | Full path to the index file
split_audio | No | False | True or False | Split the audio into chunks for inference to obtain better results in some cases.
clean_audio | No | False | True or False | Clean your audio output using noise detection algorithms; recommended for speech audio.
clean_strength | No | 0.7 | 0.0 to 1.0 | Set the clean-up level for the audio. Higher values clean up more, but the audio may end up more compressed.
export_format | No | WAV | WAV, MP3, FLAC, OGG, M4A | Audio file format

Refer to python main.py infer -h for additional help.
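When calling single inference from a script, it can help to check values against the ranges in the table before launching the command. The sketch below builds the argument list for `python main.py infer`; the helper name `build_infer_command` and the file paths are hypothetical placeholders, and the ranges are copied from the table above rather than read from the CLI.

```python
# Ranges copied from the parameter table above.
RANGES = {
    "f0up_key": (-24, 24),
    "filter_radius": (0, 10),
    "index_rate": (0.0, 1.0),
    "hop_length": (1, 512),
    "rms_mix_rate": (0.0, 1.0),
    "protect": (0.0, 0.5),
    "clean_strength": (0.0, 1.0),
}
REQUIRED = ("input_path", "output_path", "pth_path", "index_path")


def build_infer_command(python="python", **params):
    """Build the argv for `main.py infer` (hypothetical helper)."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    argv = [python, "main.py", "infer"]
    for name, value in params.items():
        argv += [f"--{name}", str(value)]
    return argv


cmd = build_infer_command(
    input_path="input.wav",    # placeholder paths, replace with your own
    output_path="output.wav",
    pth_path="model.pth",
    index_path="model.index",
    f0up_key=2,
    index_rate=0.3,
)
print(" ".join(cmd))
# To run the conversion: import subprocess; subprocess.run(cmd, check=True)
```

Optional parameters that are left out fall back to the defaults listed in the table.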

Batch Inference

python main.py batch_infer --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --input_folder_path "input_folder_path" --output_folder_path "output_folder_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name | Required | Default | Valid Options | Description
f0up_key | No | 0 | -24 to +24 | Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius | No | 3 | 0 to 10 | If the value is greater than or equal to three, median filtering is applied to the extracted pitch results, which can reduce breathiness.
index_rate | No | 0.3 | 0.0 to 1.0 | Influence exerted by the index file; a higher value corresponds to greater influence. Lower values can help mitigate artifacts present in the audio.
hop_length | No | 128 | 1 to 512 | Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate | No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect | No | 0.33 | 0 to 0.5 | Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. The maximum value of 0.5 offers comprehensive protection. Reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune | No | False | True or False | Apply a soft autotune to your inferences; recommended for singing conversions.
f0method | No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] | Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
input_folder_path | Yes | None | Full path to the input audio folder (the folder may only contain audio files) | Full path to the input audio folder
output_folder_path | Yes | None | Full path to the output audio folder | Full path to the output audio folder
pth_path | Yes | None | Full path to the pth file | Full path to the pth file
index_path | Yes | None | Full path to the index file | Full path to the index file
split_audio | No | False | True or False | Split the audio into chunks for inference to obtain better results in some cases.
clean_audio | No | False | True or False | Clean your audio output using noise detection algorithms; recommended for speech audio.
clean_strength | No | 0.7 | 0.0 to 1.0 | Set the clean-up level for the audio. Higher values clean up more, but the audio may end up more compressed.
export_format | No | WAV | WAV, MP3, FLAC, OGG, M4A | Audio file format

Refer to python main.py batch_infer -h for additional help.
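Because the input folder may only contain audio files, a quick check before starting a batch run can save a failed job. The following is a minimal sketch: the helper name `non_audio_entries` is hypothetical, and the extension list is an assumption based on the export formats above, not a list of formats the CLI is confirmed to accept as input.

```python
import tempfile
from pathlib import Path

# Assumed extension list; adjust it to the formats your setup accepts.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def non_audio_entries(folder):
    """Return names in `folder` that are not audio files (hypothetical helper)."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if not (entry.is_file() and entry.suffix.lower() in AUDIO_EXTENSIONS)
    )


# Small demonstration on a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.wav").touch()
    (Path(tmp) / "notes.txt").touch()
    leftovers = non_audio_entries(tmp)
    print(leftovers)  # ['notes.txt']
```

An empty result means the folder is safe to pass as `--input_folder_path` under the assumption above.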

TTS Inference

python main.py tts_infer --tts_text "tts_text" --tts_voice "tts_voice" --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --output_tts_path "output_tts_path" --output_rvc_path "output_rvc_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name | Required | Default | Valid Options | Description
tts_text | Yes | None | Text for TTS synthesis | Text for TTS synthesis
tts_voice | Yes | None | Voice for TTS synthesis | Voice for TTS synthesis
f0up_key | No | 0 | -24 to +24 | Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius | No | 3 | 0 to 10 | If the value is greater than or equal to three, median filtering is applied to the extracted pitch results, which can reduce breathiness.
index_rate | No | 0.3 | 0.0 to 1.0 | Influence exerted by the index file; a higher value corresponds to greater influence. Lower values can help mitigate artifacts present in the audio.
hop_length | No | 128 | 1 to 512 | Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate | No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect | No | 0.33 | 0 to 0.5 | Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. The maximum value of 0.5 offers comprehensive protection. Reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune | No | False | True or False | Apply a soft autotune to your inferences; recommended for singing conversions.
f0method | No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] | Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
output_tts_path | Yes | None | Full path to the output TTS audio file | Full path to the output TTS audio file
output_rvc_path | Yes | None | Full path to the output RVC audio file | Full path to the output RVC audio file
pth_path | Yes | None | Full path to the pth file | Full path to the pth file
index_path | Yes | None | Full path to the index file | Full path to the index file
split_audio | No | False | True or False | Split the audio into chunks for inference to obtain better results in some cases.
clean_audio | No | False | True or False | Clean your audio output using noise detection algorithms; recommended for speech audio.
clean_strength | No | 0.7 | 0.0 to 1.0 | Set the clean-up level for the audio. Higher values clean up more, but the audio may end up more compressed.
export_format | No | WAV | WAV, MP3, FLAC, OGG, M4A | Audio file format

Refer to python main.py tts_infer -h for additional help.

Training

Preprocess Dataset

python main.py preprocess --model_name "model_name" --dataset_path "dataset_path" --sampling_rate "sampling_rate"
Parameter Name | Required | Default | Valid Options | Description
model_name | Yes | None | Name of the model | Name of the model
dataset_path | Yes | None | Full path to the dataset folder (the folder may only contain audio files) | Full path to the dataset folder
sampling_rate | Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data

Refer to python main.py preprocess -h for additional help.

Extract Features

python main.py extract --model_name "model_name" --rvc_version "rvc_version" --f0method "f0method" --hop_length "hop_length" --sampling_rate "sampling_rate"
Parameter Name | Required | Default | Valid Options | Description
model_name | Yes | None | Name of the model | Name of the model
rvc_version | No | v2 | v1 or v2 | Version of the model
f0method | No | rmvpe | pm, harvest, dio, crepe, crepe-tiny, rmvpe | Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
hop_length | No | 128 | 1 to 512 | Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
sampling_rate | Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data

Start Training

python main.py train --model_name "model_name" --rvc_version "rvc_version" --save_every_epoch "save_every_epoch" --save_only_latest "save_only_latest" --save_every_weights "save_every_weights" --total_epoch "total_epoch" --sampling_rate "sampling_rate" --batch_size "batch_size" --gpu "gpu" --pitch_guidance "pitch_guidance" --overtraining_detector "overtraining_detector" --overtraining_threshold "overtraining_threshold" --pretrained "pretrained" --custom_pretrained "custom_pretrained" [--g_pretrained "g_pretrained"] [--d_pretrained "d_pretrained"]
Parameter Name | Required | Default | Valid Options | Description
model_name | Yes | None | Name of the model | Name of the model
rvc_version | No | v2 | v1 or v2 | Version of the model
save_every_epoch | Yes | None | 1 to 50 | Determines how often, in epochs, the model is saved.
save_only_latest | No | False | True or False | Enabling this setting will result in the G and D files saving only their most recent versions, effectively conserving storage space.
save_every_weights | No | True | True or False | This setting enables you to save the weights of the model at the conclusion of each epoch.
total_epoch | No | 1000 | 1 to 10000 | Specifies the total number of epochs for the model training process.
sampling_rate | Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data
batch_size | No | 8 | 1 to 50 | It's advisable to align it with the available VRAM of your GPU. A setting of 4 offers improved accuracy but slower processing, while 8 provides faster and standard results.
gpu | No | 0 | 0 to ∞, separated by - | Specify the GPUs you wish to use for training by entering their numbers separated by hyphens (-).
pitch_guidance | No | True | True or False | By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
overtraining_detector | No | False | True or False | Use the overtraining detector to prevent overfitting. This feature is particularly valuable for scenarios where the model is at risk of overfitting.
overtraining_threshold | No | 50 | 1 to 100 | Set the threshold for the overtraining detector. The lower the value, the more sensitive the detector will be.
pretrained | No | True | True or False | Use pretrained models when training your own. This approach reduces training duration and enhances overall quality.
custom_pretrained | No | False | True or False | Using custom pretrained models can lead to superior results, as selecting the most suitable pretrained models tailored to the specific use case can significantly enhance performance.
g_pretrained | No | None | Full path to pretrained file G, only if you have used custom_pretrained | Full path to pretrained file G
d_pretrained | No | None | Full path to pretrained file D, only if you have used custom_pretrained | Full path to pretrained file D

Refer to python main.py train -h for additional help.

Generate Index File

python main.py index --model_name "model_name" --rvc_version "rvc_version"
Parameter Name | Required | Default | Valid Options | Description
model_name | Yes | None | Name of the model | Name of the model
rvc_version | Yes | None | v1 or v2 | Version of the model

Refer to python main.py index -h for additional help.
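The four training commands above share `model_name`, `sampling_rate`, and `rvc_version`, so a script can assemble them together. The sketch below assumes the stages run in the order they are listed in this section (preprocess, extract, train, index) and passes only the required parameters plus `--gpu`; the helper name `training_commands`, the model name, and the dataset path are hypothetical.

```python
def training_commands(model_name, dataset_path, sampling_rate,
                      rvc_version="v2", save_every_epoch=10,
                      gpus=(0,), python="python"):
    """Return the four training-stage commands in order (hypothetical helper)."""
    if sampling_rate not in (32000, 40000, 48000):
        raise ValueError("sampling_rate must be 32000, 40000, or 48000")
    base = [python, "main.py"]
    name = ["--model_name", model_name]
    rate = ["--sampling_rate", str(sampling_rate)]
    version = ["--rvc_version", rvc_version]
    gpu = "-".join(str(g) for g in gpus)  # GPU numbers joined by hyphens
    return [
        base + ["preprocess"] + name + ["--dataset_path", dataset_path] + rate,
        base + ["extract"] + name + version + rate,
        base + ["train"] + name + version + rate
        + ["--save_every_epoch", str(save_every_epoch), "--gpu", gpu],
        base + ["index"] + name + version,
    ]


commands = training_commands("my_model", "/data/my_dataset", 40000, gpus=(0, 1))
for command in commands:
    print(" ".join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` so that a failing stage stops the sequence.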

Additional Features

Model Extract

python main.py model_extract --pth_path "pth_path" --model_name "model_name" --sampling_rate "sampling_rate" --pitch_guidance "pitch_guidance" --rvc_version "rvc_version" --epoch "epoch" --step "step"
Parameter Name | Required | Default | Valid Options | Description
pth_path | Yes | None | Path to the pth file | Full path to the pth file
model_name | Yes | None | Name of the model | Name of the model
sampling_rate | Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data
pitch_guidance | Yes | None | True or False | By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
rvc_version | Yes | None | v1 or v2 | Version of the model
epoch | Yes | None | 1 to 10000 | Specifies the overall quantity of epochs for the model training process.
step | Yes | None | 1 to ∞ | Specifies the overall quantity of steps for the model training process.

Model Information

python main.py model_information --pth_path "pth_path"
Parameter Name | Required | Default | Valid Options | Description
pth_path | Yes | None | Path to the pth file | Full path to the pth file

Model Blender

python main.py model_blender --model_name "model_name" --pth_path_1 "pth_path_1" --pth_path_2 "pth_path_2" --ratio "ratio"
Parameter Name | Required | Default | Valid Options | Description
model_name | Yes | None | Name of the model | Name of the model
pth_path_1 | Yes | None | Path to the first pth file | Full path to the first pth file
pth_path_2 | Yes | None | Path to the second pth file | Full path to the second pth file
ratio | No | 0.5 | 0.0 to 1 | Value for blender ratio
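The table does not say how `ratio` is applied. One common way to blend two checkpoints is linear interpolation of matching weights, sketched below on plain Python lists. This illustrates the general idea only: whether the CLI uses this formula, and which of the two models a higher ratio favours, are assumptions.

```python
def blend_weights(weights_1, weights_2, ratio=0.5):
    """Linearly interpolate two weight dicts (illustrative, not the CLI's code)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    if weights_1.keys() != weights_2.keys():
        raise ValueError("both models must have the same parameter names")
    return {
        # Assumption: `ratio` weights the first model, `1 - ratio` the second.
        name: [ratio * a + (1.0 - ratio) * b
               for a, b in zip(weights_1[name], weights_2[name])]
        for name in weights_1
    }


model_1 = {"layer.weight": [1.0, 2.0]}
model_2 = {"layer.weight": [3.0, 6.0]}
blended = blend_weights(model_1, model_2, ratio=0.5)
print(blended)  # {'layer.weight': [2.0, 4.0]}
```

Under this reading, the default of 0.5 gives both models equal weight.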

Launch TensorBoard

python main.py tensorboard

Download Models

Run the download script with the following command:

python main.py download --model_link "model_link"
Parameter Name | Required | Default | Valid Options | Description
model_link | Yes | None | Link of the model (enclosed in double quotes; Google Drive or Hugging Face) | Link of the model

Refer to python main.py download -h for additional help.

Audio Analyzer

python main.py audio_analyzer --input_path "input_path"
Parameter Name | Required | Default | Valid Options | Description
input_path | Yes | None | Full path to the input audio file | Full path to the input audio file

Refer to python main.py audio_analyzer -h for additional help.

Prerequisites Download

python main.py prerequisites --pretraineds_v1 "pretraineds_v1" --pretraineds_v2 "pretraineds_v2" --models "models" --exe "exe"
Parameter Name | Required | Default | Valid Options | Description
pretraineds_v1 | No | True | True or False | Download pretrained models for v1
pretraineds_v2 | No | True | True or False | Download pretrained models for v2
models | No | True | True or False | Download models for v1 and v2
exe | No | True | True or False | Download the necessary executable files for the CLI to function properly (FFmpeg and FFprobe)

API

python main.py api --host "host" --port "port"
Parameter Name | Required | Default | Valid Options | Description
host | No | 127.0.0.1 | Value for host IP | Value for host IP
port | No | 8000 | Value for port number | Value for port number

To use the RVC CLI via the API, utilize the provided script. Make API requests to the following endpoints:

  • Docs: /docs
  • Ping: /ping
  • Infer: /infer
  • Batch Infer: /batch_infer
  • TTS: /tts
  • Preprocess: /preprocess
  • Extract: /extract
  • Train: /train
  • Index: /index
  • Model Information: /model_information
  • Model Fusion: /model_fusion
  • Download: /download

Make POST requests to these endpoints with the same required parameters as in CLI mode.
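A POST request to one of these endpoints might be assembled as follows, using only the standard library. This is a sketch under stated assumptions: the helper name `make_request` is hypothetical, the file paths are placeholders, and sending the parameters as a JSON body is an assumption; the /docs endpoint of a running server shows the request format it actually expects.

```python
import json
import urllib.request


def make_request(endpoint, params, host="127.0.0.1", port=8000):
    """Build a POST request for an RVC CLI API endpoint (hypothetical helper)."""
    # Assumption: the server accepts the parameters as a JSON body.
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = make_request("/infer", {
    "input_path": "input.wav",    # placeholder paths, replace with your own
    "output_path": "output.wav",
    "pth_path": "model.pth",
    "index_path": "model.index",
})
print(request.get_method(), request.full_url)  # POST http://127.0.0.1:8000/infer
# With the API running: urllib.request.urlopen(request).read()
```

The default host and port match the defaults of the api command above.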

Credits

The RVC CLI builds upon the foundations of the following projects:

We acknowledge and appreciate the contributions of the respective authors and communities involved in these projects.

Contributors

blaise-tk, aitronssesin, vidalnt, github-actions[bot], poiqazwsx, lukaszliniewicz
