BigVSAN

This repository contains the official PyTorch implementation of "BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network" (arXiv 2309.02836). Please cite [1] in your work when using this code in your experiments.

Installation

This repository builds on the codebase of BigVGAN.

Download the LibriTTS dataset here in advance.

Clone the repository and install dependencies.

# the codebase has been tested on Python 3.8 with PyTorch 1.13.0
git clone https://github.com/sony/bigvsan
cd bigvsan
pip install -r requirements.txt

Create symbolic links to the dataset. The codebase uses filelists whose paths are relative to the dataset root. Below are example commands for the LibriTTS dataset.

cd LibriTTS && \
ln -s /path/to/your/LibriTTS/train-clean-100 train-clean-100 && \
ln -s /path/to/your/LibriTTS/train-clean-360 train-clean-360 && \
ln -s /path/to/your/LibriTTS/train-other-500 train-other-500 && \
ln -s /path/to/your/LibriTTS/dev-clean dev-clean && \
ln -s /path/to/your/LibriTTS/dev-other dev-other && \
ln -s /path/to/your/LibriTTS/test-clean test-clean && \
ln -s /path/to/your/LibriTTS/test-other test-other && \
cd ..
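
For setups where the shell commands above are inconvenient, the same links can be created from Python. This is a minimal sketch, not part of the repository: the function name `link_subsets` is hypothetical, and skipping subsets that are missing or already linked is an assumption added for convenience. The subset names are taken from the commands above.

```python
import os

# Subset names taken from the shell commands above.
SUBSETS = [
    "train-clean-100", "train-clean-360", "train-other-500",
    "dev-clean", "dev-other", "test-clean", "test-other",
]

def link_subsets(dataset_root, link_dir, subsets=SUBSETS):
    """Create link_dir/<subset> -> dataset_root/<subset>.

    Hypothetical helper (not part of the BigVSAN codebase).
    Returns the names of the subsets that were linked.
    """
    linked = []
    for name in subsets:
        target = os.path.join(dataset_root, name)
        link = os.path.join(link_dir, name)
        if not os.path.isdir(target):
            continue  # assumption: silently skip subsets that were not downloaded
        if os.path.lexists(link):
            continue  # assumption: leave existing links or directories untouched
        os.symlink(target, link)
        linked.append(name)
    return linked
```

Calling `link_subsets("/path/to/your/LibriTTS", "LibriTTS")` from the repository root would mirror the commands above.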

Training

Train a BigVSAN model. Below is an example command for training BigVSAN on the LibriTTS dataset at 24 kHz with a full 100-band mel spectrogram as input.

python train.py \
--config configs/bigvsan_24khz_100band.json \
--input_wavs_dir LibriTTS \
--input_training_file LibriTTS/train-full.txt \
--input_validation_file LibriTTS/val-full.txt \
--list_input_unseen_wavs_dir LibriTTS LibriTTS \
--list_input_unseen_validation_file LibriTTS/dev-clean.txt LibriTTS/dev-other.txt \
--checkpoint_path exp/bigvsan

Evaluation

We evaluated our BigVSAN model as follows:

Generate and save audio samples after you finish training the model. Below is an example command for generating and saving audio samples for evaluation.

python train.py \
--config configs/bigvsan_24khz_100band.json \
--input_wavs_dir LibriTTS \
--input_training_file LibriTTS/train-full.txt \
--input_validation_file LibriTTS/val-full.txt \
--list_input_unseen_wavs_dir LibriTTS LibriTTS \
--list_input_unseen_validation_file LibriTTS/dev-clean.txt LibriTTS/dev-other.txt \
--checkpoint_path exp/bigvsan \
--evaluate True \
--eval_subsample 1 \
--skip_seen True \
--save_audio True

Run the evaluation tool provided here. It computes five objective metric scores: M-STFT, PESQ, MCD, Periodicity, and V/UV F1.

python evaluate.py \
../bigvsan/exp/bigvsan/samples/gt_unseen_LibriTTS-dev-clean ../bigvsan/exp/bigvsan/samples/unseen_LibriTTS-dev-clean_01000001 \
../bigvsan/exp/bigvsan/samples/gt_unseen_LibriTTS-dev-other ../bigvsan/exp/bigvsan/samples/unseen_LibriTTS-dev-other_01000001

An evaluation takes about an hour to complete. Note that train.py also outputs M-STFT and PESQ scores when it generates and saves audio samples, but these values differ from the output of evaluate.py because each sample is quantized to 16 bits when it is saved as a wav file.
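
The effect of 16-bit quantization can be illustrated with a small standalone sketch. It is not taken from train.py or evaluate.py: the scaling factor of 32767, the helper name `quantize_int16`, and the synthetic sine wave are all assumptions chosen for illustration. It shows only that a float waveform does not survive a round trip through 16-bit integers unchanged, which is why metrics computed before and after saving can differ slightly.

```python
import math

def quantize_int16(sample):
    """Round a float sample in [-1.0, 1.0] to a 16-bit integer and back.

    Hypothetical helper; the 32767 scaling is an assumed convention.
    """
    clipped = max(-1.0, min(1.0, sample))
    as_int = int(round(clipped * 32767))
    return as_int / 32767

# A short 440 Hz sine sampled at 24 kHz stands in for generated audio.
wave = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(240)]
restored = [quantize_int16(x) for x in wave]

# The round-trip error is nonzero but bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(wave, restored))
half_step = 0.5 / 32767
print(f"max round-trip error: {max_error:.3e} (half step: {half_step:.3e})")
```

Under these assumptions the per-sample error is at most half a quantization step, roughly 1.5e-5.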

Synthesis

Synthesize audio from a BigVSAN model. Below is an example command for generating audio from the model. It computes mel spectrograms from the wav files in --input_wavs_dir and saves the generated audio to --output_dir.

python inference.py \
--checkpoint_file exp/bigvsan/g_01000000 \
--input_wavs_dir /path/to/your/input_wav \
--output_dir /path/to/your/output_wav

Citation

[1] Shibuya, T., Takida, Y., Mitsufuji, Y., "BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network," Preprint.

@ARTICLE{shibuya2023bigvsan,
    author={Shibuya, Takashi and Takida, Yuhta and Mitsufuji, Yuki},
    title={{BigVSAN}: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network},
    journal={Computing Research Repository},
    volume={arXiv:2309.02836},
    year={2023},
    url={https://arxiv.org/abs/2309.02836},
    }
