Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment

License: MIT License

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

Ruiqi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao | Zhejiang University

PyTorch Implementation of AlignSTS (ACL 2023): a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment.

We provide our implementation and pretrained models in this repository.

Visit our demo page for audio samples.

News

  • May 2023: AlignSTS released on GitHub.
  • May 2023: AlignSTS accepted to Findings of ACL 2023.

Quick Start

We provide an example of how you can generate high-quality samples using AlignSTS.

Pretrained Models

You can use the pretrained models we provide here. The contents of each folder are as follows:

Model       Description
AlignSTS    Acoustic model (config)
HIFI-GAN    Neural vocoder

Dependencies

A suitable conda environment named alignsts can be created and activated with:

conda create -n alignsts python=3.8
conda activate alignsts
conda install --yes --file requirements.txt

Test samples

We provide a mini-set of test samples here to demonstrate AlignSTS. The samples come as WAV files together with the corresponding statistical files, which allow faster IO. Please download the statistical files and place them in data/binary/speech2singing-testdata/; the WAV files are for listening.

The WAV files are named [spk]#[song name]#[speech/sing identifier]#[sentence index].wav. For example, 男3号#all we know#sing#14.wav is the singing sample of the 14th sentence of the song "all we know", sung by the speaker "男3号".
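The naming rule can be unpacked programmatically when sorting or pairing samples. The following is a minimal sketch, not part of the AlignSTS codebase: `SampleName` and `parse_sample_name` are hypothetical names introduced here, and the sketch assumes that "#" never appears inside a speaker or song name.

```python
from typing import NamedTuple


class SampleName(NamedTuple):
    """Fields of a test-sample file name (hypothetical helper type)."""
    speaker: str
    song: str
    kind: str            # the speech/sing identifier, e.g. "sing"
    sentence_index: int


def parse_sample_name(filename: str) -> SampleName:
    """Split '[spk]#[song name]#[speech/sing identifier]#[sentence index].wav'."""
    if not filename.endswith(".wav"):
        raise ValueError(f"expected a .wav file name, got {filename!r}")
    stem = filename[: -len(".wav")]
    parts = stem.split("#")
    # Assumption: exactly four '#'-separated fields, as in the naming rule.
    if len(parts) != 4:
        raise ValueError(f"expected 4 '#'-separated fields, got {len(parts)}")
    speaker, song, kind, index = parts
    return SampleName(speaker, song, kind, int(index))


sample = parse_sample_name("男3号#all we know#sing#14.wav")
print(sample.speaker, sample.song, sample.kind, sample.sentence_index)
```

Grouping parsed names by speaker, song, and sentence index is one way to match a speech sample with its singing counterpart.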

Inference

Here we provide a speech-to-singing conversion pipeline using AlignSTS.

  1. Prepare AlignSTS (acoustic model): download the checkpoint and place it in checkpoints/alignsts
  2. Prepare HIFI-GAN (neural vocoder): download the checkpoint and place it in checkpoints/hifigan
  3. Prepare the test dataset: download the statistical files of the test dataset and place them in data/binary/speech2singing-testdata
  4. Run
CUDA_VISIBLE_DEVICES=0 python tasks/run.py --exp_name alignsts --infer --hparams "gen_dir_name=test" --config configs/singing/speech2singing/alignsts.yaml --reset
  5. You will find the outputs in checkpoints/alignsts/generated_200000_test, where [G] indicates ground truth mel results and [P] indicates predicted results.
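Because inference depends on files placed in several locations, it can help to confirm they are present before launching the run. The sketch below is an assumption-laden convenience check, not something shipped with the repository: `REQUIRED_PATHS` and `missing_paths` are hypothetical names, and the paths are simply the ones named in the steps and the run command, resolved relative to the repository root.

```python
from pathlib import Path
from typing import List

# Paths taken from the inference steps and the run command.
# Assumption: they are all relative to the repository root.
REQUIRED_PATHS = (
    "checkpoints/alignsts",
    "checkpoints/hifigan",
    "data/binary/speech2singing-testdata",
    "configs/singing/speech2singing/alignsts.yaml",
)


def missing_paths(repo_root: str = ".") -> List[str]:
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing before inference:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected checkpoints, data and config paths are present.")
```

The check only tests for existence; it does not verify that the checkpoint or data files inside those folders are complete.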

Acknowledgements

This implementation uses parts of the code from the following GitHub repos: NATSpeech, DiffSinger, ProDiff, SpeechSplit2, as described in our code.

Citations

If you find this code useful in your research, please cite our work:

@article{li2023alignsts,
  title={AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment},
  author={Li, Ruiqi and Huang, Rongjie and Zhang, Lichao and Liu, Jinglin and Zhao, Zhou},
  journal={Association for Computational Linguistics},
  year={2023}
}

Disclaimer

Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech/singing without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.

alignsts's Issues

Dependencies Conflict

I would like to express my appreciation for the hard work and effort put into this project!

However, I am encountering difficulties when trying to install the dependencies as outlined in the README file. Additionally, I receive a "PackagesNotFoundError" with the following message: "The following packages are not available from current channels: [list the packages]." The current channels I have are:

https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch

When I use "pip install" instead, dependency conflicts arise and prevent me from completing the installation and proceeding with the project.

My operating system is Ubuntu 18.04. I kindly request assistance in resolving these issues so that I can successfully install the dependencies and utilize the project as intended. Thank you for your attention and support. Your efforts are greatly appreciated.

train my own model

Hello, first of all, thanks a lot. I was playing with it and the results are awesome.
Is there any way for me to train my own model on my own dataset?
