kandy22 / text-to-sound-synthesis

This project forked from yangdongchao/text-to-sound-synthesis

The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"

Home Page: http://dongchaoyang.top/text-to-sound-synthesis-demo/

Shell 0.88% Python 99.12%

text-to-sound-synthesis's Introduction

Text-to-sound Generation

This is the open-source code for our paper "Diffsound: discrete diffusion model for text-to-sound generation".
The paper is available on arXiv: https://arxiv.org/pdf/2207.09983v1.pdf
The demo page is http://dongchaoyang.top/text-to-sound-synthesis-demo/
2022/08/03 We uploaded the training code for the VQ-VAE, the baseline text-to-sound generation method (an autoregressive model), and the Diffsound code. Because GitHub limits file size, we will upload the pre-trained models to Google Drive.
2022/08/06 We uploaded the pre-trained model to Google Drive. Please refer to https://drive.google.com/drive/folders/193It90mEBDPoyLghn4kFzkugbkF_aC8v?usp=sharing
Note that a pre-trained Diffsound model is very large, so for now we have uploaded only one model pre-trained on AudioSet. We will try to upload more models to other free storage; if you know of any free file-sharing service, please let me know. I would really appreciate it.
2022/08/09 We uploaded the Diffsound model trained on the AudioCaps dataset, the baseline AR model, and the codebook of size 512 trained on AudioSet. You can find them at https://pan.baidu.com/s/1R9YYxECqa6Fj1t4qbdVvPQ . The password is lsyr
2022/12/06 Hi, everyone. In our previous setting, we used the wrong sample rate to load WAV files, so speech could not be generated well. We have now updated the feature extraction module: https://github.com/yangdongchao/Text-to-sound-Synthesis/blob/master/Codebook/feature_extraction/extract_mel_spectrogram.py#L167 . We will re-train our models; all of the pre-trained models can be found on the PKU disk: https://disk.pku.edu.cn:443/link/4908743A441B02235C8652742FE44949 More details will be added as soon as possible.
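
A sample-rate mismatch like the one described in the 2022/12/06 note can be caught before feature extraction with a small check. The following is a minimal sketch using only the Python standard library; it is not code from this repository. The function names `check_sample_rate` and `write_silent_wav`, and the 16000 Hz and 22050 Hz values in the demo, are illustrative assumptions. Use whatever sample rate the feature extraction script actually expects.

```python
import os
import tempfile
import wave


def check_sample_rate(path, expected_hz):
    """Return the WAV file's sample rate; raise ValueError if it is not expected_hz."""
    with wave.open(path, "rb") as wav_file:
        actual_hz = wav_file.getframerate()
    if actual_hz != expected_hz:
        raise ValueError(
            f"{path}: sample rate is {actual_hz} Hz, expected {expected_hz} Hz; "
            "resample the audio before extracting features"
        )
    return actual_hz


def write_silent_wav(path, sample_rate_hz, seconds=1):
    """Write a mono 16-bit WAV file of silence (hypothetical helper for the demo)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(b"\x00\x00" * (sample_rate_hz * seconds))


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    write_silent_wav(demo_path, 16000)  # assumed rate, for illustration only
    print(check_sample_rate(demo_path, 16000))
    try:
        check_sample_rate(demo_path, 22050)  # deliberately mismatched
    except ValueError as err:
        print("mismatch detected:", err)
```

Running a check like this over a dataset before extracting mel spectrograms makes a wrong-rate file fail loudly instead of silently degrading the generated audio.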

Overview

[Overview figure]

Pretrained Model

We release four pretrained models for text-to-sound generation: a VQ-VAE trained on AudioSet, a vocoder trained on AudioSet, and generation models trained on AudioCaps and AudioSet.

Inference

Please refer to the readme.md file in the Codebook folder to see how to run inference.

Training

Please refer to the readme.md file in the Codebook folder to see how to train your network.

Reference

This project is based on the following open-source code:
https://github.com/XinhaoMei/ACT
https://github.com/cientgu/VQ-Diffusion
https://github.com/CompVis/taming-transformers
https://github.com/lonePatient/Bert-Multi-Label-Text-Classification
https://github.com/v-iashin/SpecVQGAN

Cite

@article{yang2022diffsound,
  title={Diffsound: Discrete Diffusion Model for Text-to-sound Generation},
  author={Yang, Dongchao and Yu, Jianwei and Wang, Helin and Wang, Wen and Weng, Chao and Zou, Yuexian and Yu, Dong},
  journal={arXiv e-prints},
  pages={arXiv--2207},
  year={2022}
}
