hcy71o / sc-cnn

SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems

License: MIT License

Python 97.05% Jupyter Notebook 2.95%
acoustic-model feature-extractor multi-speaker-tts speech-synthesis text-to-speech tts zero-shot

sc-cnn's Introduction

SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems

Thanks to StyleSpeech and VITS: our code is built on their implementations (Link and Link).

  1. The VCTK dataset is used.
  2. The sampling rate is set to 22050 Hz.
  3. This is the implementation of SC-TransferTTS.

Materials

Prerequisites

  1. Clone this repository.
  2. Install the Python requirements. Please refer to requirements.txt
    1. You may need to install espeak first: apt-get install espeak
  3. Download datasets
    1. Download and extract the VCTK dataset, and downsample wav files to 22050 Hz. Then rename or create a link to the dataset folder: ln -s /path/to/VCTK-Corpus/downsampled_wavs DUMMY3
  4. Build Monotonic Alignment Search and run preprocessing if you use your own datasets.
# Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace
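
The downsampling in step 3 is not scripted in this README. A minimal sketch of one way to do it, assuming ffmpeg is installed and on the PATH: the helper below only builds one ffmpeg command per wav file, mirroring the source folder layout. The function names and the source folder in the usage note are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path


def build_resample_commands(src_root, dst_root, sample_rate=22050):
    """Return one ffmpeg argv list per .wav file found under src_root.

    The directory layout under src_root is mirrored under dst_root.
    Hypothetical helper, not part of the sc-cnn repository.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    commands = []
    for src in sorted(src_root.rglob("*.wav")):
        dst = dst_root / src.relative_to(src_root)
        # -ar sets the output sampling rate; -y overwrites existing outputs.
        commands.append(
            ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate), str(dst)]
        )
    return commands


def run_resample_commands(commands):
    """Create the output folders and run each ffmpeg command in turn."""
    for cmd in commands:
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

For example, `run_resample_commands(build_resample_commands("/path/to/VCTK-Corpus/wav48", "/path/to/VCTK-Corpus/downsampled_wavs"))` would fill the folder that the `ln -s` command above links to; the `wav48` source folder is an assumption about how the corpus was extracted.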

Training Example

python train.py -c configs/vctk_base.json -m vctk_base

Inference Example

See inference.ipynb

sc-cnn's Issues

About additional loss

Hello, nice work. I have a question.
Q) Would it help to add an extra loss at the end of generation that matches the spk_enc embeddings of the reference wav and the generated wav? I do not see Meta-StyleSpeech's discriminator being used here (am I missing it somewhere?).

....
s_ref = self.spk_enc(y.transpose(1,2), (y_mask==0).squeeze(1))
....
## freeze spk_enc
s_out = self.spk_enc(y_out.transpose(1,2), (y_out_mask==0).squeeze(1))
# then cosine dist b/w s_ref and s_out
## unfreeze spk_enc

Thanks.
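
The proposal above could be sketched roughly as follows in PyTorch. This is only an illustration of the idea in the question, not code from the repository: the function name is hypothetical, and the call signature of `spk_enc` (mel frames plus a padding mask) and the tensor shapes are assumed from the snippet above.

```python
import torch
import torch.nn.functional as F


def speaker_consistency_loss(spk_enc, y, y_mask, y_out, y_out_mask):
    """Mean cosine distance between reference and generated speaker embeddings.

    Assumed shapes (hypothetical, following the snippet above):
      y, y_out:           (batch, n_mels, frames)
      y_mask, y_out_mask: (batch, 1, frames), nonzero for valid frames
    """
    # Freeze spk_enc so this loss does not update the encoder weights.
    was_trainable = [p.requires_grad for p in spk_enc.parameters()]
    for p in spk_enc.parameters():
        p.requires_grad_(False)
    try:
        with torch.no_grad():
            s_ref = spk_enc(y.transpose(1, 2), (y_mask == 0).squeeze(1))
        # Gradients still flow through y_out back to the generator.
        s_out = spk_enc(y_out.transpose(1, 2), (y_out_mask == 0).squeeze(1))
    finally:
        # Unfreeze spk_enc: restore the original requires_grad flags.
        for p, flag in zip(spk_enc.parameters(), was_trainable):
            p.requires_grad_(flag)
    return (1.0 - F.cosine_similarity(s_ref, s_out, dim=-1)).mean()
```

The result would be added to the generator loss with some weight. Whether it helps in practice is exactly the open question of this issue.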

Acoustic feature transferring question

Hello, I'm new to this field and still a little confused. I have a few questions about acoustic feature transfer.

  1. As I understand it, this zero-shot TTS transfers the speaker's voice from the reference audio to the synthesized speech. Does it also transfer speech style, such as pitch, intonation, speaking rate, rhythm, volume, or emotional expression?
  2. If it does not transfer speech style, do you have any suggestions for style control, or for adding any kind of controllability to a VITS-based model?

I understand these questions are not closely related to the repo itself, but any help is appreciated. Thank you!
