hcy71o / sc-cnn

SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems

License: MIT License

Python 97.05% Jupyter Notebook 2.95%

acoustic-model feature-extractor multi-speaker-tts speech-synthesis text-to-speech tts zero-shot

sc-cnn's Introduction

Thanks to StyleSpeech and VITS; our code is built on their implementations.

The VCTK dataset is used.
The sampling rate is set to 22050 Hz.
This is the implementation of SC-TransferTTS.

Materials

Prerequisites

1. Clone this repository.
2. Install the Python requirements; please refer to requirements.txt.
   1. You may need to install espeak first: apt-get install espeak
3. Download the datasets.
   1. Download and extract the VCTK dataset, and downsample the wav files to 22050 Hz. Then rename or create a link to the dataset folder: ln -s /path/to/VCTK-Corpus/downsampled_wavs DUMMY3
4. Build Monotonic Alignment Search and run preprocessing if you use your own datasets.

# Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace
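
The downsampling step in the dataset instructions above is not scripted in this README. The following is a minimal sketch of one way to do it, assuming ffmpeg is installed and on the PATH. The function names (build_resample_command, plan_downsampling, downsample_tree) and the "*.wav" pattern are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path

TARGET_SR = 22050  # sampling rate stated in this README


def build_resample_command(src, dst, sample_rate=TARGET_SR):
    """Return an ffmpeg argument list that resamples src into dst."""
    # -y overwrites an existing output, -ar sets the output audio sampling rate.
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate), str(dst)]


def plan_downsampling(src_root, dst_root, pattern="*.wav"):
    """Pair every matching file under src_root with a mirrored path under dst_root."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    return [(src, dst_root / src.relative_to(src_root))
            for src in sorted(src_root.rglob(pattern))]


def downsample_tree(src_root, dst_root, dry_run=True):
    """Resample a directory tree; with dry_run=True only return the commands."""
    commands = []
    for src, dst in plan_downsampling(src_root, dst_root):
        cmd = build_resample_command(src, dst)
        commands.append(cmd)
        if not dry_run:
            # Create the mirrored speaker folder before ffmpeg writes into it.
            dst.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands
```

Calling downsample_tree(src, dst) with the default dry_run=True only lists the commands, which makes it easy to inspect the plan before any audio is written. The output folder can then be linked as DUMMY3 as described in the dataset step.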

Training Example

python train.py -c configs/vctk_base.json -m vctk_base
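
Before launching training, it can help to confirm that the config agrees with the 22050 Hz sampling rate stated above. The sketch below assumes the config follows the VITS-style layout with a "data" section containing a "sampling_rate" key; that layout is an assumption, so check configs/vctk_base.json before relying on it. The helper names are hypothetical.

```python
import json

EXPECTED_SR = 22050  # sampling rate stated in this README


def read_sampling_rate(config_path):
    """Return the sampling rate stored in a JSON config, or None if it is absent."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: VITS-style layout, i.e. {"data": {"sampling_rate": ...}}.
    return config.get("data", {}).get("sampling_rate")


def check_sampling_rate(config_path, expected=EXPECTED_SR):
    """Return (matches, found) so the caller can report a mismatch."""
    found = read_sampling_rate(config_path)
    return found == expected, found
```

For example, check_sampling_rate("configs/vctk_base.json") returns a pair whose first element is True only when the config and the downsampled audio agree.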

Inference Example

See inference.ipynb

sc-cnn's Issues

About additional loss

Hello, nice work. I have a question.
Q) How about adding an extra loss at the end of generation to match the spk_enc embeddings of the reference wav and the generated wav? I do not see Meta-StyleSpeech's discriminator being used here (am I missing it somewhere?).

....
s_ref = self.spk_enc(y.transpose(1,2), (y_mask==0).squeeze(1))
....
## freeze spk_enc
s_out = self.spk_enc(y_out.transpose(1,2), (y_out_mask==0).squeeze(1))
# then cosine dist b/w s_ref and s_out
## unfreeze spk_enc

Thanks.
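
The "cosine dist b/w s_ref and s_out" proposed in this issue can be illustrated with a small, framework-free sketch. This is plain Python over lists of floats, used only to show the arithmetic; it is not code from this repository, and in actual training the same quantity would be computed with the framework's tensor operations so that gradients flow to the generator. The function names are hypothetical.

```python
import math


def cosine_distance(s_ref, s_out, eps=1e-8):
    """Return 1 - cosine similarity between two speaker embeddings."""
    dot = sum(a * b for a, b in zip(s_ref, s_out))
    norm_ref = math.sqrt(sum(a * a for a in s_ref))
    norm_out = math.sqrt(sum(b * b for b in s_out))
    # eps guards against division by zero for all-zero embeddings.
    return 1.0 - dot / max(norm_ref * norm_out, eps)


def speaker_consistency_loss(ref_batch, out_batch):
    """Average the cosine distance over a batch of embedding pairs."""
    distances = [cosine_distance(r, o) for r, o in zip(ref_batch, out_batch)]
    return sum(distances) / len(distances)
```

The distance is 0 when the two embeddings point in the same direction and grows toward 2 as they point in opposite directions, so minimizing it would pull the generated speech's embedding toward the reference.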

A problem about the kl loss

Hi, I tried to run the code, but the kl_loss is always infinity.

Acoustic feature transferring question

Hello, I'm new to this field and still a little confused, so I have a few questions about acoustic feature transfer.

As I understand it, this zero-shot TTS transfers the speaker's voice from the reference audio to the synthesized speech. Does it also transfer the speech style, such as pitch, intonation, speaking rate, rhythm, volume, or emotional expression?
If this zero-shot TTS does not transfer speech style, do you have any suggestions for style control or any other kind of controllability based on the VITS model?

I understand these questions are not that relevant to the repo issue, but any kind of help is appreciated, thank you!!

about the paper

Have you published the paper?
