Comments (10)
I added a multi-speaker example to the notebook.
Please enjoy!
https://colab.research.google.com/github/espnet/notebook/blob/master/espnet2_tts_realtime_demo.ipynb
from espnet_model_zoo.
Hi @ajbouh.
Since feature extraction is performed on-the-fly, you must provide a raw waveform for the speech argument (e.g. torch.randn(50000,)).
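As a minimal sketch of the reply above: the helper below passes a raw 1-D reference waveform through the speech keyword of a Text2Speech-style callable. The helper name synthesize and the model tag placeholder are hypothetical, and the commented usage assumes the Text2Speech class from espnet2.bin.tts_inference accepts speech as a keyword, as the linked notebook does; check the notebook for the exact loading code.

```python
def synthesize(text2speech, text, reference_wav):
    """Run a Text2Speech-like callable with a raw reference waveform.

    reference_wav is a 1-D sequence of samples (in practice a torch.Tensor
    such as torch.randn(50000,)). No precomputed features are passed,
    because the model extracts them from the waveform on the fly.
    """
    if len(reference_wav) == 0:
        raise ValueError("reference waveform is empty")
    return text2speech(text, speech=reference_wav)


# Assumed real usage (needs torch, espnet, and espnet_model_zoo; the model
# tag is a placeholder, not a real tag):
#
#   import torch
#   from espnet2.bin.tts_inference import Text2Speech
#   tts = Text2Speech.from_pretrained("<gst-model-tag>")
#   out = synthesize(tts, "Hello world.", torch.randn(50000))
```

Random noise only exercises the interface; for a meaningful speaker style, the reference would be a real recording loaded as a 1-D tensor.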
from espnet_model_zoo.
Thank you for the example! As I experiment with it, I see that the length of the supplied speech is very important. I also see that the output seems to be slower and the quality varies quite widely.
Here's a trimmed down example for two different reference speakers and some text (both reference and random): https://colab.research.google.com/drive/1FLcSi4fSqkxfwzfVRMwzPQgpGp-FOXpJ?usp=sharing
Do you have any advice for improving quality?
from espnet_model_zoo.
Hmm. I did not tune the GST hyperparameters much, especially the following parameters:
- gst_heads: the number of attention heads.
- gst_tokens: the number of GST tokens.
https://github.com/espnet/espnet/blob/5aeb9926334125d63121e7a11764aa8ac21dee67/egs2/vctk/tts1/conf/tuning/train_gst_tacotron2.yaml#L34-L35
These parameters need to be tuned so that the model stably reflects the speaker's characteristics.
If you can share your experimental results with us, that would be great :)
from espnet_model_zoo.
What sorts of experimental results do you mean?
Do you want the samples I generated from the current model checkpoint or are you suggesting something else?
from espnet_model_zoo.
The synthesis behavior is also important, but we still need to tune the hyperparameters above.
So if you train the model with different parameter values and report how effective they are, that would be great.
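One way to organize such a comparison is a small grid over the two parameters. The sketch below only enumerates the combinations; the candidate values are hypothetical (the thread does not suggest any), and each resulting pair would still have to be written into a copy of the training config linked earlier and trained separately.

```python
from itertools import product

# Hypothetical candidate values, chosen for illustration only.
GST_HEADS = [2, 4, 8]
GST_TOKENS = [5, 10, 20]


def gst_grid(heads=GST_HEADS, tokens=GST_TOKENS):
    """Return one override dict per (gst_heads, gst_tokens) combination."""
    return [{"gst_heads": h, "gst_tokens": t} for h, t in product(heads, tokens)]


# List every configuration that would need its own training run.
for conf in gst_grid():
    print(f"gst_heads={conf['gst_heads']} gst_tokens={conf['gst_tokens']}")
```

Keeping the reference audio and test sentences identical across runs makes the resulting samples directly comparable.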
from espnet_model_zoo.
Training GST Tacotron2 takes 2 days on a Titan V.
Of course, we support multi-gpu training.
from espnet_model_zoo.
I have a machine I can use for long term training with multiple 1080Ti cards. Hopefully that will be enough!
Which part of the documentation should I look at to see how to retrain GST Tacotron2 on the same dataset you've been working with?
from espnet_model_zoo.
Try the recipe at https://github.com/espnet/espnet/tree/master/egs2/vctk/tts1
For usage, see https://github.com/espnet/espnet/tree/master/egs2/TEMPLATE/tts1
from espnet_model_zoo.