Comments (5)
@jbecke Thank you for your kind words. Have you tried the batched inference mode with WaveRNN? That should be around realtime on a decent GPU. Note that audio quality suffers a little; play around with the target and overlap settings if you're interested in tinkering with it.
You can change the default setting for it in hparams.voc_gen_batched.
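For anyone wondering what the target and overlap settings control, here is a rough sketch of the general idea behind batched generation: split the sequence into overlapping chunks, process them in parallel, then crossfade the overlaps when joining. This is not the repo's actual code. The function names `split_with_overlap` and `crossfade_join` are hypothetical, and the linear crossfade is a simplification chosen for illustration.

```python
def split_with_overlap(samples, target, overlap):
    """Split into chunks of length target + 2*overlap (hypothetical helper).

    Consecutive chunks share `overlap` samples; the tail is zero-padded
    so that the last chunk is full length.
    """
    stride = target + overlap
    size = target + 2 * overlap
    n = len(samples)
    # Ceiling division: number of chunks needed to cover all samples.
    num = max(1, -(-(n - overlap) // stride))
    padded = list(samples) + [0.0] * (num * stride + overlap - n)
    return [padded[i * stride:i * stride + size] for i in range(num)]


def crossfade_join(chunks, target, overlap):
    """Join chunks, linearly crossfading each shared overlap region."""
    stride = target + overlap
    size = target + 2 * overlap
    out = [0.0] * (len(chunks) * stride + overlap)
    last = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        for j, value in enumerate(chunk):
            weight = 1.0
            if j < overlap and i > 0:
                # Fade in at the start of every chunk except the first.
                weight = (j + 1) / (overlap + 1)
            elif j >= size - overlap and i < last:
                # Fade out at the end of every chunk except the last.
                weight = (size - j) / (overlap + 1)
            out[i * stride + j] += weight * value
    return out


# With an identity "vocoder", joining the chunks recovers the input.
signal = [float(k) for k in range(100)]
chunks = split_with_overlap(signal, target=20, overlap=5)
rebuilt = crossfade_join(chunks, target=20, overlap=5)[:len(signal)]
```

In a real vocoder each chunk is generated independently, so the chunks disagree slightly in the overlap region. The crossfade hides those seams, which is presumably why quality drops a little compared with unbatched generation.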
from wavernn.
Thanks! Unfortunately, in my use case I do not have the whole spectrogram before vocoding, so chunking the spectrogram and vocoding the chunks in parallel will not work. In my application the target spectrogram is synthesized in real time, and I need to vocode it in real time as well.
I suppose I should implement and train the reference WaveRNN instead of this modified model? Or do you think FFTNet would be better?
Also, were you able to match the paper's results using the reference WaveRNN model?
@jbecke Well, it's up to you, I guess. All I'll say is that the spectrogram generation + WaveRNN vocoding in this repo is pretty damn fast, and combined they are often faster than realtime on my trusty old GTX 1080. I'm just wondering: why are you chunking the spectrogram in your application? I've found Tacotron models to be pretty fast at generating output.
I haven't had great results with the exact model described in the WaveRNN paper. Keep in mind, though, that the paper's description of that model is incomplete, since it doesn't describe the conditioning network in any detail. Other open source attempts at recreating it haven't convinced me either from a sound quality point of view: they work OK on LJSpeech but sound terrible on lower-register male voices.
I'm working on real-time voice conversion. So it's not that I'm chunking the spectrogram; it is inherently impossible to have the whole spectrogram up front (at time t I only have the spectrogram for [0, t]). Thanks for your advice! I'm going to try throwing a couple of V100s at the issue and see if I can replicate their results...
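To make the constraint concrete: a strictly causal recurrent model only ever needs frames [0, t], so in principle it can be fed chunk by chunk as long as its state is carried across calls. Below is a minimal sketch of that idea. `toy_step` and `StreamingVocoder` are hypothetical stand-ins, not anything from this repo, and a real vocoder's conditioning network may look ahead several frames, which would add latency.

```python
def toy_step(state, frame):
    """Hypothetical stand-in for one causal vocoder step (a leaky integrator)."""
    new_state = 0.9 * state + 0.1 * frame
    return new_state, new_state


class StreamingVocoder:
    """Keeps recurrent state between calls so input can arrive in pieces."""

    def __init__(self):
        self.state = 0.0

    def push(self, frames):
        # Process only the frames available so far; the state persists.
        outputs = []
        for frame in frames:
            self.state, sample = toy_step(self.state, frame)
            outputs.append(sample)
        return outputs


frames = [float(k % 7) for k in range(50)]

# Offline: the whole sequence is available at once.
offline = StreamingVocoder().push(frames)

# Streaming: frames arrive ten at a time.
vocoder = StreamingVocoder()
streamed = []
for start in range(0, len(frames), 10):
    streamed.extend(vocoder.push(frames[start:start + 10]))

# Both paths produce identical output because no step looks ahead.
same = streamed == offline
```

The batched mode discussed earlier does not fit this pattern, since it needs the full spectrogram before it can split it into parallel chunks.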
@jbecke Hi, may I ask what method or framework you use to implement real-time voice conversion?