russellsb / tt-vae-gan
Timbre transfer with variational autoencoding and cycle-consistent adversarial networks. Able to transfer the timbre of an audio source to that of another.
Hi Russell,
I have been studying for months how to control voice style transfer, but it is too difficult for me.
Maybe you can help me transfer two voices onto another to fulfil a long-held wish from my youth.
It is a ZIP file of 5 MB with the WAV files of the voices.
You don't have to do it for free; we will agree on a price.
My email is [email protected]
Kind regards,
Berend.
VOICES.zip
Hello! Your excellent work has helped me a lot as an introduction to timbre conversion, but there is one thing I don't understand well: when you calculate the KLD, you square each latent variable and then take the mean as the KLD loss. I haven't understood this part. Could you explain it? Thank you very much!
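For reference, a minimal sketch of the difference between the standard closed-form VAE KL divergence and the simplified mean-of-squared-latents penalty the question describes. The function and variable names here are illustrative, not the repo's actual code:

```python
import torch

def kld_full(mu, logvar):
    # Closed-form KL divergence between N(mu, sigma^2) and the unit Gaussian N(0, 1)
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

def kld_simplified(mu):
    # Simplified penalty: the mean of the squared latent values. It drops the
    # variance terms and only pulls the latent means toward zero, acting as a
    # cheap regularizer rather than the exact KL term.
    return torch.mean(mu.pow(2))
```

When the encoder's variance is treated as fixed (logvar = 0), the two differ only by constants, which may be why the simplified form works in practice.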
Hello,
Congratulations, and thank you for sharing this very interesting project. We are trying to run it and have completed the data preparation and preprocessing steps, but at the training step we run into an issue. We have 2 speakers and are otherwise using the default settings. The printout from the run is pasted below.
Namespace(b1=0.5, b2=0.999, batch_size=4, channels=1, checkpoint_interval=1, dataset='../data/data_urmp/', decay_epoch=50, dim=32, epoch=0, img_height=128, img_width=128, lr=0.0001, model_name='test', n_cpu=8, n_downsample=2, n_epochs=100, n_spkrs=2, plot_interval=1)
2 2
Cuda found.
/path_cropped/venv/pytorch/lib/python3.8/site-packages/torch/optim/adam.py:48: UserWarning: optimizer contains a parameter group with duplicate parameters; in future, this will cause an error; see github.com/pytorch/pytorch/issues/40967 for more information
super(Adam, self).__init__(params, defaults)
0%| | 0/2136 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 287, in <module>
train_global()
File "train.py", line 267, in train_global
losses = train_local(i, epoch, batch, pair[0], pair[1], losses)
File "train.py", line 151, in train_local
X1 = Variable(batch[id_1].type(Tensor))
KeyError: 2
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/path_cropped/venv/pytorch/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/path_cropped/venv/pytorch/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/usr/lib/python3.8/multiprocessing/reduction.py", line 189, in recv_handle
return recvfds(s, 1)[0]
File "/usr/lib/python3.8/multiprocessing/reduction.py", line 157, in recvfds
msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_SPACE(bytes_size))
ConnectionResetError: [Errno 104] Connection reset by peer
The machine we use has an RTX 3080 GPU, and a CPU with 14 cores / 28 threads.
We have also tried decreasing the number of CPUs to 1 or 2, for example; the error is similar, but shorter:
Namespace(b1=0.5, b2=0.999, batch_size=4, channels=1, checkpoint_interval=1, dataset='../data/data_urmp/', decay_epoch=50, dim=32, epoch=0, img_height=128, img_width=128, lr=0.0001, model_name='test', n_cpu=1, n_downsample=2, n_epochs=100, n_spkrs=2, plot_interval=1)
2 2
Cuda found.
/path_cropped/venv/pytorch/lib/python3.8/site-packages/torch/optim/adam.py:48: UserWarning: optimizer contains a parameter group with duplicate parameters; in future, this will cause an error; see github.com/pytorch/pytorch/issues/40967 for more information
super(Adam, self).__init__(params, defaults)
0%| | 0/2136 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 287, in <module>
train_global()
File "train.py", line 267, in train_global
losses = train_local(i, epoch, batch, pair[0], pair[1], losses)
File "train.py", line 151, in train_local
X1 = Variable(batch[id_1].type(Tensor))
KeyError: 2
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Just in case, we have also reduced the batch size to 1 and 2; it then stops at KeyError: 2 right away.
Can you, or someone else, please help us figure out what the problem is and how to resolve it?
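One way to start diagnosing the KeyError is to check that the numeric speaker ids present in the preprocessed dataset folder actually cover the range that train.py iterates over. The sketch below is a guess at a useful check, and the assumption that folder names end in a numeric speaker id is mine, not confirmed from the repo:

```python
import os
import re

def check_speaker_ids(dataset_dir, n_spkrs):
    """Report which of the ids 0..n_spkrs-1 have no matching folder.

    A KeyError such as `batch[id_1] -> KeyError: 2` typically means the batch
    dict lacks a speaker id that the training pair loop expects, for example
    because folders are 1-indexed while the code expects 0-indexed ids.
    """
    found = sorted(
        int(m.group(1))
        for name in os.listdir(dataset_dir)
        for m in [re.search(r"(\d+)$", name)]
        if m
    )
    missing = [i for i in range(n_spkrs) if i not in found]
    return found, missing
```

If `missing` is non-empty (e.g. folders named spkr_1 and spkr_2 with n_spkrs=2 would report id 0 as missing), renaming the folders or adjusting the id offset would be the first thing to try.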
General question, not an issue, apologies if this is the wrong place for such queries. I was wondering a couple things:
In general, are there more or less promising ways to get better results? Many of the voice conversions I've tried via this repo have had strange artifacts. Even in the core VAE-GAN demo, I'd say (subjectively) that the male=>female conversions sound a lot better than the female=>male ones, with the latter having lots of warbled speech. Maybe too broad a question, but based on your experience, how would you go about improving this? E.g. are there specific hyperparameters you'd change, is it due to the nature of the training data, etc.?
How good have you found MelGAN vs WaveNet? I'm wondering whether to dive more into training WaveNet or not given what appears to be MelGAN's speed benefits (both training and inference). And along the lines of MelGAN, I'm curious whether you've found any pretrained models (whether the implementation you link or the official one) that you think are good enough or whether you typically train MelGAN yourself.
Appreciate any thoughts here.
This is a great project, congrats!
I have been trying to get it to run and so far so good: I get to the "Infer with VAE-GAN and Griffin-Lim audio reconstruction" step, using the Flickr dataset applied to speakers 4 and 7.
My output is a rather low-quality spectrogram and an almost unintelligible Griffin-Lim reconstruction. Should I change some of the training parameters to increase resolution, am I doing something wrong, or is this expected and something WaveNet will fix?
I have a feeling I am doing something wrong. There is a clear correlation between the input and output plots, but the resolution of the output just seems too low. If you can point me in the right direction I would be extremely grateful.
Attached is the mel spectrogram after step 1.4. It seems quite wrong.
Hello, thank you for your great implementation work!
I have a quick question about converting mel-spectrograms to wav.
I see that you used the librosa library to convert from the frequency domain to the time domain.
Have you tried any other libraries, such as torchaudio, which support the GPU? Converting mel-spectrograms to wav takes a very long time...
Thank you!
It would be desirable to be able to load a saved checkpoint to resume training; help with this would be welcome.
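A resume feature could be sketched along these lines. The dict-of-models layout is an assumption for illustration; the repo's actual generators, discriminators, and optimizers would slot in:

```python
import torch

def save_checkpoint(path, epoch, models, optimizers):
    # `models` and `optimizers` map names to nn.Modules / torch optimizers
    torch.save({
        "epoch": epoch,
        "models": {k: m.state_dict() for k, m in models.items()},
        "optimizers": {k: o.state_dict() for k, o in optimizers.items()},
    }, path)

def load_checkpoint(path, models, optimizers):
    # Restore weights and optimizer state in place; return the epoch to resume from
    ckpt = torch.load(path, map_location="cpu")
    for k, m in models.items():
        m.load_state_dict(ckpt["models"][k])
    for k, o in optimizers.items():
        o.load_state_dict(ckpt["optimizers"][k])
    return ckpt["epoch"] + 1
```

Saving the optimizer state alongside the weights matters for Adam, since its moment estimates would otherwise restart from zero on resume.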
Hi,
Great work on this. I am trying to replicate it on my local machine, but I am having an issue when training the model. Could you please advise what might cause this error?
Traceback (most recent call last):
File "..\tt-vae-gan\voice_conversion\src\train.py", line 275, in
train_global()
File "..\tt-vae-gan\voice_conversion\src\train.py", line 255, in train_global
losses = train_local(i, epoch, batch, pair[0], pair[1], losses)
File "..\tt-vae-gan\voice_conversion\src\train.py", line 139, in train_local
X1 = Variable(batch[id_1].type(Tensor))
KeyError: 2
I have tried playing with n_epochs, but it seems to fail on the very first one, as shown below:
Namespace(epoch=0, n_epochs=2, model_name='test_1', dataset='../data/data_flickr', n_spkrs=4, batch_size=4, lr=0.0001, b1=0.5, b2=0.999, decay_epoch=1, n_cpu=6, img_height=128, img_width=128, channels=1, plot_interval=1, checkpoint_interval=2, n_downsample=2, dim=32)
..\ttvaegan\lib\site-packages\torch\optim\adam.py:48: UserWarning: optimizer contains a parameter group with duplicate parameters; in future, this will cause an error; see github.com/pytorch/pytorch/issues/40967 for more information
I am running this on an NVIDIA RTX 3000 with 6 GB of dedicated memory. Could it be a hardware limitation? It fails exactly when the GPU reaches around 6 GB.
Best,
Hi, two fairly minor questions.
spk="[name]_[id]" ./run.sh --stage 1 --stop-stage 1
Do I need to also pass hparams (as in step 2.3)? I seem to get an unbound-variable error otherwise, and I assume this is the reason.
FileNotFoundError: [Errno 2] No such file or directory: 'exp/flickr_1_train_no_dev_flickr/checkpoint_latest.pth'
Is this because wavenet_vocoder/egs/gaussian/run.sh passes the --checkpoint=${expdir}/checkpoint_latest.pth
argument to train.py even though, on a fresh model run, no latest checkpoint has been saved yet? If I remove that argument from that line, the training at least starts.
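One defensive fix (a sketch, not the script's actual code) is to build the flag only when the checkpoint file exists, so fresh runs start from scratch without editing the script:

```shell
# Print the --checkpoint flag only when a previous checkpoint exists;
# a fresh experiment directory then yields an empty string.
resume_opt() {
    if [ -f "$1/checkpoint_latest.pth" ]; then
        printf -- '--checkpoint=%s/checkpoint_latest.pth' "$1"
    fi
}

# usage inside run.sh (illustrative):
#   python train.py $(resume_opt "${expdir}") ...
```

Leaving `$(resume_opt ...)` unquoted is deliberate here: when the function prints nothing, the argument disappears entirely instead of being passed as an empty string.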
Hey, great work! Just wondering, is there any chance you could share the loss evolution for your pretrained models?
Hi @RussellSB
There was a Voice Conversion Challenge 2020 baseline: CycleVAE with a PWG vocoder.
Official homepage: http://www.vc-challenge.org/
Some code was provided by @bigpon; perhaps it could help with the training / getting-started material.
https://github.com/bigpon/vcc20_baseline_cyclevae/tree/master/baseline
https://github.com/bigpon/vcc20_baseline_cyclevae/blob/master/baseline/src/parallel_wavegan/models/melgan.py
The paper mentions Google's Parrotron.
It was implemented by @fd873630; his models are here (could this help?):
https://github.com/fd873630/Parrotron/tree/master/models
I wonder if a bit of detective work could piece things together to avoid the mode collapse.
Otherwise we need to wait for @ebadawy to release code.
Hi, I'm trying to reproduce your tutorial with the pretrained models, but there is a problem with the files output by WaveNet: after running infer.sh I get files that are only 1 second long. Please tell me what I am doing wrong and how I can get fully processed files.