I was playing with the preprocessing parameters and I was able to change a bit the sou


About Speaker Voice about universalvocoding HOT 4 CLOSED

bshall commented on May 27, 2024

About Speaker Voice

from universalvocoding.

Comments (4)

bshall commented on May 27, 2024

No problem!

Well, there are two options:

Voice cloning (as you mentioned) - where you synthesize speech from a specific voice from text.
Voice conversion - where you take audio from one speaker and directly convert it to a target speaker.

I think Real-Time-Voice-Cloning is the best available open-source project for voice cloning. For voice conversion, there are https://github.com/liusongxiang/StarGAN-Voice-Conversion and https://github.com/auspicious3000/autovc, for example.

Hope that helps!


bshall commented on May 27, 2024

Hi @shoegazerstella,

It's fun to mess with the inputs, but I think changing the speech characteristics in any systematic way is pretty difficult. I remember the issue in #3 was that changing num_fft resulted in a pitch shift. I think a more principled method would be vocal tract length perturbation (see "Vocal tract length perturbation (VTLP) improves speech recognition" for details). It's relatively easy to modify the mel filters in librosa, so that'd be a simple place to start.
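To make the VTLP idea a bit more concrete, here is a minimal sketch of the piecewise-linear frequency warp as I understand it from that paper, written in plain Python. The function names, the 4800 Hz boundary default, and the 0.9 to 1.1 range for the warp factor are assumptions for illustration, not code from this repo. In practice the warped values would be used as the center frequencies when building the mel filterbank.

```python
import random


def vtlp_warp(freq_hz, alpha, sample_rate, f_hi=4800.0):
    """Warp one frequency (Hz) with a piecewise-linear VTLP map.

    alpha is the warp factor; f_hi is an assumed boundary frequency
    and must lie below the Nyquist frequency.
    """
    nyquist = sample_rate / 2.0
    scale = min(alpha, 1.0)
    boundary = f_hi * scale / alpha
    if freq_hz <= boundary:
        # Below the boundary: simple linear scaling by alpha.
        return freq_hz * alpha
    # Above the boundary: a second linear piece that keeps the map
    # continuous and pins the Nyquist frequency to itself.
    slope = (nyquist - f_hi * scale) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq_hz)


def warp_centers(center_freqs_hz, sample_rate, alpha=None, rng=random):
    """Warp a list of filter center frequencies (hypothetical helper).

    If alpha is not given, draw one at random from an assumed range.
    """
    if alpha is None:
        alpha = rng.uniform(0.9, 1.1)
    return [vtlp_warp(f, alpha, sample_rate) for f in center_freqs_hz]


# Example: warp a few illustrative center frequencies at 16 kHz.
centers = [250.0, 1000.0, 4000.0, 7000.0]
warped = warp_centers(centers, sample_rate=16000, alpha=1.1)
```

With alpha above 1 the low frequencies are stretched upward, and with alpha below 1 they are compressed, while 0 Hz and the Nyquist frequency stay fixed, so the warped filterbank still covers the full band.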

Otherwise, if you're interested in changing the speaker entirely I've done some work on voice conversion here. There are also a bunch of papers/repos that convert the spectrogram directly and then synthesize with a vocoder (happy to suggest some if you're interested).


shoegazerstella commented on May 27, 2024

if you're interested in changing the speaker entirely I've done some work on voice conversion here. There are also a bunch of papers/repos that convert the spectrogram directly and then synthesize with a vocoder (happy to suggest some if you're interested).

Exactly, my aim is to change the speaker entirely.

I was reading more on voice cloning and I did find these two works:

But if I understand correctly, your approach to voice conversion is a little different. I'll look into it more!
It would be awesome if you could suggest other approaches too!
Thanks a lot!


shoegazerstella commented on May 27, 2024

So yes, there are indeed two approaches.
For the TTS part I was using an implementation of FastSpeech2, and to be honest I didn't want to change that because it's super fast on CPU.
So I might try both approaches and decide based on both the quality of the results and the speed.
Again thanks a lot! :)

