
About Speaker Voice (universalvocoding, closed)

bshall avatar bshall commented on May 27, 2024
About Speaker Voice

from universalvocoding.

Comments (4)

bshall avatar bshall commented on May 27, 2024 2

No problem!

Well, there are two options:

  1. Voice cloning (as you mentioned) - where you synthesize speech in a specific voice from text.
  2. Voice conversion - where you take audio from one speaker and directly convert it to a target speaker.

I think Real-Time-Voice-Cloning is the best available open-source project for voice cloning. For voice conversion, there are https://github.com/liusongxiang/StarGAN-Voice-Conversion and https://github.com/auspicious3000/autovc, for example.

Hope that helps!


bshall avatar bshall commented on May 27, 2024

Hi @shoegazerstella,

It's fun to mess with the inputs but I think changing the speech characteristics in any systematic way is pretty difficult. I remember the issue in #3 was that changing num_fft resulted in a pitch shift. I think a more principled method would be vocal tract length perturbation (see "Vocal tract length perturbation (VTLP) improves speech recognition" for details). It's relatively easy to mess with the mel filters in librosa so that'd be a simple place to start.
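As a rough illustration of the VTLP idea, here is a minimal sketch of the piecewise-linear frequency warp as I understand it from that paper, applied to mel filter centre frequencies. Everything in it is an assumption for illustration: the function names, the 16 kHz sample rate, the 4800 Hz boundary frequency, the 80 mel bands, and the HTK-style mel formula (which is not librosa's default scale). It is not code from this repo.

```python
import math


def vtlp_warp(freq_hz, alpha, sample_rate=16000, f_hi=4800.0):
    """Piecewise-linear VTLP warp of a single frequency in Hz.

    Below a boundary the frequency is scaled by alpha; above it a second
    linear segment covers the rest of the band so that Nyquist stays fixed.
    Assumes f_hi is below the Nyquist frequency.
    """
    nyquist = sample_rate / 2.0
    boundary = f_hi * min(alpha, 1.0) / alpha
    if freq_hz <= boundary:
        return freq_hz * alpha
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq_hz)


def hz_to_mel(f):
    # HTK-style mel formula, used here only for brevity.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_freqs(n_mels, sample_rate=16000, fmin=0.0):
    """Centre frequencies (Hz) of n_mels filters evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(sample_rate / 2.0)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


# alpha > 1 shifts the low band upwards, alpha < 1 shifts it downwards.
centers = mel_center_freqs(n_mels=80)
warped = [vtlp_warp(f, alpha=1.1) for f in centers]
```

The warped centre frequencies could then be used to place the triangular filters, instead of taking the filterbank from librosa.filters.mel unchanged. The paper samples alpha from a small range around 1 for each utterance, so values like 0.9 to 1.1 are a reasonable place to start experimenting.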

Otherwise, if you're interested in changing the speaker entirely I've done some work on voice conversion here. There are also a bunch of papers/repos that convert the spectrogram directly and then synthesize with a vocoder (happy to suggest some if you're interested).


shoegazerstella avatar shoegazerstella commented on May 27, 2024

> if you're interested in changing the speaker entirely I've done some work on voice conversion here. There are also a bunch of papers/repos that convert the spectrogram directly and then synthesize with a vocoder (happy to suggest some if you're interested).

Exactly, my aim is to change the speaker entirely.

I was reading more on voice cloning and I did find these two works:

But if I understand correctly, your approach to voice conversion is a little bit different. I'll look more into it!
It would be awesome if you could suggest other approaches too!
Thanks a lot!


shoegazerstella avatar shoegazerstella commented on May 27, 2024

So yes, there are indeed two approaches.
For the TTS part I was using an implementation of FastSpeech2, and to be honest I didn't want to change that because it's super fast on CPU.
So I might try both approaches and decide based on both the quality of the results and the speed.
Again thanks a lot! :)

