
speaker_embedding_ge2e_loss's Introduction

This is a complete implementation of 'Generalized End-to-End loss for speaker verification (GE2E Loss)'


Paper used (implemented) in this repo

  • Generalized End-to-End Loss for Speaker Verification
  • Please note:
    • The data used here is very basic.
    • It consists of 10 speakers from librispeech_test-other plus 4 additional speakers, 14 speakers in total.
  • Please prepare your own data to train the model for your use case.
    • Since the model is trained on the voices of 14 people, it can only identify those 14 people.
    • If you wish, you can collect voice samples from 'N' different people and retrain the model so that it can identify those 'N' voices.

Details and usage

  • For a given utterance (audio) of a speaker, this model produces a vector of length 256 (1x256).
  • Therefore, the output for a batch of 32 utterances might look like this:

    tensor([[-0.0205,  0.0271,  0.0419,  ..., -0.1035,  0.0387,  0.0905],
            [-0.0078,  0.0203,  0.0275,  ..., -0.0943,  0.0301,  0.0814],
            ...,
            [ 0.0220,  0.0717, -0.0553,  ...,  0.1109, -0.1149,  0.0084],
            [ 0.0164,  0.0770, -0.0502,  ...,  0.1053, -0.1108,  0.0152],
            [ 0.0287,  0.0664, -0.0607,  ...,  0.1179, -0.1123,  0.0064]],
           grad_fn=<DivBackward0>)

    embeddings.shape -> torch.Size([32, 256])
    
    
  • The output produced by this model can be used for:
    • Voice detection
    • Identifying different speakers
    • Voice cloning
    • High-fidelity voice generation
    • etc.
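
As an illustration of the "identifying different speakers" use, embeddings can be compared with cosine similarity. The following is a minimal sketch, not code from this repo: it uses NumPy with random stand-in vectors in place of real model output, and the function name `cosine_similarity_matrix` and the 0.75 threshold are assumptions chosen for the example.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return pairwise cosine similarity between the rows of an (N, D) array."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-8, None)  # guard against zero vectors
    return unit @ unit.T


# Stand-in for a batch of model outputs: 32 random vectors of length 256.
# Real embeddings would come from the trained model instead.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((32, 256))

sim = cosine_similarity_matrix(embeddings)  # shape (32, 32)

# Hypothetical decision rule: treat two utterances as the same speaker
# when their similarity exceeds a threshold tuned on held-out data.
THRESHOLD = 0.75  # assumed value, not taken from the repo
same_speaker = bool(sim[0, 1] > THRESHOLD)
```

In practice the threshold would be chosen on a validation set, since a suitable value depends on the data and on how the model was trained.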

Dependencies

  • Python 3
  • Numpy
  • PyTorch
  • librosa

Pre-trained model

Embedding Model (GE2E)
Pre-trained embedding model

Steps to train and use the embedding model

  • Gather audio data -> different utterances from different people.

  • Create spectrograms from those audio files.

  • Train the Speaker Embedding model.

  • Use the code from step1_train_embedding_model.py

    • Results Step 1 (Embedding model using GE2E loss)
      • With 4 Speakers
      • With 6 Speakers
      • With 10 Speakers

    Speaker classification for 4 speakers
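
For orientation, the softmax variant of the GE2E loss described in the paper can be sketched as below. This is a forward-only NumPy illustration under stated assumptions, not the training code in step1_train_embedding_model.py: the function names are hypothetical, `w` and `b` are fixed at the paper's suggested initial values (10, -5) instead of being learned, and the inputs are synthetic.

```python
import numpy as np


def _unit(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """L2-normalize along the last axis."""
    return x / np.clip(np.linalg.norm(x, axis=-1, keepdims=True), eps, None)


def ge2e_softmax_loss(emb: np.ndarray, w: float = 10.0, b: float = -5.0) -> float:
    """GE2E softmax loss for emb of shape (N speakers, M utterances, D), M >= 2."""
    n, m, _ = emb.shape
    centroids = emb.mean(axis=1)  # (N, D): one centroid per speaker
    # Exclusive centroids: each utterance is left out of its own speaker's mean.
    exclusive = (emb.sum(axis=1, keepdims=True) - emb) / (m - 1)  # (N, M, D)

    e = _unit(emb)
    # Cosine similarity of every utterance to every speaker centroid: (N, M, N).
    sim = np.einsum("jid,kd->jik", e, _unit(centroids))
    # For an utterance's own speaker, use the exclusive centroid instead.
    idx = np.arange(n)
    sim[idx, :, idx] = np.sum(e * _unit(exclusive), axis=-1)

    sim = w * sim + b  # scaled similarity matrix S
    # Per-utterance loss: -S[j, i, j] + log(sum_k exp(S[j, i, k])),
    # computed with a numerically stable log-sum-exp.
    mx = sim.max(axis=-1, keepdims=True)
    lse = mx.squeeze(-1) + np.log(np.exp(sim - mx).sum(axis=-1))  # (N, M)
    positive = sim[idx, :, idx]  # (N, M)
    return float((lse - positive).sum())


# Synthetic check: 4 "speakers", 5 "utterances" each, 16-dimensional vectors.
rng = np.random.default_rng(0)
directions = np.eye(4, 16)  # one distinct direction per speaker
clustered = directions[:, None, :] + 0.01 * rng.standard_normal((4, 5, 16))
random_emb = rng.standard_normal((4, 5, 16))

clustered_loss = ge2e_softmax_loss(clustered)
random_loss = ge2e_softmax_loss(random_emb)
```

Embeddings that cluster tightly by speaker should give a much lower loss than unstructured ones, which is the behaviour training pushes the model toward. A trainable version would express the same computation with PyTorch tensors so that gradients flow to the model, `w`, and `b`.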

Inspired by the following GitHub repo (BIG THANKS)

License

speaker_embedding_ge2e_loss's People

Contributors

gkv856


speaker_embedding_ge2e_loss's Issues

Error on running train_embedding_model.py

  File "train_embedding_model.py", line 3, in <module>
    from embedding_model_GE2E.s0_audio_to_spectrogram import CreateSpectrogram
  File "D:\GITHUB PROJECTS\speaker_embedding_GE2E_loss\embedding_model_GE2E\s0_audio_to_spectrogram.py", line 9, in <module>
    from AVC.utils.audio_utils import AudioUtils
ModuleNotFoundError: No module named 'AVC'

The AVC module is missing from the repository.
