The speaker_embedding_ge2e_loss from gkv856

speaker_embedding_ge2e_loss's Introduction

This is a complete implementation of 'Generalized End-to-End loss for speaker verification (GE2E Loss)'
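
As a rough illustration of what the loss computes, here is a minimal sketch of the softmax variant of GE2E as described in the paper. It is not the code from this repo: the class name `GE2ESoftmaxLoss` and the `(speakers, utterances, dims)` input layout are assumptions made for the example, and the initial values `w = 10`, `b = -5` follow the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GE2ESoftmaxLoss(nn.Module):
    """Sketch of the GE2E softmax loss (hypothetical class, not the repo's code)."""

    def __init__(self, init_w=10.0, init_b=-5.0):
        super().__init__()
        # Learnable scale and bias applied to the cosine similarities.
        self.w = nn.Parameter(torch.tensor(init_w))
        self.b = nn.Parameter(torch.tensor(init_b))

    def forward(self, emb):
        # emb: (N speakers, M utterances per speaker, D dims), with M >= 2.
        n, m, _ = emb.shape
        emb = F.normalize(emb, dim=-1)

        # Centroid of every speaker, used for "other speaker" similarities.
        centroids = F.normalize(emb.mean(dim=1), dim=-1)           # (N, D)

        # Centroid of the utterance's own speaker with that utterance left out.
        excl = (emb.sum(dim=1, keepdim=True) - emb) / (m - 1)
        excl = F.normalize(excl, dim=-1)                           # (N, M, D)

        sim = emb @ centroids.t()                                  # (N, M, N)
        own = (emb * excl).sum(dim=-1, keepdim=True)               # (N, M, 1)

        # Use the leave-one-out similarity where the centroid is the utterance's own speaker.
        same = torch.eye(n, dtype=torch.bool, device=emb.device).unsqueeze(1)
        sim = torch.where(same, own, sim)

        # Keep the scale positive, as the paper requires.
        logits = torch.clamp(self.w, min=1e-6) * sim + self.b

        # Each utterance should be closest to its own speaker's centroid.
        targets = torch.arange(n, device=emb.device).repeat_interleave(m)
        return F.cross_entropy(logits.reshape(n * m, n), targets)
```

Note that `F.cross_entropy` averages over utterances, whereas the paper sums the per-utterance terms; the two differ only by a constant factor for a fixed batch shape.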

Paper implemented in this repo

Generalized End-to-End Loss for Speaker Verification

Please note:
- Data used here is very elementary.
- 10 speakers from librispeech_test-other and 4 additional speakers. Total 14 speakers.
  Please prepare your own data to train the model for your use case.
- Since the model is trained on 14 people's voices, it can only identify those 14 people.
- If you wish, you could collect sample voices for 'N' different people and retrain the model to be able to identify those 'N' voices.

Details and usage

For a given utterance (audio) of a speaker, this model produces a vector of length 256 (1x256).
A batch of inference output therefore looks something like this:

tensor([[-0.0205,  0.0271,  0.0419,  ..., -0.1035,  0.0387,  0.0905],
        [-0.0078,  0.0203,  0.0275,  ..., -0.0943,  0.0301,  0.0814],
        ...,
        [ 0.0220,  0.0717, -0.0553,  ...,  0.1109, -0.1149,  0.0084],
        [ 0.0164,  0.0770, -0.0502,  ...,  0.1053, -0.1108,  0.0152],
        [ 0.0287,  0.0664, -0.0607,  ...,  0.1179, -0.1123,  0.0064]],
       grad_fn=<DivBackward0>)

embeddings.shape -> torch.Size([32, 256])

The embeddings produced by this model can be used for:
- Voice detection
- Speaker identification
- Voice cloning
- High-fidelity voice generation
- etc.
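
For the speaker-identification use case, one common approach is to compare a new embedding against per-speaker centroids using cosine similarity. The sketch below assumes that approach; `identify_speaker` and the enrolled-speaker dictionary are hypothetical names for illustration, and the 4-dimensional vectors are made-up toy data standing in for the model's 256-dimensional output.

```python
import torch
import torch.nn.functional as F


def identify_speaker(query, enrolled):
    """Return (name, score) of the enrolled speaker closest to `query`.

    query:    tensor of shape (D,), one utterance embedding
    enrolled: dict mapping speaker name -> tensor of shape (K, D)
    """
    names = list(enrolled)
    # One centroid per speaker, averaged over L2-normalized utterance embeddings.
    centroids = torch.stack(
        [F.normalize(enrolled[name], dim=-1).mean(dim=0) for name in names]
    )
    scores = F.cosine_similarity(query.unsqueeze(0), centroids, dim=-1)
    best = int(scores.argmax())
    return names[best], float(scores[best])


# Toy data only: real embeddings would come from the trained model.
enrolled = {
    "speaker_a": torch.tensor([[1.0, 0.1, 0.0, 0.0], [0.9, 0.0, 0.1, 0.0]]),
    "speaker_b": torch.tensor([[0.0, 1.0, 0.1, 0.0], [0.1, 0.9, 0.0, 0.0]]),
}
name, score = identify_speaker(torch.tensor([0.8, 0.1, 0.0, 0.1]), enrolled)
print(name, round(score, 3))
```

In practice a threshold on the score would be needed to reject voices that belong to none of the enrolled speakers.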

Dependencies

Python 3
Numpy
PyTorch
librosa

Pre-trained model

Embedding Model (GE2E)
Pre-trained embedding model

Steps to train and use the embedding model

Gather audio data -> different utterances from different people.
Create spectrogram of those audios.
Train the Speaker Embedding model.
Use the code from step1_train_embedding_model.py
- Results Step 1 (Embedding model using GE2E loss)
  - With 4 Speakers
  - With 6 Speakers
  - With 10 Speakers

Inspired by the following GitHub repo (BIG THANKS)

License

speaker_embedding_ge2e_loss's Issues

Error on running train_embedding_model.py

File "train_embedding_model.py", line 3, in <module>
from embedding_model_GE2E.s0_audio_to_spectrogram import CreateSpectrogram
File "D:\GITHUB PROJECTS\speaker_embedding_GE2E_loss\embedding_model_GE2E\s0_audio_to_spectrogram.py", line 9, in <module>
from AVC.utils.audio_utils import AudioUtils
ModuleNotFoundError: No module named 'AVC'

The AVC module is missing from the repository.
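
A quick way to confirm which top-level packages Python can actually find, before running the training script, is to query the import system directly. This is only a diagnostic sketch using the standard library; `is_importable` is a hypothetical helper, and it does not supply the missing module.

```python
import importlib.util


def is_importable(top_level_name):
    """Return True if a top-level module or package is found on sys.path."""
    return importlib.util.find_spec(top_level_name) is not None


# Package names taken from the traceback above.
for pkg in ("embedding_model_GE2E", "AVC"):
    status = "importable" if is_importable(pkg) else "NOT found on sys.path"
    print(f"{pkg}: {status}")
```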