Speaker embeddings for Text-independent speaker verification using TensorFlow, with Kaldi

This is a slightly modified TensorFlow implementation of the model presented by David Snyder in Deep Neural Network Embeddings for Text-Independent Speaker Verification.

In the paper, this algorithm performs slightly worse than i-vector. My tests show similar results. Also, in my tests, a shallow network was very slightly worse than a deep network (this depends on the database).

This code contains many hard-coded values, such as folder locations and some database-related parameters. If I had a well-known speaker recognition (SR) database, I would test on it, but I only have a private database.

I hope this code helps researchers.

Credits

Original paper (Snyder et al.):

@unknown{unknown,
author = {Snyder, David and Garcia-Romero, Daniel and Povey, Daniel and Khudanpur, Sanjeev},
title = {Deep Neural Network Embeddings for Text-Independent Speaker Verification},
year = {2017}
}

This code also uses parts of:

mangate's git repository
- TensorFlow classification baseline code
Karel Vesely's git repository
- Kaldi I/O for Python

Features

Supports Kaldi-style input and output (input: MFCC scp-ark pair; output: embedding scp-ark pair).
- This code can replace the i-vector training and extraction part of Kaldi egs/SRE10/v1.
Instead of concatenating the VAD-selected frames, I use the original frames, including non-speech frames.
- For training, many frames are used. For testing, the max-power frames are used. Details are in the load_dataset function in process_data_kaldi.py.
- This choice is a matter of preference.
Added input-layer mean normalization instead of the exptional block.
Added dropout and batch normalization to some layers.
Added L2 loss on the last layer.
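
To illustrate the mean normalization item above, here is a minimal NumPy sketch. It assumes the normalization is a per-utterance mean subtraction over the time axis; the README does not say exactly how the repository implements it, and the function name mean_normalize is hypothetical, so treat this as an illustration rather than the actual code in this repository.

```python
import numpy as np


def mean_normalize(feats):
    """Subtract the per-utterance mean from each feature dimension.

    feats: array of shape (num_frames, feat_dim), e.g. the MFCCs of one
    utterance. ASSUMPTION: per-utterance normalization over time; the
    repository may normalize differently.
    """
    feats = np.asarray(feats, dtype=np.float64)
    # Average over the time axis, keeping one mean per feature dimension.
    mean = feats.mean(axis=0, keepdims=True)
    return feats - mean


if __name__ == "__main__":
    # Hypothetical utterance: 4 frames of 3-dimensional features.
    utt = np.array([[1.0, 2.0, 3.0],
                    [3.0, 2.0, 1.0],
                    [5.0, 2.0, 2.0],
                    [7.0, 2.0, 2.0]])
    normed = mean_normalize(utt)
    # After normalization, every feature dimension averages to about zero.
    print(normed.mean(axis=0))
```

Subtracting the mean before the first layer removes a constant offset per feature dimension, which is one common way to reduce channel effects in MFCC input.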

Requirements

Python (2.7)
NumPy
TensorFlow (tested only with version 1.3)
Database

Usage

Preparation:

Clone the repository recursively to get all folders and subfolders.
Prepare the database (I use a private DB, so modify the scripts as needed for your own data).
Use the Kaldi recipe in SRE10/v1/run.sh to extract MFCC and VAD.
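
The Kaldi feature extraction step produces scp-ark pairs, where each scp line maps an utterance ID to a location inside an ark file. As a rough sketch of that index format (not the reader used in this repository, which relies on Karel Vesely's Kaldi I/O code), the following splits one scp line into its parts. The function name parse_scp_line and the example paths are hypothetical, and only the simple "utt-id path:offset" form is handled.

```python
def parse_scp_line(line):
    """Split one Kaldi scp line into (utt_id, ark_path, offset).

    Handles the common 'utt-id /path/to/file.ark:offset' form. When the
    entry has no numeric ':offset' suffix, the whole entry is returned as
    the path and offset is None.
    """
    utt_id, rxfile = line.strip().split(None, 1)
    path, sep, offset = rxfile.rpartition(":")
    if sep and offset.isdigit():
        return utt_id, path, int(offset)
    return utt_id, rxfile, None


if __name__ == "__main__":
    # Hypothetical scp content; real files come from the Kaldi recipe.
    lines = [
        "spk1_utt1 /data/mfcc/raw_mfcc.1.ark:10",
        "spk1_utt2 /data/mfcc/raw_mfcc.1.ark:5230",
    ]
    for line in lines:
        print(parse_scp_line(line))
```

The offset is the byte position in the ark file where that utterance's matrix starts, so a reader can seek directly to one utterance without scanning the whole archive.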

Running:

Run the Training_kaldi function in make_dvec.py.
Afterwards, run the embedding_kaldi function. (Some functions contain hard-coded values. Change the file locations to match yours.)
Use the Kaldi recipe to calculate the mean vector and run PLDA scoring.
You may only need to run what comes after /local/extract_ivectors.sh --stage 2 for each folder.

Authors

[email protected] (or [email protected])
