Target Speaker Extraction and Verification for Multi-talker Speech

The code here performs target speaker extraction, where only the target speaker's voice is extracted given that speaker's characteristics. In paper 2), we use a small network to jointly learn the target speaker's characteristics from a different utterance of the same speaker. You can also replace this network with an i-vector or x-vector network.

If you are interested in speech separation, which recovers all speakers' voices from the mixture, please see https://github.com/xuchenglin28/speech_separation

Papers

Please cite:

  1. Chenglin Xu, Wei Rao, Xiong Xiao, Eng Siong Chng and Haizhou Li, "Single Channel Speech Separation with Constrained Utterance Level Permutation Invariant Training Using Grid LSTM", in Proc. of ICASSP 2018, pp. 6-10.
  2. Chenglin Xu, Wei Rao, Eng Siong Chng, and Haizhou Li, "Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss", in Proc. of ICASSP 2019, pp. 6990-6994.
  3. Wei Rao, Chenglin Xu, Eng Siong Chng, and Haizhou Li, "Target Speaker Extraction for Multi-Talker Speaker Verification", in Proc. of Interspeech 2019, pp. 1273-1277.

Data Generation

If you are using wsj0 to simulate data as in papers 2) and 3), please read the code in run_data_generation.sh for details and change the paths accordingly.
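
For example, a hypothetical invocation (the wsj0 path is configured inside the script, as noted above):

    # Edit the wsj0 corpus path inside run_data_generation.sh to point at your
    # local copy, then run the simulation to generate the tr, cv and tt sets.
    bash run_data_generation.sh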

The lists of files and SNRs for the training, development, and test sets are in simulation/mix_2_spk_{tr,cv,tt}_extr.txt. In these files, the first column is the utterance of the target speaker, which is used to generate the mixture and also serves as the clean target that supervises network learning. The second column is the interfering speaker's utterance used to generate the mixture. The third column is another utterance of the target speaker, used to obtain the speaker's characteristics.
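
To check that the lists are in place, a quick sketch that previews the first few entries of each list (run from the repository root):

    # Print the first entries of the training, development and test lists.
    # Each line lists the target utterance, the interfering utterance, and the
    # auxiliary utterance of the target speaker (plus SNRs), as described above.
    for set in tr cv tt; do
      echo "== ${set} =="
      head -n 3 simulation/mix_2_spk_${set}_extr.txt
    done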

After running the .sh script, there will be three folders {mix, aux, s1} for each of the three sets {tr, cv, tt}. The mix folder contains the mixture speech, the aux folder contains the utterances used to obtain the speaker's characteristics, and the s1 folder contains the clean target speech. In all three folders, the file names are consistent for each example.
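
Once the simulation has finished, a minimal sketch for sanity-checking the output layout; the root path below is just the example path used later in the Speaker Extraction part, so adjust it to your own output location (run with bash):

    # Assumed output root; change to wherever run_data_generation.sh wrote the data.
    data_root=data/wsj0_2mix_extr/wav8k/max
    for set in tr cv tt; do
      for d in mix aux s1; do
        echo "${set}/${d}: $(ls ${data_root}/${set}/${d} | wc -l) files"
      done
      # File names should be consistent across mix, aux and s1 for each example.
      diff <(ls ${data_root}/${set}/mix) <(ls ${data_root}/${set}/s1) >/dev/null \
        && diff <(ls ${data_root}/${set}/mix) <(ls ${data_root}/${set}/aux) >/dev/null \
        && echo "${set}: file names are consistent" \
        || echo "${set}: WARNING - file names differ across folders"
    done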

Speaker Extraction

This part includes feature extraction, model training, and run-time inference. Please read the run.sh code for details and revise it accordingly.

noisy_dir: the folder containing your simulated data, for example "data/wsj0_2mix_extr/wav8k/max". Under this path there should be three folders for the training, development, and test sets (tr, cv, tt). Each set contains three folders named (mix, aux, s1), as described in the Data Generation part.

(The folder names for the training and development sets are hard-coded. If you want to use different folder names, please change the parameters of the read_list() function in train.py.)

After setting the path in noisy_dir, you can simply run the code to extract features, train the model, and perform run-time inference:

run.sh

If you want to reproduce the results in the published papers, you need to set dur=0 in run.sh, because variable-length utterances are used without fixing their duration.
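
Putting this together, a hypothetical launch of the pipeline (the variable names follow the description above; the exact contents of run.sh may differ):

    # Inside run.sh, set:
    #   noisy_dir=data/wsj0_2mix_extr/wav8k/max   # root of the simulated data
    #   dur=0                                     # variable-length utterances, as in the papers
    # then run the whole pipeline: feature extraction, training and inference.
    bash run.sh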

Speaker Verification

Here we only provide the key files for paper 3) on speaker verification. Please read the paper for details.

verification/keys: key files of the simulated trials for the multi-talker speaker verification system.

Environment

Python: 2.7

TensorFlow: 1.12 (some APIs are from older versions, but they are compatible with 1.12)
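
A minimal sketch of a matching environment, assuming a conda-based setup (the environment name and the choice of GPU package are assumptions):

    # Python 2.7 + TensorFlow 1.12, as listed above.
    conda create -n speaker_extraction python=2.7
    conda activate speaker_extraction
    pip install tensorflow-gpu==1.12.0   # or tensorflow==1.12.0 for CPU-only use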

Contact

e-mail: [email protected]

License

The code and models in this repository are licensed under the GNU General Public License Version 3.

Citation

If you would like to cite this work, please use:

@inproceedings{xu2018single,
  title={Single channel speech separation with constrained utterance level permutation invariant training using grid lstm},
  author={Xu, Chenglin and Rao, Wei and Xiao, Xiong and Chng, Eng Siong and Li, Haizhou},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6--10},
  year={2018}
}
@inproceedings{xu2019optimization,
  title={Optimization of speaker extraction neural network with magnitude and temporal spectrum approximation loss},
  author={Xu, Chenglin and Rao, Wei and Chng, Eng Siong and Li, Haizhou},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6990--6994},
  year={2019}
}
@inproceedings{rao2019target,
  title={Target speaker extraction for multi-talker speaker verification},
  author={Rao, Wei and Xu, Chenglin and Chng, Eng Siong and Li, Haizhou},
  booktitle={Proc. of INTERSPEECH},
  pages={1273--1277},
  year={2019}
}

speaker_extraction's Issues

help

I can't find the folder and the config.py

ask for help

Recently, I have read your paper about MTSAL, and I am trying to re-implement your algorithm. But I cannot understand your cost function exactly. Could you give me your email address so that I can describe my confusion in detail?

The performance

Hello, sir!
I wonder how many steps the model needs to be trained for to get good separation results. And why can't I run the ‘run.sh’ script on a GPU?

Checkout Failed

I'm getting this error when cloning the git repo.

git clone https://github.com/xuchenglin28/speaker_extraction.git
Cloning into 'speaker_extraction'...
remote: Enumerating objects: 111, done.
remote: Total 111 (delta 0), reused 0 (delta 0), pack-reused 111
Receiving objects: 100% (111/111), 2.38 MiB | 361.00 KiB/s, done.
Resolving deltas: 100% (31/31), done.
fatal: cannot create directory at 'data/wsj0_2mix_extr/wav8k/max/cv/aux': Invalid argument
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
