runngezhang / pyannote-db-voxceleb

This project forked from pyannote/pyannote-db-voxceleb

VoxCeleb plugin for pyannote.database

Home Page: http://github.com/pyannote/pyannote-database

License: Other

Python 91.59% Jupyter Notebook 8.41%

VoxCeleb plugin for pyannote.database

This package provides an implementation of the speaker verification and speaker identification protocols used in the VoxCeleb paper.

Actual VGGVox models can be obtained from the authors of the original paper.

Citation

Please cite the following reference if your research relies on the VoxCeleb dataset:

@InProceedings{VoxCeleb,
  author = {Nagrani, A. and Chung, J.~S. and Zisserman, A.},
  title = {{VoxCeleb: a large-scale speaker identification dataset}},
  booktitle = {{Interspeech 2017, 18th Annual Conference of the International Speech Communication Association}},
  year = {2017},
  month = {August},
  address = {Stockholm, Sweden},
  url = {http://www.robots.ox.ac.uk/~vgg/data/voxceleb/},
}

Please cite the following reference if your research relies on this package. It is the paper in which the pyannote.database framework was first introduced:

@inproceedings{pyannote.metrics,
  author = {Herv\'e Bredin},
  title = {{pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems}},
  booktitle = {{Interspeech 2017, 18th Annual Conference of the International Speech Communication Association}},
  year = {2017},
  month = {August},
  address = {Stockholm, Sweden},
  url = {http://pyannote.github.io/pyannote-metrics},
}

Installation

$ pip install pyannote.db.voxceleb

Usage

Speaker verification protocol

>>> from pyannote.database import get_protocol
>>> protocol = get_protocol('VoxCeleb.SpeakerVerification.VoxCeleb1')

First, use the protocol.train generator to train the background model:

>>> for training_file in protocol.train():
...
...    uri = training_file['uri']
...    print('Current filename is {0}.'.format(uri))
...
...    # "who speaks when" as a pyannote.core.Annotation instance
...    annotation = training_file['annotation']
...    for segment, _, speaker in annotation.itertracks(yield_label=True):
...        print('{0} speaks between t={1:.1f}s and t={2:.1f}s.'.format(
...            speaker, segment.start, segment.end))
...    
...    break  # this should obviously be replaced
...           # by the actual background training
Current filename is A.J._Buckley/1zcIwhmdeo4_0000001.
A.J._Buckley speaks between t=0.0s and t=8.1s.

Then, use the protocol.test_enrolment generator to enrol speakers:

>>> models = {}  # dictionary meant to store all enrolments
>>> for enrolment in protocol.test_enrolment():
...
...    # unique model identifier
...    model_id = enrolment['model_id']
...
...    uri = enrolment['uri']
...    print('Current filename is {0}.'.format(uri))
...
...    # enrolment segment as a pyannote.core.Timeline instance
...    timeline = enrolment['enrol_with']
...    for segment in timeline:
...        print('Use speech between t={0:.1f}s and t={1:.1f}s for enrolment.'.format(segment.start, segment.end))
...    
...    # enrol_func should return the actual model
...    models[model_id] = enrol_func(uri, timeline)
...   
...    break  # one should obviously iterate over all enrolments
Current filename is Eartha_Kitt/x6uYqmx31kE_0000001.
Use speech between t=0.0s and t=5.7s for enrolment.

Finally, the protocol.test_trial generator provides the list of trials:

>>> for trial in protocol.test_trial():
...
...    uri = trial['uri']
...    print('Current filename is {0}.'.format(uri))
...
...    # trial segment as a pyannote.core.Timeline instance
...    timeline = trial['try_with']
...    for segment in timeline:
...        print('Use speech between t={0:.1f}s and t={1:.1f}s for trial.'.format(segment.start, segment.end))
...
...    model_id = trial['model_id']
...    model = models[model_id]
...    print('Compare to model "{0}".'.format(model_id))
...
...    # True for target trials, False for non target trials
...    reference = trial['reference']
...    print('This is a {0} trial.'.format('target' if reference else 'non-target'))
...    
...    # try_func is user-supplied and should return the verification score
...    score = try_func(uri, timeline, model)
...
...    break  # one should obviously iterate over all trials
Current filename is Eartha_Kitt/8jEAjG6SegY_0000008.
Use speech between t=0.0s and t=6.8s for trial.
Compare to model "Eartha_Kitt/x6uYqmx31kE_0000001".
This is a target trial.
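Once every trial has been scored, the scores and the boolean trial['reference'] values can be turned into a single verification figure. The sketch below approximates the equal error rate (EER). It is not part of this package: equal_error_rate is a hypothetical helper name, and it assumes that a higher score means "more likely the same speaker" and that NumPy is installed.

```python
import numpy as np


def equal_error_rate(scores, references):
    """Approximate the EER from trial scores and target/non-target labels.

    Assumption: a higher score means the trial is more likely a target trial.
    """
    scores = np.asarray(scores, dtype=float)
    references = np.asarray(references, dtype=bool)

    n_target = references.sum()
    n_nontarget = (~references).sum()
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = np.inf, None
    # Try every observed score as a threshold (accept when score >= threshold)
    for threshold in np.unique(scores):
        accept = scores >= threshold
        # false acceptance rate: accepted non-target trials
        far = np.sum(accept & ~references) / n_nontarget
        # false rejection rate: rejected target trials
        frr = np.sum(~accept & references) / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            # keep the operating point where both error rates are closest
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)
```

In the loop above, one would append score to a list and trial['reference'] to another, then call equal_error_rate(scores, references) after the last trial.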

This protocol implements the one described in the VoxCeleb paper:

For verification, all POIs whose name starts with an ‘E’ are reserved for testing, since this gives a good balance of male and female speakers. These POIs are not used for training the network, and are only used at test time.

Surprisingly, this protocol does not provide any development (i.e. validation) set. We therefore also propose a small variation of this protocol that keeps the same test set but reserves POIs whose name starts with 'U', 'V' or 'W' for validation (41 people, with a good male/female balance). It is called VoxCeleb.SpeakerVerification.VoxCeleb1_UVW.
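The first-letter rule described above can be sketched in a few lines. This only illustrates the rule as stated in the text, not the package's actual implementation: voxceleb1_subset and its with_validation flag are hypothetical names, and "Uma_Example" is a made-up speaker used only to exercise the 'U' branch.

```python
def voxceleb1_subset(speaker, with_validation=False):
    """Return the subset a speaker (POI) falls into, based on the first letter.

    with_validation=False mimics the original VoxCeleb1 split,
    with_validation=True mimics the VoxCeleb1_UVW variation.
    """
    if speaker.startswith("E"):
        return "test"
    if with_validation and speaker.startswith(("U", "V", "W")):
        return "development"
    return "train"


# Speaker names follow the 'First_Last' convention seen in the file URIs
for name in ["A.J._Buckley", "Eartha_Kitt", "Uma_Example"]:
    print(name, voxceleb1_subset(name), voxceleb1_subset(name, with_validation=True))
```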

Speaker identification protocol

The speaker identification protocol on VoxCeleb1 is initialized as follows:

>>> from pyannote.database import get_protocol
>>> protocol = get_protocol('VoxCeleb.SpeakerIdentification.VoxCeleb1')

First, use the protocol.train generator to iterate over the training set:

>>> for training_file in protocol.train():
...
...    uri = training_file['uri']
...    print('Current filename is {0}.'.format(uri))
...
...    # "who speaks when" as a pyannote.core.Annotation instance
...    annotation = training_file['annotation']
...    for segment, _, speaker in annotation.itertracks(yield_label=True):
...        print('{0} speaks between t={1:.1f}s and t={2:.1f}s.'.format(
...            speaker, segment.start, segment.end))
...    
...    break  # this should obviously be replaced
...           # by the actual training
Current filename is A.J._Buckley/1zcIwhmdeo4_0000001.
A.J._Buckley speaks between t=0.0s and t=8.1s.

The test set can be iterated over using the protocol.test_trial generator:

>>> for trial in protocol.test_trial():
...
...    uri = trial['uri']
...    print('Current filename is {0}.'.format(uri))
...
...    # trial segment as a pyannote.core.Timeline instance
...    timeline = trial['try_with']
...    for segment in timeline:
...        print('Use speech between t={0:.1f}s and t={1:.1f}s for trial.'.format(segment.start, segment.end))
...
...    reference = trial['reference']
...    print('The expected output is "{0}".'.format(reference))
...
...    # try_func is user-supplied and should return the predicted speaker
...    decision = try_func(uri, timeline)
...
...    break  # one should obviously iterate over all trials
Current filename is A.J._Buckley/Y8hIVOBuels_0000001.
Use speech between t=0.0s and t=4.6s for trial.
The expected output is "A.J._Buckley".

A validation set is also available: simply replace test_trial with development_trial.
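Because test_trial and development_trial yield trials with the same keys, one scoring routine can serve both. The following is a minimal sketch, assuming a user-supplied try_func(uri, timeline) that returns the predicted speaker name; identification_accuracy is a hypothetical helper, not something this package provides, and the stub trials at the end are invented purely for illustration.

```python
def identification_accuracy(trials, try_func):
    """Fraction of trials for which try_func predicts the reference speaker.

    trials: iterable of dicts with 'uri', 'try_with' and 'reference' keys,
            e.g. protocol.development_trial() or protocol.test_trial().
    try_func: hypothetical user-supplied callable (uri, timeline) -> speaker.
    """
    correct = total = 0
    for trial in trials:
        decision = try_func(trial["uri"], trial["try_with"])
        correct += int(decision == trial["reference"])
        total += 1
    # no trials at all: accuracy is undefined
    return correct / total if total else float("nan")


# Stand-in data (invented): a real run would pass protocol.development_trial()
stub_trials = [
    {"uri": "speaker_a/file_1", "try_with": None, "reference": "speaker_a"},
    {"uri": "speaker_b/file_1", "try_with": None, "reference": "speaker_a"},
]


def stub_try_func(uri, timeline):
    # Toy classifier that reads the speaker name from the URI prefix
    return uri.split("/")[0]


print(identification_accuracy(stub_trials, stub_try_func))
```

With a real protocol, identification_accuracy(protocol.development_trial(), try_func) would be used for tuning and identification_accuracy(protocol.test_trial(), try_func) for the final evaluation.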

pyannote-db-voxceleb's People

Contributors

hbredin
