pyDERCalc

This repository contains a Python script for evaluating the performance of speaker diarization systems, in the pyDERCalc.py file, along with a demo Jupyter notebook and example files. It is designed to (a) give more flexibility over the forgiveness collars to be inserted, allowing for potentially different collar sizes at utterance starts and utterance ends, and (b) make it easier to vary the code in order to test different things.

pyDERCalc does not permit segments to be excluded (i.e. it does not support un-partitioned evaluation map (UEM) files), but can otherwise be used to replicate the industry-standard diarization evaluation tool md-eval.pl (Version 22). That code is part of the NIST scoring toolkit (sctk-2.4.10) available at ftp://jaguar.ncsl.nist.gov/pub/sctk-2.4.10-20151007-1312.tar.bz2, and is also included in this repository for easy comparison. It is described in detail in NIST, “The 2009 (RT-09) rich transcription meeting recognition evaluation plan,” Feb. 2009.

The example ground truth segmentation file AMI_20050204-1206_GroundTruth.rttm was used in one of the NIST Rich Transcription challenges.

The example speaker diarization system segmentation file AMI_20050204-1206_DiarTkOutput.rttm was generated using DiarTk based on the Mel-frequency cepstral coefficients (MFCCs) file AMI_20050204-1206.fea available at https://github.com/idiap/IBDiarization. DiarTk is described in D. Vijayasenan and F. Valente, “DiarTk: An open source toolkit for research in multistream speaker diarization and its application to meetings recordings,” in Proc. Conf. of Int. Speech Commun. Assoc. (INTERSPEECH), 2012, pp. 2170–2173.

Citation

If using pyDERCalc, please use the following citation: https://ieeexplore.ieee.org/document/9287552

@INPROCEEDINGS{McKnight2020,
  author={McKnight, Simon W. and Hogg, Aidan O. T. and Naylor, Patrick A.},
  booktitle={2020 28th European Signal Processing Conference (EUSIPCO)}, 
  title={Analysis of Phonetic Dependence of Segmentation Errors in Speaker Diarization}, 
  year={2021},
  volume={},
  number={},
  pages={381-385},
  doi={10.23919/Eusipco47968.2020.9287552}}

Requirements

  • Python 3
  • NumPy
  • Pandas

How To Use

See the demo.ipynb notebook for examples of how to use pyDERCalc. The main steps are:

  • First import pyDERCalc, which may require the path to be added if saved to a different folder from where the code is run.
  • Define the paths and filenames of the ground truth segmentation file and the diarization system segmentation file. The call in the next step refers to them as oracleRttmFile and diarizedRttmFile.
  • Use mapSpkrs, dfErrors, _ = pyDERCalc.getAllErrors(oracleRttmFile, diarizedRttmFile, collars) to go straight to the dictionary of mapped speakers and the dataframe table of errors.
  • Other commands can be used and modified for individual steps of the process if so desired.
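The first and third steps above can be sketched as follows. This is only an illustration: the helper names add_module_folder and run_pydercalc are hypothetical, and the exact form of the collars argument is not specified here, so it should be taken from demo.ipynb. Only the getAllErrors call itself comes from the steps above.

```python
import sys
from pathlib import Path


def add_module_folder(folder):
    """Append the folder that holds pyDERCalc.py to sys.path (hypothetical helper)."""
    folder = str(Path(folder).resolve())
    if folder not in sys.path:
        sys.path.append(folder)
    return folder


def run_pydercalc(oracleRttmFile, diarizedRttmFile, collars):
    """Run the whole pipeline in one call (hypothetical wrapper).

    The import is done lazily so that this sketch can be loaded even when
    pyDERCalc.py is not yet on the path. The format of `collars` is an
    assumption left to the caller; see demo.ipynb for the form expected.
    """
    import pyDERCalc

    mapSpkrs, dfErrors, _ = pyDERCalc.getAllErrors(
        oracleRttmFile, diarizedRttmFile, collars
    )
    return mapSpkrs, dfErrors
```

Calling add_module_folder with the folder containing pyDERCalc.py before run_pydercalc should be enough when the code is run from a different folder.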

Explanation

The .rttm file format was devised by NIST for the Rich Transcription challenges that ran from 2002 to 2009. RTTM stands for Rich Transcription Time Marked, and the files are essentially text files setting out each speaker segment on one line. Each line has either 9 or 10 space-separated entries, which are put into a Python list using getSegs(rttmFile).

Start of AMI_20050204-1206_GroundTruth.rttm:

SPEAKER AMI_20050204-1206 1 1270.390 4.490 FEE029
SPEAKER AMI_20050204-1206 1 1275.195 3.070 FEE029
...

Start of AMI_20050204-1206_DiarTkOutput.rttm:

SPEAKER AMI_20050204-1206 1 1270.39 4.50 AMI_20050204-1206_spkr_0
SPEAKER AMI_20050204-1206 1 1275.19 3.08 AMI_20050204-1206_spkr_0
...
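As a rough illustration of reading such lines, the sketch below splits each SPEAKER line into its fields and extracts (start, end, speaker) tuples. It is not the getSegs implementation. The function names are hypothetical, and it assumes the conventional RTTM field order (onset in field 4, duration in field 5, speaker name in field 8) with `<NA>` placeholders in the unused fields, which is an assumption about the full line layout rather than something shown in the excerpts above.

```python
def parse_rttm_lines(lines):
    """Return one list of whitespace-separated fields per SPEAKER line."""
    segs = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and any record type other than SPEAKER.
        if not fields or fields[0] != "SPEAKER":
            continue
        segs.append(fields)
    return segs


def to_intervals(segs):
    """Convert field lists to (start, end, speaker) tuples.

    Assumes the conventional RTTM positions: index 3 = onset,
    index 4 = duration, index 7 = speaker name.
    """
    return [(float(f[3]), float(f[3]) + float(f[4]), f[7]) for f in segs]


# Illustrative lines; the <NA> placeholders are assumed, not taken from the files.
sample = [
    "SPEAKER AMI_20050204-1206 1 1270.390 4.490 <NA> <NA> FEE029 <NA>",
    "SPEAKER AMI_20050204-1206 1 1275.195 3.070 <NA> <NA> FEE029 <NA>",
]
intervals = to_intervals(parse_rttm_lines(sample))
```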

The functions getSplitSegs(segs) and iterateSplitSegs(segs) are used to find non-overlapping segments that have the same speakers throughout each segment.
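One way to picture this splitting step is a sweep over all segment boundaries. The sketch below is an illustration under that assumption, not the code of getSplitSegs or iterateSplitSegs, and the name split_segments is hypothetical. Times are in seconds.

```python
def split_segments(intervals):
    """Split possibly overlapping (start, end, speaker) intervals into
    non-overlapping pieces, each with a constant set of active speakers.

    Returns a list of (start, end, [speakers]) tuples; stretches where
    nobody speaks are dropped.
    """
    # Every segment start or end is a point where the speaker set may change.
    bounds = sorted({t for s, e, _ in intervals for t in (s, e)})
    pieces = []
    for lo, hi in zip(bounds, bounds[1:]):
        # A speaker is active in [lo, hi) if one of their segments overlaps it.
        active = sorted({spk for s, e, spk in intervals if s < hi and e > lo})
        if active:
            pieces.append((lo, hi, active))
    return pieces


pieces = split_segments([(0.0, 5.0, "A"), (3.0, 8.0, "B")])
```

In this toy input, speaker A talks from 0 to 5 s and speaker B from 3 to 8 s, so the overlap from 3 to 5 s becomes its own piece with both speakers.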

The functions getComboSplitSegs(segs) and iterateComboSplitSegs(comboSegs) are used to combine the ground truth and diarization system segments to show what speakers are predicted for specific segments by the ground truth files and the diarization system files.

A number of important things can then be obtained from the combined ground truth and diarization system segments comboSplitSegs. The functions getOracleSpkrs(comboSplitSegs) and getDiarizedSpkrs(comboSplitSegs) return the lists of ground truth speakers lstOracleSpkrs and diarization system speakers lstDiarizedSpkrs respectively. These can then be passed to getSpkrTimes(lstOracleSpkrs, lstDiarizedSpkrs, comboSplitSegs) to get a dataframe of the aggregate time for which each ground truth speaker coincides with each diarization system speaker. The dataframe would look like:

           spkr_0   spkr_1   spkr_3   spkr_5   spkr_6   spkr_9
FEE029    194.480   13.925   12.185   14.040    2.610    7.235
FEE030      9.920    9.110    4.465   67.645    1.760    7.745
MEE031      6.445    0.570  106.655    5.105    0.000    5.020
FEE032      9.655    1.445    4.985    7.280    0.815  138.520
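A table of this kind can be built by accumulating the duration of every combined segment against each (ground truth speaker, diarization speaker) pair active in it. The sketch below illustrates the idea with plain nested dictionaries rather than a dataframe. The name speaker_time_table and the (start, end, oracle_speakers, diarized_speakers) segment layout are assumptions made for the example, not the comboSplitSegs format itself.

```python
def speaker_time_table(combo_segs):
    """Accumulate co-occurrence time per (oracle, diarized) speaker pair.

    combo_segs: list of (start, end, oracle_speakers, diarized_speakers).
    Returns {oracle_speaker: {diarized_speaker: seconds}} with zeros filled in.
    """
    oracle = sorted({o for _, _, os_, _ in combo_segs for o in os_})
    diarized = sorted({d for _, _, _, ds in combo_segs for d in ds})
    table = {o: {d: 0.0 for d in diarized} for o in oracle}
    for start, end, oracle_spkrs, diarized_spkrs in combo_segs:
        dur = end - start
        for o in oracle_spkrs:
            for d in diarized_spkrs:
                table[o][d] += dur
    return table


toy_segs = [
    (0.0, 2.0, ["A"], ["s0"]),
    (2.0, 3.0, ["A", "B"], ["s0"]),  # overlapping speech credited to both A and B
    (3.0, 5.0, ["B"], ["s1"]),
]
toy_table = speaker_time_table(toy_segs)
```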

The mapping of ground truth speakers to diarization system speakers using getMapSpkrs(lstOracleSpkrs, dfSpkrTimes) would look at this dataframe and match the highest values to work out which ground truth speaker should be matched to which diarization system speaker. The resulting dictionary mapSpkrs also shows the aggregate time and percentage of the ground truth speaker time that is mapped to the relevant diarization system speaker:

{'FEE029': ['AMI_20050204-1206_spkr_0', 194.48, 79.6],
'FEE030': ['AMI_20050204-1206_spkr_5', 67.645, 67.2],
'MEE031': ['AMI_20050204-1206_spkr_3', 106.655, 86.2],
'FEE032': ['AMI_20050204-1206_spkr_9', 138.52, 85.1]}
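A simple way to realise "match the highest values" is a greedy pass over the table: repeatedly take the largest remaining cell whose row and column are both still unassigned. The sketch below applies that to the table values shown earlier, using the shortened column names from the table. It is an assumption about the matching strategy, not the getMapSpkrs code, and a greedy pass can differ from an optimal one-to-one assignment on other data. The percentage is taken relative to the row total, which is also an assumption.

```python
def greedy_map_speakers(times):
    """Greedy one-to-one mapping from oracle to diarized speakers.

    times: {oracle_speaker: {diarized_speaker: seconds}}.
    Returns {oracle_speaker: [diarized_speaker, seconds, percent_of_row]}.
    """
    cells = sorted(
        ((t, o, d) for o, row in times.items() for d, t in row.items()),
        reverse=True,
    )
    mapping, used = {}, set()
    for t, o, d in cells:
        # Skip empty cells and speakers that already have a partner.
        if t <= 0 or o in mapping or d in used:
            continue
        row_total = sum(times[o].values())
        mapping[o] = [d, t, round(100 * t / row_total, 1)]
        used.add(d)
    return mapping


cols = ["spkr_0", "spkr_1", "spkr_3", "spkr_5", "spkr_6", "spkr_9"]
rows = {
    "FEE029": [194.480, 13.925, 12.185, 14.040, 2.610, 7.235],
    "FEE030": [9.920, 9.110, 4.465, 67.645, 1.760, 7.745],
    "MEE031": [6.445, 0.570, 106.655, 5.105, 0.000, 5.020],
    "FEE032": [9.655, 1.445, 4.985, 7.280, 0.815, 138.520],
}
times = {o: dict(zip(cols, vals)) for o, vals in rows.items()}
mapping = greedy_map_speakers(times)
```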

So far, the collar sizes have not been used. The next step would be to take the input collar sizes, evaluate the segment times to ignore using getSegsIgnore(oracleSplitSegs, collars) and iterating in getNewSegsIgnore(segsIgnore, collars) to remove overlaps. The function getRevisedComboSplitSegs(comboSplitSegs, newSegsIgnore) would then remove the segment times to ignore from comboSplitSegs.
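The collar step can be sketched as three small operations: build an ignore window around every ground truth boundary, merge windows that overlap, and cut the merged windows out of the segments to be scored. The code below is a minimal illustration with hypothetical names. It assumes collars are given in seconds and extend symmetrically on both sides of each boundary, with one half-width for utterance starts and another for utterance ends. The actual conventions in getSegsIgnore may differ.

```python
def ignore_windows(ref_intervals, start_collar, end_collar):
    """Merged (lo, hi) windows around each ground truth start and end."""
    raw = []
    for start, end, _spk in ref_intervals:
        raw.append((start - start_collar, start + start_collar))
        raw.append((end - end_collar, end + end_collar))
    merged = []
    for lo, hi in sorted(w for w in raw if w[1] > w[0]):  # drop zero-width windows
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def remove_ignored(segs, windows):
    """Cut the ignore windows out of segments shaped (start, end, *rest)."""
    out = []
    for seg in segs:
        pieces = [(seg[0], seg[1])]
        for lo, hi in windows:
            kept = []
            for s, e in pieces:
                if hi <= s or lo >= e:  # no overlap with this window
                    kept.append((s, e))
                    continue
                if s < lo:
                    kept.append((s, lo))
                if hi < e:
                    kept.append((hi, e))
            pieces = kept
        out.extend((s, e) + tuple(seg[2:]) for s, e in pieces)
    return out


ref = [(0.0, 10.0, "A"), (10.2, 20.0, "B")]
windows = ignore_windows(ref, start_collar=0.25, end_collar=0.25)
scored = remove_ignored([(0.0, 10.0, ["A"], ["s0"])], windows)
```

In this toy input the end of A and the start of B are 0.2 s apart, so their two 0.25 s windows merge into one.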

There are a number of functions that do evaluation. The starting function getTotalTime(segs, countMultipleSpkrs=True) will calculate the overall time if countMultipleSpkrs is set to False, but will double count overlapping time if True. Then the functions getMissedTime(segs), getFalarmTime(segs) and getErrorTime(segs, mapSpkrs) calculate MISS, FALARM and ERROR respectively, before they are all returned at the end using getErrors(oracleRttmFile, diarizedRttmFile, collars). An example of the resulting dataframe is:

Collar [+ms, -ms]    MISS   FALARM   ERROR     DER
[0, 0]              20.15     0.91    8.09   29.14
[50, 50]            18.40     0.46    7.63   26.48
[100, 100]          16.86     0.32    7.16   24.34
[150, 150]          15.46     0.28    6.72   22.45
[200, 200]          14.16     0.24    6.37   20.78
[250, 250]          13.12     0.22    6.05   19.39
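In each row, DER is the sum of MISS, FALARM and ERROR, up to rounding. The sketch below shows how these quantities are conventionally computed from combined segments in md-eval-style scoring, with each component expressed as a percentage of the total ground truth speaker time (overlapping speech counted once per speaker). It is an illustration rather than the pyDERCalc code: the function name, the segment layout and the plain {oracle: diarized} mapping are all assumptions made for the example.

```python
def diarization_errors(combo_segs, mapping):
    """Return MISS, FALARM, ERROR and DER as percentages.

    combo_segs: list of (start, end, oracle_speakers, diarized_speakers).
    mapping: {oracle_speaker: diarized_speaker}.
    """
    total = miss = falarm = error = 0.0
    for start, end, oracle_spkrs, diarized_spkrs in combo_segs:
        dur = end - start
        n_ref, n_sys = len(oracle_spkrs), len(diarized_spkrs)
        # Ground truth speakers whose mapped counterpart is active here.
        n_correct = sum(1 for o in oracle_spkrs if mapping.get(o) in diarized_spkrs)
        total += dur * n_ref
        miss += dur * max(0, n_ref - n_sys)
        falarm += dur * max(0, n_sys - n_ref)
        error += dur * (min(n_ref, n_sys) - n_correct)
    if total == 0:
        raise ValueError("no ground truth speech to score")
    result = {
        "MISS": 100 * miss / total,
        "FALARM": 100 * falarm / total,
        "ERROR": 100 * error / total,
    }
    result["DER"] = result["MISS"] + result["FALARM"] + result["ERROR"]
    return result


toy = [
    (0.0, 4.0, ["A"], ["s0"]),  # correct
    (4.0, 6.0, ["A"], []),      # missed speech
    (6.0, 8.0, [], ["s1"]),     # false alarm
    (8.0, 10.0, ["B"], ["s0"]), # speaker confusion
]
errors = diarization_errors(toy, {"A": "s0", "B": "s1"})
```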

There are more unusual functions too, such as getCollarSegs(comboSplitSegs, newSegsIgnore) and getSBDERs(allCollarSegs, mapSpkrs, collars) that can calculate the errors in the collars rather than outside.

The function getAllErrors(oracleRttmFile, diarizedRttmFile, collars) runs the whole process with the least code.
