CRSS-SpkDiar is a C++ speaker diarization toolkit built on top of Kaldi, the widely used open-source speech recognition platform. The main objectives of this toolkit are:
- Simple integration with Kaldi ASR,
- Simple integration of i-vector modules within Kaldi for diarization,
- Simple integration of DNN modules within Kaldi for diarization,
- Speaker diarization in an unsupervised, supervised, or semi-supervised fashion,
- Benchmarks on open databases (AMI meeting corpus and Apollo-MCC corpus).
Authors: Chengzhu Yu and Navid Shokouhi.
- VAD (GMM based)
- BIC segmentation (optional)
- Bottom-Up Clustering
- BIC distance
- KL divergence
- i-vector cosine distance
- i-vector Mahalanobis
- i-vector PLDA (optional)
- Bottom-Up Clustering Using i-vector cosine distance score (CDS)
- Integer linear programming (ILP) Clustering
- VAD based segmentation (Viterbi)
- Resegmentation
- Evaluations
- DNN speaker embedding features
- Interface with Kaldi ASR
- Kaldi
- GLPK (only needed if you want to try ILP)
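
To make the i-vector cosine distance score (CDS) used for bottom-up clustering more concrete, here is a minimal C++ sketch. It is not taken from the toolkit's source: the function names `CosineScore` and `BestPairToMerge`, the use of plain `std::vector<double>` instead of Kaldi vector types, and the fixed score threshold as a stopping criterion are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Cosine score between two i-vectors of equal dimension.
// The result lies in [-1, 1]; a higher score means the vectors are more similar.
// (Hypothetical helper, not the toolkit's API.)
double CosineScore(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    dot += a[i] * b[i];
    norm_a += a[i] * a[i];
    norm_b += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (norm_a == 0.0 || norm_b == 0.0) return 0.0;
  return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// One bottom-up clustering step: return the indices of the most similar pair
// of clusters, each represented by one i-vector. Returns {-1, -1} when no pair
// scores above `threshold` (an assumed stopping criterion).
std::pair<int, int> BestPairToMerge(
    const std::vector<std::vector<double>>& ivectors, double threshold) {
  std::pair<int, int> best(-1, -1);
  double best_score = threshold;
  for (std::size_t i = 0; i < ivectors.size(); ++i) {
    for (std::size_t j = i + 1; j < ivectors.size(); ++j) {
      const double score = CosineScore(ivectors[i], ivectors[j]);
      if (score > best_score) {
        best_score = score;
        best = std::make_pair(static_cast<int>(i), static_cast<int>(j));
      }
    }
  }
  return best;
}
```

A full clustering loop would call `BestPairToMerge` repeatedly, merge the returned pair, re-estimate the merged cluster's i-vector, and stop once `{-1, -1}` is returned.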
We evaluate our performance on the AMI meeting corpus and compare the numbers with those reported for Pycasp from ICSI. Note: to evaluate only the clustering module, the CRSS-SpkDiar numbers are obtained on top of oracle segmentation. We are currently working on including segmentation.
Session | Pycasp | CRSS-SpkDiar (run2.sh) |
---|---|---|
IS1000a.Mix-Headset | 25.38 | 12.07 |
IS1001a.Mix-Headset | 32.34 | 43.64 |
IS1001b.Mix-Headset | 10.57 | 12.16 |
IS1001c.Mix-Headset | 28.40 | 6.17 |
IS1003b.Mix-Headset | 34.30 | 10.56 |
IS1003d.Mix-Headset | 50.75 | 24.67 |
IS1006b.Mix-Headset | 16.57 | 7.34 |
IS1006d.Mix-Headset | 53.05 | 21.56 |
IS1008a.Mix-Headset | 1.65 | 4.07 |
IS1008b.Mix-Headset | 8.58 | 3.60 |
IS1008c.Mix-Headset | 9.30 | 6.36 |
IS1008d.Mix-Headset | 26.27 | 5.99 |
Average | 24.76% | 13.19% |