Simplified diarization pipeline built on pretrained models.
Made to be as simple as possible to go from an input audio file to diarized segments.
import soundfile as sf
import matplotlib.pyplot as plt

from simple_diarizer.diarizer import Diarizer
from simple_diarizer.utils import combined_waveplot

# WAV_FILE and NUM_SPEAKERS are placeholders: set them to the path of your
# audio file and the number of speakers in the recording.
diar = Diarizer(
    embed_model='xvec',  # 'xvec' and 'ecapa' supported
    cluster_method='sc'  # 'ahc' and 'sc' supported
)

segments = diar.diarize(WAV_FILE, num_speakers=NUM_SPEAKERS)

# Plot the waveform together with the diarized segments.
signal, fs = sf.read(WAV_FILE)
combined_waveplot(signal, fs, segments)
plt.show()
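As a rough illustration of working with the diarization result, the sketch below totals speaking time per speaker. It assumes each segment is a dict with `start` and `end` times in seconds and a `label` key; that layout is an assumption made for this example, so check what `diar.diarize` actually returns before relying on it. The `speaking_time` helper and the sample data are hypothetical.

```python
from collections import defaultdict


def speaking_time(segments):
    """Return total seconds of speech per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        # Assumed keys: "start", "end" (seconds) and "label" (speaker id).
        totals[seg["label"]] += seg["end"] - seg["start"]
    return dict(totals)


# Made-up segments standing in for the output of diar.diarize(...).
example_segments = [
    {"start": 0.0, "end": 2.5, "label": 0},
    {"start": 2.5, "end": 4.0, "label": 1},
    {"start": 4.0, "end": 7.0, "label": 0},
]

for speaker, seconds in sorted(speaking_time(example_segments).items()):
    print(f"speaker {speaker}: {seconds:.1f} s")
```

If the real segments use different key names, only the three lookups inside `speaking_time` need to change.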
Simplified diarization is available on PyPI:
pip install simple-diarizer
"Some Quick Advice from Barack Obama!"
The following pretrained models are used:
- Voice Activity Detection (VAD)
- Deep speaker embedding extraction
- (Optional/Experimental) Speech-to-text
  - ESPnet Model Zoo
    - English ASR model
The demo can be tried at the link above: it attempts to diarize any input YouTube URL, and uses YouTube's autogenerated transcriptions to produce a speaker-labelled transcript.
Hopefully this can be of use as a free basic tool to produce a diarized transcript of a video/audio of interest.
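One plausible way to combine timed captions with diarized segments is to give each caption the speaker whose segment overlaps it the most. The sketch below shows that idea only; it is not necessarily how the demo does it. The caption and segment layouts (dicts with `start`, `end`, `text` and `label` keys) and the `label_captions` helper are assumptions made for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_captions(captions, segments):
    """Attach to each caption the label of the segment it overlaps most.

    Captions that overlap no segment get speaker None.
    """
    labelled = []
    for cap in captions:
        best_label, best_overlap = None, 0.0
        for seg in segments:
            ov = overlap(cap["start"], cap["end"], seg["start"], seg["end"])
            if ov > best_overlap:
                best_label, best_overlap = seg["label"], ov
        labelled.append({**cap, "speaker": best_label})
    return labelled


# Made-up captions and segments, for illustration only.
captions = [
    {"start": 0.0, "end": 2.0, "text": "Hello and welcome."},
    {"start": 2.0, "end": 5.0, "text": "Thanks for having me."},
    {"start": 9.0, "end": 10.0, "text": "(music)"},
]
segments = [
    {"start": 0.0, "end": 2.5, "label": 0},
    {"start": 2.5, "end": 6.0, "label": 1},
]

for cap in label_captions(captions, segments):
    print(f"[speaker {cap['speaker']}] {cap['text']}")
```

Choosing by maximum overlap keeps a caption whole even when it straddles a speaker change; splitting captions at segment boundaries would need word-level timings.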
- Spectral clustering methods lifted from https://github.com/wq2012/SpectralCluster