Comments (4)
OK, I figured out what I was doing wrong. I will leave this comment here in case someone has a similar problem, and I will close the issue.
When sending to diarization, I was using the segments created by the transcription process. Those segments were too long (i.e., 3-5 sentences), so the speaker sometimes changed partway through a segment, and the model assigned the speaker who was most common in that segment. I have now switched to sending the segments created by the alignment process, which are much shorter, and the results are much better.
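To illustrate why segment length matters, here is a minimal sketch of majority-by-overlap speaker assignment. It is a simplified stand-in, not whisperx's actual implementation, and the helpers overlap and dominant_speaker plus the example turns are hypothetical. Each segment gets the speaker whose diarization turns overlap it the longest, so one long segment that spans a speaker change is labeled with only one of the speakers, while shorter segments can each get their own.

```python
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Length (in seconds) of the intersection of two intervals, 0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def dominant_speaker(seg_start, seg_end, turns):
    """Return the speaker with the largest total overlap with the segment.

    turns is a list of (start_s, end_s, speaker) diarization turns
    (hypothetical format for this sketch). Returns None if nothing overlaps.
    """
    totals = defaultdict(float)
    for t_start, t_end, spk in turns:
        ov = overlap(seg_start, seg_end, t_start, t_end)
        if ov > 0:
            totals[spk] += ov
    if not totals:
        return None
    return max(totals, key=totals.get)


# Hypothetical diarization output: a speaker change at 6 s.
turns = [(0.0, 6.0, "SPEAKER_00"), (6.0, 8.0, "SPEAKER_01")]

# One long segment spanning the change: SPEAKER_01's 2 s are absorbed.
print(dominant_speaker(0.0, 8.0, turns))  # SPEAKER_00

# Shorter (aligned) segments: each gets its own speaker.
print(dominant_speaker(0.0, 6.0, turns))  # SPEAKER_00
print(dominant_speaker(6.0, 8.0, turns))  # SPEAKER_01
```

The shorter the segments, the less chance that a brief turn by another speaker gets outvoted inside a single segment.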
I have tried upgrading to Pyannote 3.1, and the problem persists. The alignment is pretty useless: even in a very controlled environment (i.e., a studio recording of a BBC podcast with 3 speakers), it misses quite a bit.
Has anyone had success making this better?
@nikola1975 I am having the same issue, but your solution (the default code example in the README) doesn't solve it. Here's my code:
import whisperx

# device, compute_type, model_dir, language, file_path, batch_size,
# HF_TOKEN and min_speakers are defined earlier (not shown).

# Extra ASR options passed through to the faster-whisper backend
options = {
    "max_new_tokens": None,
    "clip_timestamps": None,
    "hallucination_silence_threshold": None
}

# 1. Transcribe
model = whisperx.load_model("large-v3", device, compute_type=compute_type, download_root=model_dir, language=language, asr_options=options)
audio = whisperx.load_audio(file_path)
result = model.transcribe(audio, batch_size=batch_size, chunk_size=10, print_progress=True)
# 2. Align to get shorter segments with word-level timestamps
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
# 3. Diarize and assign speakers to the aligned result
diarize_model = whisperx.DiarizationPipeline(use_auth_token=HF_TOKEN, device=device)
diarize_segments = diarize_model(audio, min_speakers=min_speakers)
result = whisperx.assign_word_speakers(diarize_segments, result)
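If some aligned segments still end up with mixed speakers, one possible workaround, sketched here under assumptions, is to re-split each segment wherever the word-level speaker label changes. This assumes the output of whisperx.assign_word_speakers looks like {"segments": [{"words": [{"word", "start", "end", "speaker"}]}]}, where "start", "end" or "speaker" may be missing on some words. split_by_speaker is a hypothetical helper, not part of whisperx.

```python
def split_by_speaker(segments):
    """Split segments wherever the word-level speaker label changes.

    Assumed input shape (not guaranteed by whisperx): each segment has a
    "words" list of dicts with "word" and optionally "start", "end" and
    "speaker". Words without a speaker inherit the previous word's label.
    """
    chunks = []
    for seg in segments:
        current = None
        for w in seg.get("words", []):
            spk = w.get("speaker", current["speaker"] if current else None)
            if current is None or spk != current["speaker"]:
                # Speaker changed (or first word): start a new chunk.
                current = {"speaker": spk, "start": w.get("start"),
                           "end": w.get("end"), "words": []}
                chunks.append(current)
            current["words"].append(w["word"])
            if "end" in w:
                current["end"] = w["end"]
    return [{"speaker": c["speaker"], "start": c["start"], "end": c["end"],
             "text": " ".join(c["words"])} for c in chunks]


# Hypothetical aligned + speaker-assigned result with a change mid-segment.
result = {"segments": [{"words": [
    {"word": "Hi", "start": 0.0, "end": 0.3, "speaker": "SPEAKER_00"},
    {"word": "there.", "start": 0.3, "end": 0.6, "speaker": "SPEAKER_00"},
    {"word": "Hello!", "start": 0.8, "end": 1.2, "speaker": "SPEAKER_01"},
]}]}

for chunk in split_by_speaker(result["segments"]):
    print(chunk)
```

Letting words without a speaker label inherit the previous one keeps tokens such as numbers, which may come back without timestamps, attached to the surrounding speaker instead of creating tiny unlabeled chunks.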
Are you getting poor results from the diarization overall, or is it assigning the wrong speakers? My results are not 100% precise now, but they are relatively close. I am not sure what your expectations are :)
I assume you are using the Pyannote 3.1 model? Try running the diarization through this link and check whether you get the same results:
https://huggingface.co/spaces/pyannote/pretrained-pipelines