Comments (6)
I have the same question. So far in my testing, the original Whisper is more accurate...
Similar question. WhisperX could not even return word-level timestamps for Thai; it only returns sentence-level timestamps. I am very confused now. Any ideas?
@MengHao666, did you use alignment after transcribing your file? Alignment returns the word-level timestamps, but I have only tested it for English. You might want to give this a try for Thai.
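import torch
import whisperx
device = "cuda" if torch.cuda.is_available() else "cpu"
audio = whisperx.load_audio(filepath)
result = whisper_model.transcribe(audio)
# models["Align_model"] / models["Align_metadata"]: assumed to be the pair returned by whisperx.load_align_model(...)
result_aligned = whisperx.align(result["segments"], models["Align_model"], models["Align_metadata"], audio, device, return_char_alignments=False)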
I tried, and it just failed. It could not return word-level timestamps. I guess whisperx could not find a suitable Thai wav2vec2 pretrained model. I also tried to find one on Hugging Face, but that failed too.
Native Whisper is the best for ASR, but it does not give accurate timestamps.
I just made a loop to reformat openai/whisper timestamps so they resemble whisperX output, which worked for me. I also did some string manipulation to append and prepend punctuation so the words look like faster-whisper output.
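The reformatting loop described above might look roughly like the sketch below. It is not the commenter's actual code. It assumes the input has the shape openai/whisper produces when word timestamps are enabled (segments holding a "words" list with "word", "start", "end" and "probability"), and that the target is a whisperX-style layout with "segments" plus a flat "word_segments" list. The function name to_whisperx_like and the sample data are made up for illustration.

```python
from typing import Any


def to_whisperx_like(result: dict[str, Any]) -> dict[str, Any]:
    """Reshape an openai/whisper-style result into a whisperX-style layout.

    Assumed input:  {"segments": [{"start", "end", "text", "words": [...]}]}
    Assumed output: {"segments": [...], "word_segments": [...]}
    """
    segments = []
    word_segments = []
    for seg in result.get("segments", []):
        words = []
        for w in seg.get("words", []):
            words.append({
                # Whisper words usually carry a leading space; drop it.
                "word": w["word"].strip(),
                "start": w["start"],
                "end": w["end"],
                # Whisper's token probability is reused as "score" here.
                # It is not the same quantity as an alignment-model score.
                "score": w.get("probability"),
            })
        segments.append({
            "start": seg["start"],
            "end": seg["end"],
            "text": seg["text"].strip(),
            "words": words,
        })
        word_segments.extend(words)
    return {"segments": segments, "word_segments": word_segments}


# Hypothetical sample shaped like an openai/whisper result.
sample = {
    "segments": [{
        "start": 0.0,
        "end": 1.2,
        "text": " Hello, world.",
        "words": [
            {"word": " Hello,", "start": 0.0, "end": 0.5, "probability": 0.98},
            {"word": " world.", "start": 0.6, "end": 1.2, "probability": 0.95},
        ],
    }]
}
converted = to_whisperx_like(sample)
```

Punctuation stays attached to each word in this sketch; any extra prepend/append handling would go inside the inner loop.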