Comments (5)
Hi @antiboredom,
You are welcome :) Glad you found it useful.
To achieve word-level timestamps, you will need to enable token_timestamps and set max_len to 1, like the following:
from pywhispercpp.model import Model

model = Model('base.en', n_threads=6)
# token_timestamps=True with max_len=1 puts roughly one token in each segment
words = model.transcribe('file.mp3', token_timestamps=True, max_len=1)
for word in words:
    print(word.text)
Thank you! Not sure why I was having trouble sorting that out myself!
One more thing, and I'm not sure if this is just a whisper thing or related to your project, but I'm seeing one longer word being broken up. In my test case, "Enormous" is becoming "En", "orm", "ous". Any ideas why that might be happening?
It's a bit tricky to figure out, because this is not an exact word-level timestamp per se. In fact, you can set max_len to whatever number of characters you want. When you set max_len to 1, every token ends up on its own line, which gives results similar to word-level timestamps.
I think this is the problem in your test case: it seems that "Enormous" is tokenized into 3 tokens, and you get each token on its own. I've never seen such a case, though!
Could you try changing max_len to 8, for example?
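If the pieces keep coming back split, one possible workaround is to merge them after transcription. The sketch below is only an illustration and not part of pywhispercpp: merge_pieces is a hypothetical helper, the (t0, t1, text) tuples are invented sample values, and it assumes that a piece starting a new word keeps its leading space while a continuation piece does not. I have not verified that pywhispercpp preserves that leading space in the segment text; if it strips it, this heuristic will not work as written.

```python
from typing import Iterable, List, Tuple

# Hypothetical shape for illustration: (start time, end time, text).
Piece = Tuple[int, int, str]


def merge_pieces(pieces: Iterable[Piece]) -> List[Piece]:
    """Join sub-word pieces into whole words.

    Assumption: a piece that begins a new word starts with whitespace,
    and a piece that continues the previous word does not.
    """
    words: List[Piece] = []
    for t0, t1, text in pieces:
        if not text.strip():
            # Skip empty or whitespace-only pieces.
            continue
        if words and not text[0].isspace():
            # Continuation: append to the previous word and extend its end time.
            prev_t0, _, prev_text = words[-1]
            words[-1] = (prev_t0, t1, prev_text + text)
        else:
            # New word: start a fresh entry without the leading space.
            words.append((t0, t1, text.strip()))
    return words


# Invented sample data mirroring the "Enormous" case from this thread.
pieces = [(0, 20, " En"), (20, 45, "orm"), (45, 70, "ous"), (70, 90, " and")]
merged = merge_pieces(pieces)
for t0, t1, text in merged:
    print(t0, t1, text)
```

With this approach punctuation pieces would also attach to the preceding word, which may or may not be what you want.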
Interesting! When I set max_len to 8, I get "Enorm" and "ous", and then occasionally multiple words like "and if" appearing on the same line... I have also tried faster-whisper, which does work as expected for word-level timestamps but is significantly slower than your implementation...
You still get "Enormous" split into two separate pieces even with max_len set to 8. Interesting test case!
Could you please share the audio file with me? I would like to test it on my end.
Yes, faster-whisper is great and should give you good results. It should be about as fast as well, at least it was when I tested it a while ago! To be honest, though, I haven't compared the performance of the two implementations.