Comments (3)
@developer-cade Option 1 is the better and simpler choice.
If I understand correctly, unless you state otherwise, encoding and decoding run on the same GPU, so you don't need to worry about that data overhead.
It is recommended to use two different GPUs (one for encoding and the other for decoding) if your GPU memory cannot handle the model end to end.
With each audio clip being 5 seconds long and about 60x acceleration on a GPU such as a 4080 (24 GB for reference), you can expect to transcribe on the order of tens of audio clips per second per GPU.
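As a rough sanity check of that estimate, here is a minimal sketch of the arithmetic. The function name `clips_per_second` is hypothetical, and the 60x figure is the assumption stated above, not a measured benchmark; batching, model size, and I/O will change the real number.

```python
def clips_per_second(clip_seconds: float, realtime_factor: float) -> float:
    """Estimate how many clips one GPU finishes per wall-clock second.

    realtime_factor is the assumed speedup over real time (e.g. 60 means
    60 seconds of audio are transcribed per second of compute).
    """
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return realtime_factor / clip_seconds


# 5-second clips at an assumed 60x speedup.
estimate = clips_per_second(5.0, 60.0)
print(estimate)  # 12.0
```

So under these assumptions a single GPU lands at roughly a dozen clips per second.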
from faster-whisper.
Whisper is designed to decode long audio files. It processes audio in 30-second chunks. If your input chunks are shorter than 30 seconds, you would be better off using another neural network architecture such as Nvidia Conformer. You'll get the same accuracy and a 10x speedup. If you still want to use Whisper, your best solution would be to combine the chunks into 30-second segments.
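One way to combine short chunks could look like the sketch below. It only plans which clips go together, based on their durations; concatenating the actual audio and mapping timestamps back to the original clips is left out. The names `pack_clips` and `WINDOW_SECONDS` are hypothetical and not part of faster-whisper.

```python
from typing import List

WINDOW_SECONDS = 30.0  # Whisper's chunk length, as described above


def pack_clips(durations: List[float], window: float = WINDOW_SECONDS) -> List[List[int]]:
    """Greedily group consecutive clips so each group fits in one window.

    Returns lists of clip indices. A clip longer than the window ends up
    alone in its own group.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    used = 0.0
    for index, duration in enumerate(durations):
        # Start a new group when adding this clip would overflow the window.
        if current and used + duration > window:
            groups.append(current)
            current, used = [], 0.0
        current.append(index)
        used += duration
    if current:
        groups.append(current)
    return groups


# Eight 5-second clips: six fill the first window, two go into the second.
groups = pack_clips([5.0] * 8)
print(groups)  # [[0, 1, 2, 3, 4, 5], [6, 7]]
```

Keeping the clips in their original order makes it easier to split the transcript back per clip afterwards.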
from faster-whisper.
I believe you are wrong: the audio clips get padded to 30 seconds, and with VAD enabled the padding is removed.
Accuracy depends more on the speech length and sentence length than on the audio clip length. That is how the decoder works.
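To illustrate the padding point, here is a minimal sketch of padding a short clip to a 30-second window at 16 kHz. It is only an illustration of the idea with plain Python lists, not faster-whisper's own code, and the helper name `pad_or_trim` is used here as an assumption.

```python
from typing import List

SAMPLE_RATE = 16_000                           # Whisper works on 16 kHz audio
WINDOW_SECONDS = 30
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS  # 480000 samples per window


def pad_or_trim(samples: List[float], length: int = WINDOW_SAMPLES) -> List[float]:
    """Return exactly `length` samples: cut long input, zero-pad short input."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


# A 5-second clip is mostly silence once padded to the full window.
clip = [0.1] * (5 * SAMPLE_RATE)
padded = pad_or_trim(clip)
padding_fraction = (len(padded) - len(clip)) / len(padded)
print(len(padded))  # 480000
```

For a 5-second clip, five sixths of the window is padding, which is the part being discussed above.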
from faster-whisper.