Hi all, I'm dealing with a scenario where I receive simultaneous requests for processi

Best strategy for low-latency, high-throughput serving in Multi-GPU setups (faster-whisper issue, open)

developer-cade commented on September 21, 2024

from faster-whisper.

Comments (3)

gongouveia commented on September 21, 2024

@developer-cade Option 1 is the better and simplest choice.
If I understand correctly, when the encoder and decoder run on the same GPU, you don't need to worry about that data-transfer overhead.
Using two different GPUs (one for encoding and the other for decoding) is only recommended if a single GPU's memory cannot hold the model end to end.
With each audio clip being 5 seconds long and a 60x speed-up on a GPU such as a 4080 (24 GB for reference), you can expect to transcribe on the order of ten or more clips per second per GPU.
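
As a rough check of the throughput estimate above, here is a small sketch. It assumes "60X" means 60 seconds of audio processed per second of wall-clock time, and it ignores padding, batching, I/O, and model-loading overhead. The function name is made up for illustration and is not part of faster-whisper.

```python
def clips_per_second(speedup: float, clip_seconds: float) -> float:
    """Estimate how many clips one GPU finishes per wall-clock second.

    Assumes `speedup` is audio seconds processed per wall-clock second
    and that clips are handled back to back with no other overhead.
    """
    if speedup <= 0 or clip_seconds <= 0:
        raise ValueError("speedup and clip_seconds must be positive")
    return speedup / clip_seconds


rate = clips_per_second(speedup=60.0, clip_seconds=5.0)
print(rate)  # 12.0
```

Under these assumptions a single GPU handles about 12 five-second clips per second. Real numbers depend on the model size, compute type, and how requests are batched.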

nshmyrev commented on September 21, 2024

Whisper is designed to decode long audio files. It processes audio in 30-second chunks. If your input chunks are shorter than 30 seconds, you would be better off using another neural network architecture, such as Nvidia Conformer. You'll get the same accuracy and a 10x speed-up. If you still want to use Whisper, your best option is to combine chunks into 30-second segments.
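
The idea of combining short chunks into 30-second segments could look roughly like the sketch below. It only groups clips by duration. Concatenating the audio and splitting the transcript back per request are left out, and `pack_clips` is a hypothetical helper, not a faster-whisper function.

```python
from typing import List

WINDOW_SECONDS = 30.0  # Whisper's processing window


def pack_clips(durations: List[float], window: float = WINDOW_SECONDS) -> List[List[int]]:
    """Greedily group consecutive clip indices so each group fits in one window."""
    groups: List[List[int]] = []
    current: List[int] = []
    used = 0.0
    for index, seconds in enumerate(durations):
        if seconds > window:
            raise ValueError(f"clip {index} is longer than the window")
        # Start a new group when the next clip would overflow the window.
        if current and used + seconds > window:
            groups.append(current)
            current, used = [], 0.0
        current.append(index)
        used += seconds
    if current:
        groups.append(current)
    return groups


print(pack_clips([5.0] * 8))  # [[0, 1, 2, 3, 4, 5], [6, 7]]
```

Note the trade-off: waiting to fill a window improves GPU utilization but adds queueing delay, which matters for a low-latency service.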

gongouveia commented on September 21, 2024

I believe that is wrong: shorter audio gets padded to 30 seconds, and with VAD enabled the padding is removed.

Accuracy is related more to the speech length and sentence length than to the audio clip length. That is how the decoder works.
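
To illustrate the padding being discussed: Whisper works on 30-second windows of 16 kHz audio, so a shorter clip is zero-padded up to 480,000 samples. A minimal sketch in plain Python follows. The helper name is hypothetical, and real pipelines do this on arrays inside the feature extractor rather than on lists.

```python
from typing import List

SAMPLE_RATE = 16_000          # Whisper's input sample rate in Hz
WINDOW_SECONDS = 30           # Whisper's processing window
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def pad_to_window(samples: List[float]) -> List[float]:
    """Zero-pad (or truncate) a mono clip to exactly one 30-second window."""
    if len(samples) >= WINDOW_SAMPLES:
        return samples[:WINDOW_SAMPLES]
    return samples + [0.0] * (WINDOW_SAMPLES - len(samples))


clip = [0.1] * (5 * SAMPLE_RATE)   # a 5-second clip
padded = pad_to_window(clip)
print(len(padded))                 # 480000
```

For a 5-second clip, five sixths of the window is padding, which is why the thread debates whether short clips use the model efficiently.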
