
Comments (3)

gongouveia commented on September 21, 2024

@developer-cade Option 1 is the better and simpler choice.
If I understand correctly, unless you state otherwise the encoder and decoder run on the same GPU, so you don't need to worry about that data-transfer overhead.
Using two different GPUs (one for encoding and the other for decoding) is recommended if a single GPU's memory cannot handle the model end to end.
With each audio clip being 5 seconds long and a 60x speedup on a GPU such as the 4080 (24 GB for reference), you can expect to transcribe on the order of ten clips per second per GPU.
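
As a rough sanity check of that estimate, the arithmetic can be sketched as follows. The 60x speedup and 5-second clip length are the figures quoted above; the function name `clips_per_second` is hypothetical, and the calculation ignores batching, I/O, and model-loading overhead, so treat it as an upper-bound sketch rather than a benchmark.

```python
def clips_per_second(speedup: float, clip_seconds: float) -> float:
    """Clips transcribed per wall-clock second at a given real-time speedup."""
    if speedup <= 0 or clip_seconds <= 0:
        raise ValueError("speedup and clip_seconds must be positive")
    # A speedup of N means N seconds of audio are processed per second of wall time.
    return speedup / clip_seconds


rate = clips_per_second(speedup=60.0, clip_seconds=5.0)
print(f"{rate:.0f} clips per second per GPU")  # prints "12 clips per second per GPU"
```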

from faster-whisper.

nshmyrev commented on September 21, 2024

Whisper is designed to decode long audio files. It processes audio in 30-second chunks. If your input chunks are shorter than 30 seconds, you would be better off using another neural network architecture, such as Nvidia Conformer. You'll get the same accuracy and a 10x speedup. If you still want to use Whisper, your best solution would be to combine the chunks into 30-second segments.
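
One way to combine short chunks, sketched under assumptions: the helper `pack_clips` below is hypothetical and only plans the grouping from clip durations, greedily filling each 30-second window in order. Concatenating the actual audio and splitting the resulting text back per clip are not shown.

```python
from typing import List

MAX_WINDOW_SECONDS = 30.0  # Whisper's processing window, as described above


def pack_clips(durations: List[float],
               max_seconds: float = MAX_WINDOW_SECONDS) -> List[List[int]]:
    """Group consecutive clip indices so each group's total duration fits one window."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_total = 0.0
    for index, duration in enumerate(durations):
        if duration > max_seconds:
            raise ValueError(f"clip {index} is longer than {max_seconds} s")
        # Start a new group when adding this clip would overflow the window.
        if current and current_total + duration > max_seconds:
            groups.append(current)
            current, current_total = [], 0.0
        current.append(index)
        current_total += duration
    if current:
        groups.append(current)
    return groups


# Fourteen 5-second clips: six fit in each full window, two are left over.
print(pack_clips([5.0] * 14))
# prints [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], [12, 13]]
```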


gongouveia commented on September 21, 2024

I believe you are wrong: the audio clips get padded to 30 seconds, and with VAD enabled the padding is removed.

Accuracy depends more on speech length and sentence length than on audio clip length. That is how the decoder works.
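
To make the padding concrete, here is a minimal sketch of zero-padding a short clip to one 30-second window. It assumes mono audio at 16 kHz (the rate Whisper operates on) held in a plain Python list; `pad_to_window` is a hypothetical helper written for illustration, not faster-whisper's own code, and it does not model the VAD step.

```python
SAMPLE_RATE = 16_000                            # assumed 16 kHz mono input
WINDOW_SECONDS = 30
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS   # 480,000 samples per window


def pad_to_window(samples: list) -> list:
    """Zero-pad (or truncate) a mono clip to exactly one 30-second window."""
    if len(samples) >= WINDOW_SAMPLES:
        return samples[:WINDOW_SAMPLES]
    return samples + [0.0] * (WINDOW_SAMPLES - len(samples))


clip = [0.1] * (5 * SAMPLE_RATE)                # a 5-second clip
padded = pad_to_window(clip)
padding_fraction = 1 - len(clip) / len(padded)
print(len(padded), f"{padding_fraction:.0%} padding")  # prints "480000 83% padding"
```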

