
Comments (12)

BBC-Esq commented on June 5, 2024

I didn't get a direct response to my main question.

Let's try asking again... Did the dude at insanely-fast-whisper actually create any of the code of these underlying technologies? Wink with your left eye if you don't want to answer because y'all work together, or wink with your right eye for no, he didn't actually create any new innovation and it was other people... We'll just keep this between you and me. lol.


sanchit-gandhi commented on June 5, 2024
  • Transformers Whisper: provides the underlying code for the Whisper model, with efficient attention code and Flash Attention 2 support. Can be used for short-form audio, and for long-form audio with "sequential" transcription. See the docs for examples.
  • Optimum BetterTransformer: builds on the Transformers implementation by changing the attention implementation to use PyTorch SDPA. Equivalent to Flash Attention 1 on hardware that supports it.
  • Transformers Pipeline: a wrapper around the Transformers Whisper model for an easier API. Also implements the "chunked" long-form transcription algorithm, which is about 10x faster than OpenAI's original "sequential" one; see Section 5 and Table 7 of the Distil-Whisper paper.

=> now this is all the underlying code that you need to get the reported speed-ups
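To make that concrete, here's a minimal sketch of how the three pieces fit together (the model ID and file name are placeholders, and the exact Flash Attention 2 kwarg depends on your transformers version, so treat this as illustrative rather than canonical):

```python
import torch
from transformers import pipeline

# Transformers Whisper loaded in half precision with Flash Attention 2
# (recent transformers releases take attn_implementation; older ones
# used model_kwargs={"use_flash_attention_2": True})
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    torch_dtype=torch.float16,
    device="cuda:0",
    model_kwargs={"attn_implementation": "flash_attention_2"},
)

# chunk_length_s switches on the "chunked" long-form algorithm;
# batch_size transcribes the 30s chunks in parallel on the GPU
out = pipe("audio.mp3", chunk_length_s=30, batch_size=16)
print(out["text"])
```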

What insanely-fast-whisper does is package up the above three implementations into end-to-end examples and a CLI, so that you can get maximum performance as easily as possible.


sanchit-gandhi commented on June 5, 2024

Hey @BBC-Esq! Thanks for reaching out, I appreciate your interest in the Whisper-JAX project! Unfortunately, this repo is more or less archived now, since we stopped working on it back in April. It was a fun project to see how fast we could make Whisper in JAX on TPU v4-8s, but the community is simply more interested in running on GPUs, which means we've switched to focussing on optimisations that can be applied uniformly, independent of hardware (e.g. Distil-Whisper: https://github.com/huggingface/distil-whisper).

There are some scripts for reproducing the benchmarks here. The pmap benchmarks should be run on at least a TPU v4-8, or a v4-16 for the best performance at higher batch sizes (where we are >100x faster than OpenAI Whisper). The results we got on an A100 (CUDA 11, PyTorch 1.9) and on a TPU v4-8 are here: https://github.com/sanchit-gandhi/whisper-jax#benchmarks. It would be super interesting to see how it compares to newer implementations, e.g. faster-whisper. Note that Whisper-JAX on TPU v4-8 is much faster than Hugging Face Transformers on GPU (which is what powers insanely-fast-whisper), so it should give you the fastest result of them all. You can get a ballpark idea using a v3-8: https://www.kaggle.com/code/sgandhi99/whisper-jax-tpu. A v4-8 is about 2x faster than this in my experience.
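For reference, the Whisper-JAX usage is roughly the following (a sketch based on the repo README; double-check there for the exact API):

```python
import jax.numpy as jnp
from whisper_jax import FlaxWhisperPipline

# half-precision weights; the forward pass is data-parallel across
# TPU devices under the hood
pipeline = FlaxWhisperPipline(
    "openai/whisper-large-v2", dtype=jnp.bfloat16, batch_size=16
)

# the first call JIT-compiles the forward pass (slow);
# subsequent calls are fast
text = pipeline("audio.mp3")
```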


BBC-Esq commented on June 5, 2024

Hey, thank you. BTW, tell your colleague over at insanely-fast-whisper to change his readme and not dump on other people's work...

Moving on...Thank you sincerely for the technical discussion. I'm excited that you're working on Distil-Whisper. I still need to test that!

To make sure I understand you... the JAX version is much faster on TPU, but Hugging Face Transformers is much faster on GPUs? I don't have a TPU, so...


BBC-Esq commented on June 5, 2024

I forgot to ask, is it true that Distil-Whisper can't do any language besides English...and that's basically the tradeoff?

Looking forward to your work on Distil-Whisper. It's definitely in the hopper to try out.


sanchit-gandhi commented on June 5, 2024

We found Whisper-JAX to be faster than Hugging Face Transformers' Whisper (the same implementation that powers insanely-fast-whisper) in our experiments. But those with different CUDA/PyTorch/hardware versions had varying results, and sometimes PyTorch was faster. JAX is a real pain to set up on CUDA, so I would encourage you to use a PyTorch implementation if you're working on GPU. With Flash Attention 2 support, and full torch compile support coming, I'm pretty sure Whisper in PyTorch will beat Whisper in JAX within the next few weeks.
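For completeness, the Optimum BetterTransformer conversion mentioned earlier is roughly a one-liner on top of a regular Transformers model (a sketch, assuming you have optimum installed):

```python
from transformers import WhisperForConditionalGeneration
from optimum.bettertransformer import BetterTransformer

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2"
)
# swaps the attention implementation for PyTorch SDPA
# (equivalent to Flash Attention 1 on supported hardware)
model = BetterTransformer.transform(model)
```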

If you're using cloud computing, then swapping from GPUs to TPU v3s on GCP is quite reasonably priced, and you can run transcription super fast there: https://cloud.google.com/tpu/pricing. TPU v4s are what we benchmarked to get the fastest results.


sanchit-gandhi commented on June 5, 2024

Yes, that's right. Distil-Whisper is English-only, since English is the language with the most usage, but we still want to provide checkpoints that support more languages. Distilling Whisper on all the languages it supports in one go is hard: the decoder is very small, so it's difficult for it to have good knowledge of all languages at once. Instead, we're actively encouraging the community to distill language-specific Whisper checkpoints. We've released all the Distil-Whisper training code, along with a comprehensive description of the steps required to perform distillation: https://github.com/huggingface/distil-whisper/tree/main/training. Feel free to ask any questions you have about Distil-Whisper on the repo! I'd be more than happy to answer.
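If you want to try it, a distilled checkpoint should drop straight into the same pipeline API as regular Whisper. A rough sketch, using the distil-large-v2 checkpoint (check the repo for the latest release):

```python
from transformers import pipeline

# English-only distilled checkpoint; same API as openai/whisper-*
pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
)
# the model card suggests 15s chunks for chunked long-form transcription
print(pipe("audio.mp3", chunk_length_s=15)["text"])
```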


BBC-Esq commented on June 5, 2024

That's awesome, thanks for the info, very helpful. I noticed you said "Hugging Face Transformers' Whisper"... is that the same as BetterTransformer, in a way? BetterTransformer is basically a class/library that Hugging Face created... kind of like Pipeline? I'm learning about Pipeline and how it simplifies things... and I'm learning about the parameters you can use...

My question is: what person (or people) actually, physically created the batching functionality of the Pipeline... the functionality that "insanely" fast whisper (insert lightning bolt, insert the word "blazingly" a few more times...) is built upon?

It appears that the developer of insanely-fast-whisper singlehandedly created that functionality, thus enabling the world to experience insanely faster whisper for the betterment of mankind.

I'd like to know who's actually responsible, and/or whether it was a team effort over there with you guys. I'd like to know who to follow to keep abreast of the creative and hard work you all do... Thanks.


sanchit-gandhi commented on June 5, 2024

You can do all the best open-source work, develop the best open-source models, and have the fastest open-source library. But if no one knows about it, it's useless!

In that regard, making these tools more accessible and visible to the open-source community is just as valuable as (if not more valuable than) actually developing them.

I don't think we should credit people any more or any less depending on what they've done here. It's a collaborative effort in which we simply want to work with the community to create the best open-source speech technologies possible.


BBC-Esq commented on June 5, 2024

Thanks for the platitudes, but they still didn't answer my question. No worries, I understand...you're in a tight spot with him being a co-worker. I am not, however, in such a situation.

If he deserved credit I'd recognize that, but it seems like he's actually done nothing new or innovative whatsoever, so...

Anyways, I've spent a lot of my personal time on this, so I'm going to give it a break for a day... Feel free to test out my program if you want, or to re-create the tests I've done. Thanks.


flexchar commented on June 5, 2024

Based on Vaibhavs10/insanely-fast-whisper#82, I'd suggest closing this.


BBC-Esq commented on June 5, 2024

lol, nothing was actually addressed, but go ahead and close if you want, Sanchit.
