Comments (12)
I didn't get a direct response to my main question.
Let's try asking again...Did the dude at insanely-fast-whisper actually create any of the code of these underlying technologies? Wink with your left eye if you don't want to answer because y'all work together, or wink with your right eye for no, he didn't actually create any new innovation and it was other people...We'll just keep this between you and me. lol.
- Transformers Whisper: provides the underlying code for the Whisper model, with efficient attention code and Flash Attention 2 support. Can be used for short-form audio, and for long-form audio with "sequential" transcription. See the docs for examples.
- Optimum BetterTransformer: builds on the Transformers implementation by changing the attention implementation to use PyTorch SDPA. Equivalent to Flash Attention 1 on hardware that supports it.
- Transformers Pipeline: wrapper around the Transformers Whisper model for an easier API. Also implements the "chunked" long-form transcription algorithm, which is about 10x faster than OpenAI's original "sequential" one; see Section 5 and Table 7 of the Distil-Whisper paper.
=> now this is all the underlying code that you need to get the reported speed-ups
What insanely-fast-whisper does is package up the above three implementations into end-to-end examples and a CLI, so that you can get maximum performance as easily as possible (a minimal usage sketch follows below).
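For concreteness, here's a minimal sketch of the "chunked" long-form path described above, using the standard Transformers pipeline API (the checkpoint, chunk length, and batch size here are illustrative choices, not the only valid ones):

```python
import torch
from transformers import pipeline

# Any Whisper checkpoint works here; fp16 + GPU shown for speed.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# chunk_length_s activates the chunked long-form algorithm; batch_size
# controls how many ~30s chunks are transcribed in parallel.
result = pipe("audio.mp3", chunk_length_s=30, batch_size=8, return_timestamps=True)
print(result["text"])
```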
Hey @BBC-Esq! Thanks for reaching out, I appreciate your interest in the Whisper-JAX project! Unfortunately, this repo is more or less archived now, since we stopped working on it back in April. It was a fun project to see how fast we could make Whisper in JAX on TPU v4-8s, but the community is simply more interested in running on GPUs, which means we've switched to focussing on optimisations that can be applied uniformly, independent of hardware (e.g. Distil-Whisper: https://github.com/huggingface/distil-whisper).
There are some scripts for reproducing the benchmarks here. The pmap benchmarks should be run on at least a TPU v4-8, or a v4-16 for the best performance at higher batch sizes (where we are >100x faster than OpenAI Whisper). These are the results we got on an A100 with CUDA 11 and PyTorch 1.9, and on a TPU v4-8: https://github.com/sanchit-gandhi/whisper-jax#benchmarks. It would be super interesting to see how it compares to newer implementations, e.g. faster-whisper. Note that Whisper-JAX on TPU v4-8 is much faster than Hugging Face Transformers on GPU (which is what powers insanely-fast-whisper), so you should get the fastest result of them all. You can get a ballpark idea using a v3-8: https://www.kaggle.com/code/sgandhi99/whisper-jax-tpu. Using a v4-8 is about 2x faster than this in my experience.
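For reference, running Whisper-JAX itself looks roughly like this sketch, based on my reading of the repo's README (note the class name really is spelled FlaxWhisperPipline; double-check the README for the exact arguments and return format):

```python
import jax.numpy as jnp
from whisper_jax import FlaxWhisperPipline

# Half precision and batching, as documented in the whisper-jax README.
pipeline = FlaxWhisperPipline(
    "openai/whisper-large-v2", dtype=jnp.bfloat16, batch_size=16
)

# The first call JIT-compiles the forward pass (slow); subsequent calls
# reuse the cached compilation and run fast.
outputs = pipeline("audio.mp3")
print(outputs)
```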
Hey, thank you. BTW, tell your colleague over at insanely-fast-whisper to change his readme and not dump on other people's work...
Moving on...Thank you sincerely for the technical discussion. I'm excited that you're working on Distil-Whisper. I still need to test that!
To make sure I understand you...the JAX version is much faster on TPU, but Hugging Face Transformers is faster on GPUs? I don't have a TPU so...
I forgot to ask, is it true that Distil-Whisper can't do any language besides English...and that's basically the tradeoff?
Looking forward to your work on distil-whisper. Definitely in the hopper to try out.
We found Whisper-JAX to be faster than Hugging Face Transformers' Whisper (the same implementation insanely-fast-whisper uses) in our experiments. But those with different CUDA/PyTorch/hardware versions had varying results, and sometimes PyTorch was faster. JAX is a real pain to set up on CUDA, so I would encourage you to use a PyTorch implementation if you're working on GPU. With Flash Attention 2 support, and full torch compile support coming, I'm pretty sure Whisper in PyTorch will beat Whisper in JAX within the next few weeks.
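To make those attention options concrete, here's a rough sketch of the two PyTorch paths mentioned in this thread (flag names have shifted across transformers/optimum releases, so treat this as illustrative rather than authoritative):

```python
import torch
from transformers import WhisperForConditionalGeneration

# Option 1: PyTorch SDPA via Optimum's BetterTransformer (roughly Flash
# Attention 1 on supported hardware). Requires the `optimum` package.
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", torch_dtype=torch.float16
).to("cuda")
model = model.to_bettertransformer()

# Option 2: Flash Attention 2. Requires the `flash-attn` package and a
# supported GPU; older transformers releases used use_flash_attention_2=True
# instead of the attn_implementation argument.
model_fa2 = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
```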
If you're using cloud computing, then swapping from GPUs to TPU v3s on GCP is quite reasonably priced, and you can run transcription super fast there: https://cloud.google.com/tpu/pricing. TPU v4s are what we benchmarked to get the fastest results.
Yes, that's right. Distil-Whisper is English-only, since English is the language with the most usage, but we still want to provide checkpoints that support more languages. Distilling Whisper on all the languages it supports in one go is hard - the decoder is very small, so it's difficult for it to have good knowledge of all languages at once. Instead, we're actively encouraging the community to distill language-specific Whisper checkpoints. We've released all the Distil-Whisper training code, and a comprehensive description of the steps required to perform distillation: https://github.com/huggingface/distil-whisper/tree/main/training. Feel free to ask if you have any questions about Distil-Whisper on the repo! I'd be more than happy to answer.
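As a quick way to try it, the released English checkpoints drop straight into the same pipeline API (distil-whisper/distil-large-v2 was the checkpoint published around this time; check the repo for the current model list):

```python
from transformers import pipeline

# distil-large-v2 is an English-only distilled checkpoint from the
# Distil-Whisper team; see the repo for the up-to-date list of models.
pipe = pipeline(
    "automatic-speech-recognition", model="distil-whisper/distil-large-v2"
)
print(pipe("audio.mp3")["text"])
```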
That's awesome, thanks for the info, very helpful. I noticed you said Hugging Face Transformers' Whisper...is that the same as BetterTransformer in a way? BetterTransformer is basically a class/library that Hugging Face created...kind of like Pipeline? I'm learning about Pipeline and how it simplifies things...and I'm learning about the parameters you can use...
My question is, what person (or people) actually, physically, created the batching functionality of the Pipeline...upon which the "insanely" fast whisper (insert lightning bolt, insert the word "blazingly" a few more times...) relies?
It appears that the developer of insanely-fast-whisper singlehandedly created that functionality, thus enabling the world to experience insanely faster whisper for the betterment of mankind.
I'd like to know who's actually responsible and/or if it was a team effort over there with you guys. I'd like to know who to follow to keep abreast of the creative and hard work you guys do...Thanks.
You can do all the best open-source work, develop the best open-source models, and have the fastest open-source library. But if no one knows about it, it's useless!
In that regard, making these tools more accessible and visible to the open-source community is just as valuable as actually developing them (if not more so).
I don't think we should credit people any more or any less depending on what they've done here. It's a collaborative effort in which we simply want to work with the community to create the best open-source speech technologies possible.
Thanks for the platitudes, but they still didn't answer my question. No worries, I understand...you're in a tight spot with him being a co-worker. I am not, however, in such a situation.
If he deserves credit, I'll recognize that, but it seems like he's actually done nothing new or innovative whatsoever so...
Anyways, I've spent a lot of my personal time on this, so I'm going to give it a break for a day...Feel free to test out my program if you want, or to re-create the tests I've done. Thanks.
Based on Vaibhavs10/insanely-fast-whisper#82, I'd suggest closing this.
lol, Nothing was actually addressed, but go ahead and close if you want, Sanchit.