Comments (13)
Could you run a git bisect between good (v1.4.3) and bad (v1.5.5) to locate the commit responsible for the performance drop in your environment? Fewer than 10 steps (experiments) should suffice.
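For reference, the bisect mechanics can be sketched in a throwaway repository (the repo, file, and commit contents below are made up purely for illustration; in the real whisper.cpp tree each step would be "rebuild, run your benchmark, mark good/bad"):

```shell
# Self-contained git bisect demo in a temporary repo (illustration only).
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q
git config user.email dev@example.com
git config user.name dev

# 8 commits; the "regression" appears at commit 5.
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -lt 5 ]; then echo "fast $i" > perf.txt; else echo "slow $i" > perf.txt; fi
  git add perf.txt
  git commit -qm "commit $i"
done

# Mark HEAD bad and the root commit good, then let bisect drive:
# the run script's exit code decides each step (0 = good, non-zero = bad).
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)"
out=$(git bisect run sh -c 'grep -q fast perf.txt')
echo "$out"
git bisect reset
```

With 8 commits this takes 3 steps; the v1.4.3..v1.5.5 range has many more, but the log2 bound keeps it under the ~10 steps mentioned above.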
from whisper.cpp.
Thank you for the report.
Can you tell us your current OS and compiler? Were they the same for the older commit? EDIT: Sorry, I missed that you confirmed it's the same compiler.
Could you try running make with AVX512F_M= AVX512VNNI_M= AVX512VBMI_M=, so that AVX512 is not used? That could make your new run a bit more comparable to the old one. (I don't know whether slow AVX512 is the issue here, but it may be worth trying.)
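In case it helps, the reason passing empty variables works is that a variable given on the make command line overrides the value assigned in the Makefile, so the corresponding -m flag is simply dropped. A minimal sketch of that mechanism (the Makefile below is a stand-in, not whisper.cpp's real one; only the variable name matches):

```shell
# Demonstrate make's command-line variable override with a stand-in Makefile.
set -e
tmp=$(mktemp -d)
printf '%s\n' \
  'AVX512F_M := -mavx512f' \
  '.RECIPEPREFIX = >' \
  'all:' \
  '>@echo "flags: $(AVX512F_M)"' \
  > "$tmp/Makefile"
with=$(make -s --no-print-directory -C "$tmp")
without=$(make -s --no-print-directory -C "$tmp" AVX512F_M=)
echo "$with"     # flags: -mavx512f
echo "$without"  # the flag is dropped
```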
It is Ubuntu 22.04.4. Everything runs on the same machine in different folders, freshly compiled. Without AVX512 it is indeed a bit better, but still not the same; somewhere in the middle.
total time = 5086.27 ms
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 16.52 MB
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: compute buffer (conv) = 16.39 MB
whisper_init_state: compute buffer (encode) = 132.07 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB
system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
whisper_print_timings: load time = 56.47 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 852.86 ms / 1 runs ( 852.86 ms per run)
whisper_print_timings: decode time = 622.78 ms / 256 runs ( 2.43 ms per run)
whisper_print_timings: batchd time = 323.46 ms / 320 runs ( 1.01 ms per run)
whisper_print_timings: prompt time = 3286.05 ms / 4096 runs ( 0.80 ms per run)
whisper_print_timings: total time = 5086.27 ms
You may also want to try with --beam-size 2, as that seems to be the default in the older commit. It was changed in b6c5f49. As Georgi commented in another issue: "The quality with more beams in general should be better, but it's possible that you don't observe much of a difference."
It looks like the former default was beam-size=-1? That also switches the strategy between WHISPER_SAMPLING_BEAM_SEARCH and WHISPER_SAMPLING_GREEDY. bench doesn't support beam-size, so I am trying a real wav file instead, and it does improve the speed (still not as fast as the old version, but closer).
Default branch (new, with AVX512):
AVX512=1, beam-size 5 (default): total time = 20678.01 ms
AVX512=1, beam-size 2: total time = 17052.18 ms
AVX512=1, beam-size -1: total time = 15465.98 ms
AVX512=0, beam-size 5 (default): total time = 19365.01 ms
AVX512=0, beam-size 2: total time = 15219.21 ms
AVX512=0, beam-size -1: total time = 13869.20 ms
Old version:
AVX512=0, beam-size 5: total time = 21862.52 ms
AVX512=0, beam-size 2: total time = 14704.33 ms
AVX512=0, beam-size -1 (default): total time = 12398.81 ms
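To put the gap in perspective, the relative slowdowns can be computed from the timings above (totals taken verbatim from the two runs):

```shell
# Relative slowdown, computed from the posted totals.
out=$(awk 'BEGIN {
  # new defaults (AVX512=1, beam 5) vs old defaults (AVX512=0, beam -1)
  printf "defaults: %.1f%% slower\n", (20678.01 / 12398.81 - 1) * 100
  # like-for-like settings (AVX512=0, beam -1) in both versions
  printf "same settings: %.1f%% slower\n", (13869.20 / 12398.81 - 1) * 100
}')
echo "$out"
```

So most of the out-of-the-box regression appears to come from the changed defaults, while the remaining like-for-like difference is roughly 12%.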
Interesting results, especially the AVX issue. We'll play around with it a bit.
Thanks for your help!
(note: beam-search default seems to have changed from -1 to 2 to 5 now : https://github.com/ggerganov/whisper.cpp/blob/master/whisper.cpp#L4625 )
Regarding "It looks like the former default was beam-size=-1?": I was referring to changes in whisper_full_default_params, where beam_search.beam_size changed from 2 to 5, but you're right that whisper_params.beam_size previously did not use whisper_full_default_params() and was set to -1. So you may want to try --beam-size 1 too, I guess.
For AVX-512 on Ryzen, let me mention:
Zen4's AVX512 Teardown
Ubuntu 22.04 ships a relatively old compiler. Results from a more recent one could be different.
I'm wondering whether WHISPER_NO_AVX512 shouldn't be introduced in the Makefile, to make it easier to disable AVX-512 (setting 3 variables is relatively cumbersome). Maybe we should even set WHISPER_NO_AVX512 to 1 by default, but we would need a bigger sample to decide whether more folks are hurt performance-wise by having AVX-512 enabled than by having it disabled. The autodetection done in the Makefile assumes that adding more ISA extensions allows the compiler to do a better job (produce more efficient code), but that may not always be the case, as we can see in this issue.
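A hypothetical sketch of what such an opt-out could look like (the variable and flag names below are assumptions for illustration, not the current Makefile): guard the AVX-512 flags behind an ifndef, so a single WHISPER_NO_AVX512=1 disables them all.

```shell
# Stand-in Makefile demonstrating a single opt-out flag (hypothetical sketch).
set -e
tmp=$(mktemp -d)
printf '%s\n' \
  '.RECIPEPREFIX = >' \
  'ifndef WHISPER_NO_AVX512' \
  'AVX512_FLAGS := -mavx512f -mavx512vnni -mavx512vbmi' \
  'endif' \
  'all:' \
  '>@echo "avx512: $(AVX512_FLAGS)"' \
  > "$tmp/Makefile"
enabled=$(make -s --no-print-directory -C "$tmp")
disabled=$(make -s --no-print-directory -C "$tmp" WHISPER_NO_AVX512=1)
echo "$enabled"
echo "$disabled"
```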
We are experiencing similar behavior when comparing version 1.4.3 with the latest 1.5.5. But since we are using CMake for the build, I guess it cannot be related to AVX512, because WHISPER_NO_AVX512 is set by default, right? Also, we are not using beam search (we call whisper_full_default_params(WHISPER_SAMPLING_GREEDY)), so this should also not affect the performance, right? It seems greedy.best_of also changed from 2 to 5, but when I change it back the performance does not change much, so I guess this is also unrelated.
OK, so my git bisect gives me the following result:
3e5c7feeffb86555d63ef592f79ce8365a069174 is the first bad commit
commit 3e5c7feeffb86555d63ef592f79ce8365a069174
Author: Evan Jones <[email protected]>
Date: Mon Nov 13 03:51:34 2023 -0500
whisper : add grammar-based sampling (#1229)
* whisper : add grammar-based sampling
* build : fix after master merge
* command : fix exception when recognizing the command
* whisper : fine-tuning grammar functionality
* command : grammar-related improvements
- option to read grammar from file
- add sample grammars for colors and chess moves
- fine-tune the performance further
* grammars : add assistant + update comments
* command : enable beam-search, add "no_timestamps", add "context", add p
* whisper : remove comment
---------
Co-authored-by: Georgi Gerganov <[email protected]>
Any idea how this commit could influence the performance in such a bad way?
BTW: the performance drop on our side is about 40%.
I reverted this commit to test whether it would fix the performance problem, but it didn't. So I did another git bisect, and the next bad commit is "whisper : add batched decoding (#1486)" (b6c5f49), but I don't think I can easily drop that one. Is there anything else I can try based on this information?
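For what it's worth, reverting a single old commit on top of the current tree (rather than checking out the old version) is one way to A/B test a suspect commit. A minimal illustration in a throwaway repo (the real whisper.cpp commits may of course conflict when reverted, as you note for the batched-decoding one):

```shell
# Self-contained git revert demo (illustration only).
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q
git config user.email dev@example.com
git config user.name dev

echo base > f.txt; git add f.txt; git commit -qm "base"
echo feature > g.txt; git add g.txt; git commit -qm "suspect commit"
suspect=$(git rev-parse HEAD)
echo more >> f.txt; git commit -qam "later work"

# Undo only the suspect commit; later work stays in place.
git revert --no-edit "$suspect"
test ! -e g.txt && echo "suspect change undone"
cat f.txt
```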