Comments (13)

przemoc commented on May 29, 2024

Could you do a git bisect between good (v1.4.3) and bad (v1.5.5) to try to locate the main commit responsible for the performance drop in your environment?
Fewer than 10 steps (experiments) should suffice.
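
A minimal sketch of such a bisect session (rebuild and run your benchmark at every step):

    git bisect start
    git bisect bad v1.5.5
    git bisect good v1.4.3
    # at each step: rebuild, measure, then mark the current checkout
    make clean && make -j
    git bisect good    # or: git bisect bad
    # after the first bad commit is reported:
    git bisect reset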

przemoc commented on May 29, 2024

Thank you for the report.

Can you tell us your current OS and compiler?
Were they the same for the older commit? EDIT: Sorry, I missed that you confirmed it's the same compiler.

Could you try running make with AVX512F_M= AVX512VNNI_M= AVX512VBMI_M= so that AVX512 is not used?

That could make your new run a bit more comparable to the old one.
(I don't know whether slow AVX512 is the issue here, but it may be worth trying.)
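
The full invocation would then be something like:

    make clean
    make AVX512F_M= AVX512VNNI_M= AVX512VBMI_M=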

nanocosmos-ol commented on May 29, 2024

It is Ubuntu 22.04.4. Everything runs on the same machine in different folders, freshly compiled.

Without AVX512 it is indeed a bit better, but still not the same; somewhere in the middle.

total time = 5086.27 ms

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 16.52 MB
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: compute buffer (conv) = 16.39 MB
whisper_init_state: compute buffer (encode) = 132.07 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB

system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0

whisper_print_timings: load time = 56.47 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 852.86 ms / 1 runs ( 852.86 ms per run)
whisper_print_timings: decode time = 622.78 ms / 256 runs ( 2.43 ms per run)
whisper_print_timings: batchd time = 323.46 ms / 320 runs ( 1.01 ms per run)
whisper_print_timings: prompt time = 3286.05 ms / 4096 runs ( 0.80 ms per run)
whisper_print_timings: total time = 5086.27 ms

przemoc commented on May 29, 2024

You may want to also try --beam-size 2, as that seems to be the default in the older commit. It was changed in b6c5f49. As Georgi commented in some other issue:

The quality with more beams in general should be better, but it's possible that you don't observe much of a difference
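
For example (using the main example binary and the model from your log; the input file here is just a placeholder):

    ./main -m models/ggml-base.en.bin -f samples/jfk.wav --beam-size 2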

nanocosmos-ol commented on May 29, 2024

It looks like the former default was beam-size = -1?
This then switches the strategy between WHISPER_SAMPLING_BEAM_SEARCH and WHISPER_SAMPLING_GREEDY.
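
In code terms this is roughly (a paraphrase of the example's setup, not the verbatim source):

    // beam_size <= 1 falls back to greedy sampling
    whisper_full_params wparams = whisper_full_default_params(
            params.beam_size > 1 ? WHISPER_SAMPLING_BEAM_SEARCH
                                 : WHISPER_SAMPLING_GREEDY);
    wparams.beam_search.beam_size = params.beam_size;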

bench doesn't support beam-size, so I am trying a real wav file instead, and it does improve the speed
(still not like the old version, but closer).

default branch (new, with AVX512):

    AVX512=1  beam-size 5 (default)   total time = 20678.01 ms
    AVX512=1  beam-size 2             total time = 17052.18 ms
    AVX512=1  beam-size -1            total time = 15465.98 ms
    AVX512=0  beam-size 5 (default)   total time = 19365.01 ms
    AVX512=0  beam-size 2             total time = 15219.21 ms
    AVX512=0  beam-size -1            total time = 13869.20 ms

Old version:

    AVX512=0  beam-size 5             total time = 21862.52 ms
    AVX512=0  beam-size 2             total time = 14704.33 ms
    AVX512=0  beam-size -1 (default)  total time = 12398.81 ms

Interesting results, especially the AVX issue. We'll play around with it a bit.

Thanks for your help!

(note: the beam-size default seems to have changed from -1 to 2 to 5 over time: https://github.com/ggerganov/whisper.cpp/blob/master/whisper.cpp#L4625)

przemoc commented on May 29, 2024

It looks like the former default was beam-size = -1?

I was referring to the changes in whisper_full_default_params, where beam_search.beam_size changed from 2 to 5, but you're right that whisper_params.beam_size previously did not use whisper_full_default_params() and was set to -1.

So you may want to try --beam-size 1 too, I guess.

For AVX-512 on Ryzen, let me mention:
Zen4's AVX512 Teardown

Ubuntu 22.04 has a relatively old compiler. Results from a more recent one could be different.


I'm wondering if WHISPER_NO_AVX512 shouldn't be introduced in the Makefile, to make it easier to disable AVX-512 (setting 3 variables is relatively cumbersome). Maybe we should even set such WHISPER_NO_AVX512 to 1 by default, but we would need a bigger sample to decide whether more folks are harmed performance-wise by having AVX-512 enabled than by having it disabled. The autodetection done in the Makefile assumes that adding more ISA extensions allows the compiler to do a better job (produce more efficient code), but that may not always be the case, as we can see in this issue.
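
Something along these lines, perhaps (just a sketch; the guard reuses the existing variable names, and the detection lines are illustrative):

    # hypothetical guard: skip AVX-512 autodetection when WHISPER_NO_AVX512 is set
    ifndef WHISPER_NO_AVX512
        AVX512F_M    := $(shell grep "avx512f " /proc/cpuinfo)
        AVX512VNNI_M := $(shell grep "avx512_vnni " /proc/cpuinfo)
        AVX512VBMI_M := $(shell grep "avx512_vbmi " /proc/cpuinfo)
    endif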

Linux13524 commented on May 29, 2024

We are experiencing similar behavior when comparing version 1.4.3 with the latest 1.5.5.
But since we are using CMake for the build, I guess it cannot be related to AVX512, because WHISPER_NO_AVX512 is set by default, right?

Also, we are not using beam search (we set whisper_full_default_params(WHISPER_SAMPLING_GREEDY)), so this should also not affect the performance, right?

It seems the greedy.best_of default also changed from 2 to 5. But when I change it back, the performance does not change much, so I guess this is also unrelated.
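
What I tried, roughly (a sketch, not our exact code):

    // greedy sampling with best_of pinned back to the previous default
    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.greedy.best_of = 2;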

Linux13524 commented on May 29, 2024

Ok, so my git bisect gives me the following result:

3e5c7feeffb86555d63ef592f79ce8365a069174 is the first bad commit
commit 3e5c7feeffb86555d63ef592f79ce8365a069174
Author: Evan Jones <[email protected]>
Date:   Mon Nov 13 03:51:34 2023 -0500

    whisper : add grammar-based sampling (#1229)
    
    * whisper : add grammar-based sampling
    
    * build : fix after master merge
    
    * command : fix exception when recognizing the command
    
    * whisper : fine-tuning grammar functionality
    
    * command : grammar-related improvements
    
    - option to read grammar from file
    - add sample grammars for colors and chess moves
    - fine-tune the performance further
    
    * grammars : add assistant + update comments
    
    * command : enable beam-search, add "no_timestamps", add "context", add p
    
    * whisper : remove comment
    
    ---------
    
    Co-authored-by: Georgi Gerganov <[email protected]>

Any idea how this commit could affect the performance so badly?

BTW: The performance drop on our side is about 40%.

Linux13524 commented on May 29, 2024

I dropped this commit to test whether it would fix the performance problem, but it didn't.
So I did another git bisect, and the next bad commit is "whisper : add batched decoding (#1486)" (b6c5f49), but I don't think I can easily drop that one.
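
(For reference, by "dropping" a commit I mean something like:

    # revert the suspect commit on top of the current checkout
    git revert --no-commit 3e5c7feeffb86555d63ef592f79ce8365a069174

which worked for the grammar commit, but does not apply as easily to b6c5f49.)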

Is there anything else I can try based on this information?
