ggerganov / whisper.cpp
Port of OpenAI's Whisper model in C/C++
License: MIT License
Thanks for sharing whisper.cpp @ggerganov. I'm wondering if I'm missing something: I tried whisper.cpp on a 40-minute wav file, and it took almost 2 hours to transcribe, which doesn't seem to match what others have experienced. I was transcribing on an 8-vCPU machine with 32 GB of memory. Are there any settings I'm missing? I appreciate your help.
Unfortunately I'm unable to share the wav file as it's private data.
whisper_model_load: loading model from 'models/ggml-large.bin'
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 5
whisper_model_load: mem_required = 4576.00 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: ggml ctx size = 3255.34 MB
whisper_model_load: memory size = 304.38 MB
whisper_model_load: model size = 2950.66 MB
main: processing 'output/x.wav' (38688821 samples, 2418.1 sec), 4 threads, lang = en, task = transcribe, timestamps = 1 ...
whisper_print_timings: load time = 4246.85 ms
whisper_print_timings: mel time = 31377.23 ms
whisper_print_timings: sample time = 3421.71 ms
whisper_print_timings: encode time = 4697475.00 ms / 146796.09 ms per layer
whisper_print_timings: decode time = 1830579.38 ms / 57205.61 ms per layer
whisper_print_timings: total time = 6568016.00 ms
Fully stumped. Only did make.
g++ (Ubuntu 11.2.0-19ubuntu1) 11.2.0
whisper.cpp: In function ‘whisper_full_params whisper_full_default_params(whisper_decode_strategy)’:
whisper.cpp:2305:17: internal compiler error: in reshape_init_class, at cp/decl.c:6465
2305 | };
| ^
0x7f415aaa6d8f __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
0x7f415aaa6e3f __libc_start_main_impl
../csu/libc-start.c:392
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <file:///usr/share/doc/gcc-11/README.Bugs> for instructions.
Hi there! I'm attempting to build whisper.cpp for MUSL Linux for some lightweight systems, and I figured I would note the issues I ran into during the build.
Alpine (MUSL) does not provide stdint.h or alloca.h in its standard library when you only install gcc. This results in a slew of errors:
localhost:~/whisper.cpp# make libwhisper.a
cc -O3 -std=c11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -c ggml.c
In file included from ggml.h:7,
from ggml.c:1:
/usr/lib/gcc/aarch64-alpine-linux-musl/12.2.1/include/stdint.h:9:26: error: no include path in which to search for stdint.h
9 | # include_next <stdint.h>
| ^
ggml.h:107:5: error: unknown type name 'int64_t'
107 | int64_t perf_cycles;
| ^~~~~~~
~~snip~~
ggml.c:6:10: fatal error: alloca.h: No such file or directory
6 | #include <alloca.h>
| ^~~~~~~~~~
compilation terminated.
make: *** [Makefile:58: ggml.o] Error 1
localhost:~/whisper.cpp#
This fix is relatively simple: just install g++:
apk add g++
Next, clock_gettime and CLOCK_MONOTONIC are seemingly undefined regardless of the compiler used:
localhost:~/whisper.cpp# make libwhisper.a
cc -O3 -std=c11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -c ggml.c
ggml.c: In function 'ggml_time_ms':
ggml.c:155:5: warning: implicit declaration of function 'clock_gettime' [-Wimplicit-function-declaration]
155 | clock_gettime(CLOCK_MONOTONIC, &ts);
| ^~~~~~~~~~~~~
ggml.c:155:19: error: 'CLOCK_MONOTONIC' undeclared (first use in this function)
155 | clock_gettime(CLOCK_MONOTONIC, &ts);
| ^~~~~~~~~~~~~~~
ggml.c:155:19: note: each undeclared identifier is reported only once for each function it appears in
ggml.c: In function 'ggml_time_us':
ggml.c:161:19: error: 'CLOCK_MONOTONIC' undeclared (first use in this function)
161 | clock_gettime(CLOCK_MONOTONIC, &ts);
| ^~~~~~~~~~~~~~~
make: *** [Makefile:58: ggml.o] Error 1
localhost:~/whisper.cpp#
Digging around the internet turned up a fix for this: inserting #define _POSIX_C_SOURCE 199309L before including the time.h header. This appears to work; I placed it on line 10 of ggml.c. It would be nice if this issue could be fixed in some way. I would make a PR if I had sufficient knowledge to implement the required changes, which I don't.
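For reference, a sketch of the workaround as I applied it (the exact placement in ggml.c may differ across versions):

// MUSL workaround (sketch): request POSIX.1b so <time.h> exposes
// clock_gettime and CLOCK_MONOTONIC; this must appear before the
// first (direct or indirect) include of <time.h>
#define _POSIX_C_SOURCE 199309L
#include <time.h>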
Noting that the processing time is considerably shorter than the length of the speech, is it possible to feed the models real-time microphone output? Or does the inference run on the complete audio stream rather than sample by sample?
This would greatly reduce latency for voice assistants and the like, since the audio would not need to be fully captured before being fed to the models. Basically the same as I did here with SODA: https://github.com/biemster/gasr, but with an open source and multilang model.
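To make the question concrete, here is a sketch of the chunked loop I have in mind, assuming the whisper.h C API (function and enum names vary across versions) and a hypothetical capture_audio() helper supplying 16 kHz mono float samples:

// Sketch: pseudo real-time transcription by running whisper_full on
// fixed-size chunks of captured audio. capture_audio() is a stub you
// would implement on top of your microphone backend.
#include <cstdio>
#include <vector>
#include "whisper.h"

extern std::vector<float> capture_audio(int n_samples); // hypothetical helper

int main() {
    // recent versions: whisper_init_from_file; older ones: whisper_init
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
    // recent enum name; older headers use WHISPER_DECODE_GREEDY
    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    const int n_chunk = 16000 * 5; // 5-second windows at 16 kHz
    for (;;) {
        std::vector<float> pcm = capture_audio(n_chunk);
        if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) != 0) break;
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            printf("%s\n", whisper_full_get_segment_text(ctx, i));
        }
    }
    whisper_free(ctx);
    return 0;
}

The window length would be the obvious latency/accuracy knob; anything much shorter than a couple of seconds probably starves the encoder of context.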
It would be nice if someone could help and provide build instructions for Windows.
I think the only thing that might need an update is the pthread dependency in ggml.c.
The rest of the code should build successfully.
A .bat script to download the models would probably also be nice, since there is no Bash on Windows.
It seemed to come from here:
0x00005555555586fb in _mm256_fmadd_ps (__C=..., __B=..., __A=...) at /usr/lib/gcc/x86_64-linux-gnu/7/include/fmaintrin.h:65
65 return (__m256)__builtin_ia32_vfmaddps256 ((__v8sf)__A, (__v8sf)__B,
with backtrace
(gdb) bt
#0 0x00005555555586fb in _mm256_fmadd_ps (__C=..., __B=..., __A=...) at /usr/lib/gcc/x86_64-linux-gnu/7/include/fmaintrin.h:65
#1 ggml_vec_dot_f16 (n=96, s=0x7ffffffe4e54, x=0x7fff646b6ee0, y=0x7fff64746ee0) at ggml.c:375
#2 0x0000555555564766 in ggml_compute_forward_conv_1d_1s_f16_f32 (params=0x7ffffffe51c0, src0=0x7fff9025f0f0, src1=0x7fff65482030, dst=0x7fff6556c6f0) at ggml.c:4668
#3 0x0000555555564f40 in ggml_compute_forward_conv_1d_1s (params=0x7ffffffe51c0, src0=0x7fff9025f0f0, src1=0x7fff65482030, dst=0x7fff6556c6f0) at ggml.c:4806
#4 0x0000555555568707 in ggml_compute_forward (params=0x7ffffffe51c0, tensor=0x7fff6556c6f0) at ggml.c:5809
#5 0x000055555556a6ec in ggml_graph_compute (ctx=0x5555557f3b48 <g_state+104>, cgraph=0x7ffffffe5340) at ggml.c:6611
#6 0x0000555555580cb2 in whisper_encode (model=..., n_threads=4, mel_offset=0, mel_inp=..., features=std::vector of length 0, capacity 0) at main.cpp:1353
#7 0x0000555555584664 in main (argc=5, argv=0x7fffffffdb78) at main.cpp:2225
On Ubuntu 18.04, gcc 7.5.0, on an Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
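For context, the i5-3470 is an Ivy Bridge part: it has AVX but no AVX2/FMA, so a binary built with -mavx2 -mfma will execute instructions the CPU does not support. A runtime guard sketch (GCC/Clang builtins, x86 only; not something whisper.cpp currently does) that would fail gracefully instead of crashing:

// Sketch: detect missing instruction sets at startup instead of dying
// deep inside ggml with an illegal-instruction crash.
#include <cstdio>
#include <cstdlib>

static void check_cpu_features() {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (!__builtin_cpu_supports("avx2") || !__builtin_cpu_supports("fma")) {
        fprintf(stderr, "fatal: this build needs AVX2+FMA; rebuild without -mavx2 -mfma\n");
        exit(1);
    }
#endif
}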
Examples work fine for me, but I get an error when trying different wave files longer than 21 s:
./main -m models/ggml-base.en.bin mycut-22s-16khz.wav
whisper_model_load: loading model from 'models/ggml-base.en.bin'
[...]
main: processing 'mycut-22s-16khz.wav' (352000 samples, 22.0 sec), 2 threads, lang = en, task = transcribe, timestamps = 1 ...
[...]
main: ggml.c:6658: ggml_graph_compute: Assertion `false' failed.
Aborted (core dumped)
It's apparently fixed if I comment out the assert and substitute cgraph->work = NULL; in ggml.c. But I guess it's not the best workaround, as it crashes again with a segfault if the audio duration is more than approximately 43 s.
Running on Ubuntu 18.04 LTS (GNU/Linux 4.15.0-187-generic x86_64), after fixing the CACHE_LINE_SIZE and initializer issues (#11).
Any hint? Thanks!
Good day, everyone!
I'm thinking about bindings for Python.
So far, I'm interested in 4 functionalities:
Perhaps in the near future I will try to take up this task, but I have no experience with Python bindings. So if there are craftsmen who can do it quickly (if it can be done quickly... 😃), that would be cool!
Hi,
I could compile this on FreeBSD 13.1-RELEASE-p2 amd64, having devel/gmake installed (using gmake instead of make), with the following modifications:
--- Makefile_ori 2022-10-16 21:19:22.498824000 +0200
+++ Makefile 2022-10-16 22:40:53.787014000 +0200
@@ -22,10 +22,17 @@
CFLAGS += -pthread
CXXFLAGS += -pthread
endif
+ifeq ($(UNAME_S),FreeBSD)
+ CFLAGS += -pthread
+ CXXFLAGS += -pthread
+endif
# Architecture specific
# TODO: probably these flags need to be tweaked on some architectures
ifeq ($(UNAME_M),x86_64)
+ CFLAGS += -mavx -mavx2 -mfma -mf16c
+endif
+ifeq ($(UNAME_M),amd64)
CFLAGS += -mavx -mavx2 -mfma -mf16c
endif
ifneq ($(filter arm%,$(UNAME_M)),)
(I don't know gmake Makefiles too well; this could be prettier with a logical OR here ...)
--- ggml.c_ori 2022-10-16 21:19:22.502786000 +0200
+++ ggml.c 2022-10-16 21:28:00.140594000 +0200
@@ -2,7 +2,7 @@
#if defined(_MSC_VER) || defined(__MINGW32__)
#include <malloc.h> // using malloc.h with MSC/MINGW
-#else
+#elif !defined(__FreeBSD__)
#include <alloca.h>
#endif
Seems not so hard to merge these changes into upstream ...
For downloading the models, ftp/wget is needed.
Kind regards,
abelbabel
Hi @ggerganov,
whisper.cpp looks promising, thank you for your work.
I know there is a timestamp limitation in the README currently.
Is it possible to include timestamps in the future? That would be useful when generating subtitles.
Or could whisper.cpp support a stream mode with streaming audio?
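If segment-level timestamps are (or become) exposed through the C API, subtitle output would be straightforward. A sketch, assuming whisper_full_get_segment_t0/t1 as declared in recent versions of whisper.h, with times in 10 ms units:

// Sketch: emit SRT-style blocks after a whisper_full() run.
#include <cstdio>
#include <cstdint>
#include "whisper.h"

static void print_timestamp(int64_t t) { // t in 10 ms units -> HH:MM:SS,mmm
    const int h  = (int) (t / 360000);
    const int m  = (int) (t /   6000) % 60;
    const int s  = (int) (t /    100) % 60;
    const int ms = (int) (t %    100) * 10;
    printf("%02d:%02d:%02d,%03d", h, m, s, ms);
}

static void print_srt(struct whisper_context * ctx) {
    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        printf("%d\n", i + 1);
        print_timestamp(whisper_full_get_segment_t0(ctx, i));
        printf(" --> ");
        print_timestamp(whisper_full_get_segment_t1(ctx, i));
        printf("\n%s\n\n", whisper_full_get_segment_text(ctx, i));
    }
}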
Hi,
I'm not so much into the details of whisper or whisper.cpp, and I don't know if it is currently even possible with this foundation, but it would be nice if speakers, or at least speaker/voice changes, could be marked.
This would be very handy when processing interviews, radio/TV shows, films, etc.
Kind regards,
abelbabel
I'm attempting to automate rust-bindgen generation. This appears not to work, however, because bindgen uses clang, which does not implicitly #include <stdbool.h>. Adding #include <stdbool.h> to line 5 of whisper.h appears to fix this. I'm opening this issue to get feedback and others' thoughts.
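For clarity, the whole proposed change is one line near the top of the header (surrounding context illustrative, not the exact file contents):

// whisper.h (sketch)
#include <stdbool.h> // bindgen's clang needs this for `bool` in C mode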
Make errors out on an aarch64 server
make base.en
#gcc -pthread -O3 -c ggml.c
gcc -pthread -O3 -mcpu=cortex-a72 -mfloat-abi=hard -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -c ggml.c
gcc: error: unrecognized command-line option ‘-mfloat-abi=hard’
gcc: error: unrecognized command-line option ‘-mfpu=neon-fp-armv8’
gcc: error: unrecognized command-line option ‘-mfp16-format=ieee’
gcc: error: unrecognized command-line option ‘-mno-unaligned-access’
make: *** [Makefile:7: ggml.o] Error 1
Perhaps these C flags would be enough: -Ofast -g -mfpu=neon?
./main -m models/ggml-medium.bin -l zh -f ~/Movies/samplecn16k.wav
Output:
[00:00.000 --> 00:16.000] 元����,其实就是����世界,而且要用����世界这个��来定��元����的话,要比元����本身更加����。到这里就出现问题了。那它为什么不叫����世界呢?最��单的原因就是,����世界这个说法大家已经听��了,而元������得更为新��,又包��成为了一个新的概念。
[00:16.000 --> 00:44.000] 现在的元����技��,����没有我们想象中那么先进。按照目前世界第一元����公司,Roblox公司对于元����的定��来看,它起��要具��8个要素,分别是身份、社交、成进、����、多元、��地、经��、文明。身份就是一个����身份,��现实中的角色无关,这个比��好理解。社交也就是社交系��。成进就是感知����的升��,要做到和现实世界的体��完全相同。����就��������,不会有卡��,多元就多元化,
With the OpenAI whisper CLI:
whisper --language zh ~/Movies/samplecn16k.wav
[00:00.000 --> 00:01.760] 元宇宙其实就虚拟世界
[00:01.760 --> 00:04.400] 而且要用虚拟世界这个词来定义元宇宙的话
[00:04.400 --> 00:06.400] 要比元宇宙本身更加准确
[00:06.400 --> 00:07.680] 但这里就出现问题了
[00:07.680 --> 00:09.360] 那它为什么不叫虚拟世界呢?
[00:09.360 --> 00:10.720] 最简单的原因就是
[00:10.720 --> 00:12.880] 虚拟世界这个说法大家已经听腻了
[00:12.880 --> 00:14.320] 而元宇宙显得更为吸引
[00:14.320 --> 00:16.200] 又包装成为了一个新的概念
[00:16.200 --> 00:17.440] 现在的元宇宙技术
[00:17.440 --> 00:19.160] 原有没有我们想象中那么先进
[00:19.160 --> 00:21.320] 按照目前世界第一元宇宙公司
[00:21.320 --> 00:23.480] 罗布洛克斯公司对于元宇宙的定义来看
[00:23.480 --> 00:25.080] 它起码要具备8个要素
[00:25.080 --> 00:30.680] 分别是身份、社交、成敬、延迟、多元、随地、经济、文明
[00:30.680 --> 00:32.280] 身份就是一个虚拟身份
[00:32.280 --> 00:33.640] 与现实中的角色无关
[00:33.640 --> 00:34.640] 这个比较好理解
[00:34.640 --> 00:36.200] 社交也就是社交系统
[00:36.200 --> 00:38.320] 成敬就是感知设备的升级
[00:38.320 --> 00:40.800] 要做到和现实世界的体验完全相同
[00:40.800 --> 00:42.080] 延迟就网络延迟
[00:42.080 --> 00:43.080] 不会有卡顿
[00:43.080 --> 00:44.200] 多元就多元化
[00:44.200 --> 00:45.600] 比如可以在里面玩游戏
Performance report.
Meaning of V2 and V3: V2 is before this commit.
model | T (s) | CPU threads (-t) |
---|---|---|
tiny | 64 | 1 |
tiny | 21 | 4 |
tiny | 21 | 8 |
tiny | 80 | 16 |
tiny | 175 | 24 |
base | 42 | 8 |
base | 93 | 16 |
small | 110 | 8 |
small | 190 | 16 |
large | 420 | 8 |
large | 537 | 16 |
model | T (s) | CPU threads (-t) |
---|---|---|
tiny | 84 | 1 |
tiny | 32 | 4 |
tiny | 28 | 8 |
tiny | 56 | 16 |
tiny | 86 | 24 |
base | 58 | 8 |
base | 125 | 16 |
small | 104 | 8 |
small | 177 | 16 |
large | 570 | 8 |
large | 850 | 16 |
model | T (s) | CPU threads (-t) |
---|---|---|
tiny | 17 | 1 |
tiny | 9 | 2 |
tiny | 5 | 4 |
base | 56 | 1 |
base | 25 | 2 |
base | 16 | 4 |
small | 155 | 1 |
small | 86 | 2 |
small | 53 | 4 |
large | 788 | 1 |
large | 428 | 2 |
large | 260 | 4 |
whisper_model_load: type = 1
whisper_model_load: mem_required = 452.00 MB
main: load time = 84.28 ms
main: mel time = 118.88 ms
main: sample time = 46.91 ms
main: encode time = 531.27 ms / 132.82 ms per layer
main: decode time = 3730.47 ms
main: total time = 6181.17 ms
main: load time = 80.49 ms
main: mel time = 97.64 ms
main: sample time = 13.85 ms
main: encode time = 533.10 ms / 133.27 ms per layer
main: decode time = 1036.91 ms
main: total time = 2348.79 ms
whisper_model_load: type = 1
whisper_model_load: mem_required = 244.00 MB
main: load time = 241.68 ms
main: mel time = 656.11 ms
main: sample time = 1202.84 ms
main: encode time = 1736.55 ms / 434.14 ms per layer
main: decode time = 8354.48 ms
main: total time = 12211.61 ms
main: load time = 243.57 ms
main: mel time = 541.42 ms
main: sample time = 209.42 ms
main: encode time = 2901.70 ms / 725.42 ms per layer
main: decode time = 1588.76 ms
main: total time = 5501.20 ms
g++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -c whisper.cpp
whisper.cpp: In function ‘whisper_full_params whisper_full_default_params(whisper_decode_strategy)’:
whisper.cpp:2286:17: sorry, unimplemented: non-trivial designated initializers not supported
};
^
whisper.cpp:2313:17: sorry, unimplemented: non-trivial designated initializers not supported
};
^
Makefile:74: recipe for target 'whisper.o' failed
make: *** [whisper.o] Error 1
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Referring to our discussion in #8: I can run ggml-large.bin on the same 120-second (2-minute) input audio in around 54 minutes on a Samsung A52.
What is your suggestion for optimizations to run the bigger models on cheaper hardware?
I will be happy if you share resources I can learn from to achieve that goal.
I was running a task on a German-language YouTube video with the command line
./main -m ggml-base.bin bauer.wav -t 8 -l de -osrt
and the process ran OK until around the 4-minute mark, when I got the error
"whisper_full: failed to generate timestamp token - this should not happen"
repeated several times, and the transcription never resumed.
I changed the command line to use 4 cores and didn't include the srt file generation, and I still got the same error.
Curiously, if I force English transcription with "-l en", the transcription is OK until 4 minutes or so, and then the same sentence repeats until the end of the file.
I think this started happening after the commit to reduce the sentence length.
I can't see any docs on thread-safety for the C API. Information here would be very helpful for me and future users. Thanks!
I want to experiment with using whisper in an app, but when I open it, an error occurs because the compiled library requires libc++_shared.so.
I use this bash script to build for the Android target:
/home/azkdev/Android/Sdk/ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android21-clang -pthread -O3 -std=c11 -mavx -mavx2 -mfma -mf16c -c ./ggml.c -fPIC -lstdc++
/home/azkadev/Android/Sdk/ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android21-clang++ -pthread -O3 -std=c++11 -mavx -mavx2 -mfma -mf16c -c ./whisper.cpp -fPIC -lstdc++
/home/azkadev/Android/Sdk/ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android21-clang++ -pthread -O3 -std=c++11 ./main.cpp -fPIC -lstdc++ whisper.o ggml.o -o ./whisper.so --shared -fPIC -lstdc++
I have also tried this: clang-linking-so-library-libc-shared-so, but it doesn't work.
Can you give a build command so it doesn't need libc++_shared.so? Sorry, I'm still a beginner in C++.
Hi, and first of all thanks for creating this implementation.
I am trying to create a new version of this one: https://huggingface.co/spaces/Finnish-NLP/Whisper-ASR-youtube-subtitles
I have this working locally now, but I spotted that the generated subtitles are way too long.
On the right is this cpp implementation; on the left is the original PyTorch implementation (small model).
Is it a "feature" of this implementation, or what is going on here?
Implement a very basic Java application using whisper.cpp. It can be used as an example for running Whisper on Android.
The ggwave-java project can be used as a good starting point. It already provides the audio capture functionality. Instead of passing the captured audio to ggwave, we just need to pass it to whisper.cpp.
Edit:
Looking for volunteers to help with this - ideally, we would like to demonstrate the same functionality as in the iOS example application. A sketch of the native side follows.
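Here is a sketch of the JNI boundary such an app would need, assuming the whisper.h C API; the package/class name and the convention of passing 16 kHz mono float PCM from Java are assumptions:

// Sketch: minimal JNI wrapper around whisper_full for an Android demo.
// The Java side would declare a matching static native method.
#include <jni.h>
#include <string>
#include <vector>
#include "whisper.h"

extern "C" JNIEXPORT jstring JNICALL
Java_com_example_whisper_WhisperJni_transcribe(JNIEnv * env, jclass,
                                               jstring jmodel, jfloatArray jpcm) {
    const char * model = env->GetStringUTFChars(jmodel, nullptr);
    struct whisper_context * ctx = whisper_init_from_file(model); // older API: whisper_init
    env->ReleaseStringUTFChars(jmodel, model);
    if (!ctx) return env->NewStringUTF("");

    const jsize n = env->GetArrayLength(jpcm);
    std::vector<float> pcm(n);
    env->GetFloatArrayRegion(jpcm, 0, n, pcm.data());

    std::string text;
    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, params, pcm.data(), (int) n) == 0) {
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            text += whisper_full_get_segment_text(ctx, i);
        }
    }
    whisper_free(ctx);
    return env->NewStringUTF(text.c_str());
}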
I get this on Ubuntu 18.04 with gcc 7.5.0 (time to update, yes), and I don't immediately see how to fix it, since I don't know __cpp_lib_hardware_interference_size. Otherwise a simple replacement with a #define would suffice (sketch after the error output below).
gcc -pthread -O3 -mavx -mavx2 -mfma -mf16c -c ggml.c
ggml.c:183:36: error: initializer element is not constant
const size_t CACHE_LINE_SIZE_F32 = CACHE_LINE_SIZE/sizeof(float);
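The replacement I have in mind is just this (a sketch; 64 bytes is an assumption matching common x86-64 cache lines):

// ggml.c (sketch): #defines are compile-time constants in C, so they can
// appear in other static initializers, unlike file-scope const variables
#define CACHE_LINE_SIZE     64
#define CACHE_LINE_SIZE_F32 (CACHE_LINE_SIZE/sizeof(float))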
We can easily build whisper.cpp as a WASM library using Emscripten:
mkdir build-em
cd build-em
emcmake cmake ..
make
It looks like a big subset of the SIMD intrinsics is already supported, so the performance might not be too bad:
https://emscripten.org/docs/porting/simd.html
So let's try running whisper.cpp directly in the browser!
I tried running "make" and got this error:
process_begin: CreateProcess(NULL, uname -s, ...) failed.
process_begin: CreateProcess(NULL, uname -p, ...) failed.
process_begin: CreateProcess(NULL, uname -m, ...) failed.
cc -O3 -std=c11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -c ggml.c
process_begin: CreateProcess(NULL, cc -O3 -std=c11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -c ggml.c, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [ggml.o] Error 2
Could someone guide me through building this program on Windows? Are there pre-built binaries available? I have Visual Studio 2022 and MinGW installed.
Hi! Firstly, thank you so much for this incredible work!
I have been running the tiny.en model on a large number of wav files stored in a folder. I am currently parallelizing the work over a multi-core machine using GNU parallel, running the following command:
find input_data/eng_wav_data -name "*.wav" | parallel 'time ./main -m models/ggml-tiny.en.bin -nt -f {} -t 1 > {.}.txt'
I found that the model is currently loaded each time we transcribe a wav file. Is there a way I can circumvent this and load the model only once? Any help would be appreciated. Thank you, and apologies if this issue has been resolved already.
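To frame the request, here is a sketch of the batch driver I'm imagining: load once, then call whisper_full per file on the same context. It assumes the whisper.h C API and a hypothetical read_wav_f32 helper (e.g. adapted from the drwav loading code in main.cpp):

// Sketch: amortize model loading across many wav files.
#include <cstdio>
#include <vector>
#include "whisper.h"

extern std::vector<float> read_wav_f32(const char * path); // hypothetical helper

int main(int argc, char ** argv) {
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-tiny.en.bin");
    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.n_threads = 1; // matches the -t 1 usage above

    for (int i = 1; i < argc; ++i) { // one wav path per argument
        const std::vector<float> pcm = read_wav_f32(argv[i]);
        if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) == 0) {
            for (int s = 0; s < whisper_full_n_segments(ctx); ++s) {
                printf("%s", whisper_full_get_segment_text(ctx, s));
            }
            printf("\n");
        }
    }
    whisper_free(ctx);
    return 0;
}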
I'm glad you shared this implementation.
A steep increase in performance relative to torch on the CPU.
You may already know this, but I found out how to enable recognition of a specific language.
We can just put this in line 2012 of main.cpp:
std::vector<whisper_vocab::id> prompt = { vocab.token_sot, vocab.token_lang, vocab.token_task };
These 3 tokens are formed here:
https://github.com/openai/whisper/blob/8cf36f3508c9acd341a45eb2364239a3d81458b9/whisper/tokenizer.py#L324-L331
For specific use in main.cpp, you can simply specify the desired index manually. But for regular users, it would be cool to be able to specify which language they would prefer to see in the output.
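If I read tokenizer.py correctly, the language tokens sit immediately after the <|startoftranscript|> token, in the order of the LANGUAGES table, so the language token could be derived rather than hard-coded. A sketch (vocab fields as used above; lang_index is a hypothetical lookup into that same table):

// Sketch: build the decoder prompt for a user-selected language, assuming
// language tokens directly follow SOT in vocabulary order, as in tokenizer.py.
std::vector<whisper_vocab::id> make_prompt(const whisper_vocab & vocab, const char * lang) {
    const whisper_vocab::id token_lang = vocab.token_sot + 1 + lang_index(lang); // hypothetical lookup
    return { vocab.token_sot, token_lang, vocab.token_task };
}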
Hi,
it would be great to have a simple app that takes data from a pipe and runs recognition on it ... similar to stream.cpp, but taking data from a pipe instead of from the audio device ...
Could also be an addition to the main-example, so that you can use it like this:
cat samples/jfk.wav | ./main -m models/ggml-medium.bin -f -
Here something similar is done with vosk and python. (ffmpeg-pre-processing could be something people can do on their own before filling the pipe and not part of the app ...)
Kind regards,
abelbabel
Currently, I am hosting the ggml Whisper model files on my Linode server. However, it has a limited network bandwidth per month, and as more people start using whisper.cpp it won't be enough.
What are some good options for hosting ~10 GB of data?
The only requirement is being able to wget/curl the files directly - i.e. Google Drive and the like are not an option.
Implement a very basic iOS application using whisper.cpp.
The ggwave-objc project can be used as a good starting point. It already provides the audio capture functionality. We just need to pass the captured data to whisper.cpp.
Is it possible to run this ggml model on Raspberry Pi hardware?
Does anyone have any ideas of how to use this code but with CUDA libs? I want to move away from the Python version but keep PyTorch CUDA.
Is it possible to run the ggml C/C++ Whisper model on the TFLite Micro framework?
I have this issue when trying to compile the most recent version (as of 16 oct 2022):
(base) user@pc:~/whisper.cpp$ make
g++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -c whisper.cpp
whisper.cpp: In function ‘whisper_full_params whisper_full_default_params(whisper_decode_strategy)’:
whisper.cpp:2305:17: internal compiler error: in reshape_init_class, at cp/decl.c:6465
2305 | };
| ^
0x7fdf6ca75d8f __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
0x7fdf6ca75e3f __libc_start_main_impl
../csu/libc-start.c:392
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See file:///usr/share/doc/gcc-11/README.Bugs for instructions.
make: *** [Makefile:61: whisper.o] Error 1
Just to be sure it wasn't my setup, I compiled the fork I have at https://github.com/Topping1/whisper.cpp (2 commits ahead, 8 commits behind) and it compiled fine. For further verification, I ran a diff between the two whisper.cpp files and found this:
(left is whisper.cpp at my repository, right is the updated one (as of 16-Oct-2022))
Do you know what might be causing the issue?
Hi there! I'm trying to save compute resources by reusing WhisperContext objects in an STT server instance, but if no words are detected in the audio, the context will spit out again whatever words were found in the last transcription that had detected words. This is a major issue, and I'd like a way to prevent it. The easiest way I can think of is adding a function to clear the words stored in the model. I considered working around this in my app, but I realized that storing many past sentences there, compared to just clearing the words from the model itself, could cause serious overhead and introduce user privacy risks. Thanks!
Great work! I find the implementation of ggml especially interesting. It looks like you implement all the basic neural network building blocks with ggml. How does it compare with the torch jit approach of using a PyTorch model in C++?
A language other than English needs to be specified; it doesn't auto-detect the language. Is this due to missing multilingual support?
Hi, and thanks so much for this project. It's really, really fast. I've been compiling on M1 Mac, Intel Mac, and Windows, and I've noticed something across the board: CMake builds run much, much slower (3-4x) than Make builds. I would love to put some time into fixing this and opening a PR, but I'm really busy right now.
I may have time in a couple of weeks to contribute, but I just wanted to put this on your radar in case there's some obvious easy fix.
Hello there. It seems that redirecting the standard output with >, >>, or tee doesn't work. It would be nice to have an option to save the output to a specific file.
I am doing some performance optimizations in ggml, and it seems that PyTorch's Linear layer currently outperforms my implementation by a factor of ~4 for big matrices. I am wondering what the secret is there, and whether someone can give me some tips on how to achieve this performance.
Consider the following line from the original whisper implementation:
https://github.com/openai/whisper/blob/e90b8fa7e845ae184ed9aa0babcf3cde6f16719e/whisper/model.py#L73
This is effectively equivalent to a matrix multiplication of x with a square weights matrix from the model (encoder.blocks.0.attn.query.weight), plus a sum with a bias vector (encoder.blocks.0.attn.query.bias).
I compared the runtime of this line with an explicit matrix multiplication of same-size matrices. To do that, I replaced the line with this piece of code:
# original
q = self.query(x)

# modified (assumes `import time` at the top of model.py)
start = time.time()
q = self.query(x)
print('time for self.query(x) = ', time.time() - start)

start = time.time()
# note: the two torch.rand allocations below are included in this timed region
r0 = torch.rand(x.shape[1], x.shape[2], dtype=torch.float32)
r1 = torch.rand(x.shape[2], x.shape[2], dtype=torch.float32)
r2 = r0 @ r1
print('time for r2 (mat_mul) = ', time.time() - start)

print(self.query)
print(' x shape = ', x.shape, ' dtype = ', x.dtype)
print('r0 shape = ', r0.shape, ' dtype = ', r0.dtype)
print('r1 shape = ', r1.shape, ' dtype = ', r1.dtype)
print('r2 shape = ', r2.shape, ' dtype = ', r2.dtype)
I would have expected the time for self.query(x) to be equal to the time for r2 (mat_mul).
However, here is the result on my MacBook when running the large model:
time for self.query(x) = 0.0034177303314208984
time for r2 (mat_mul) = 0.012507200241088867
Linear(in_features=1280, out_features=1280, bias=True)
x shape = torch.Size([1, 1500, 1280]) dtype = torch.float32
r0 shape = torch.Size([1500, 1280]) dtype = torch.float32
r1 shape = torch.Size([1280, 1280]) dtype = torch.float32
r2 shape = torch.Size([1500, 1280]) dtype = torch.float32
So the Linear layer is almost 4 times faster (3.4 ms vs 12.5 ms) than the explicit matrix multiplication.
How do we explain this difference?
Is PyTorch using some int8 quantisation technique under the hood to speed up this layer? If so, how can I verify that this is the case?
Any insight will be very much appreciated!
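For scale, a rough operation count (ignoring the bias add) says both paths do the same nominal work:

$$ 2 \cdot 1500 \cdot 1280 \cdot 1280 \approx 4.9 \times 10^9 \ \text{FLOPs} $$

so 3.4 ms corresponds to roughly 1.4 TFLOPS and 12.5 ms to roughly 0.4 TFLOPS, which makes the 4x gap look like a throughput difference rather than skipped work.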
What are the correct parameters to cross-compile for ARM Android? I'm using Ubuntu on Intel with android-ndk-r25b.
ggml.c:232:16: warning: implicit declaration of function 'vfmaq_f32' is invalid in C99 [-Wimplicit-function-declaration]
sum0 = vfmaq_f32(sum0, x0, y0);
^
ggml.c:232:14: error: assigning to 'float32x4_t' (vector of 4 'float32_t' values) from incompatible type 'int'
sum0 = vfmaq_f32(sum0, x0, y0);
^ ~~~~~~~~~~~~~~~~~~~~~~~
./ggml.c:331:14: error: assigning to 'float16x8_t' (vector of 8 'float16_t' values) from incompatible type 'int'
sum0 = vfmaq_f16(sum0, x0, y0);
^ ~~~~~~~~~~~~~~~~~~~~~~~
Hey there! I'm testing out whisper.cpp to see if it would be suitable for production use. However, I'm running into a SIGFPE on certain audio files, namely those that do not produce any output from the model. Because of the way my system is set up, I'm unable to provide any test files that reproduce this bug.
However, I was able to build the library with debug symbols and trigger the exception. It seems to be a divide-by-zero error on line 2349 of whisper.cpp:
Line 2349 in 8d94358
The GDB output is as follows:
Thread 21 "scripty_stt_ser" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7ffff7085700 (LWP 3869)]
0x0000555555599123 in whisper_full (ctx=0x5555556f6a80, params=..., samples=<optimized out>, n_samples=<optimized out>) at whisper.cpp:2349
2349 int progress_cur = (100*seek)/whisper_n_len(ctx);
Unfortunately, despite compiling with debug symbols (the -g flag), bt gave no extra info beyond that:
(gdb) bt
#0 0x0000555555599123 in whisper_full (ctx=0x5555556f6a80, params=..., samples=<optimized out>, n_samples=<optimized out>) at whisper.cpp:2349
#1 0x0000555555593cf6 in whisper_rs::whisper_ctx::WhisperContext::full (self=<optimized out>, params=..., data=...) at src/whisper_ctx.rs:390
Let me know if there's anything else I can do to help!
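For what it's worth, a guard at the cited line would avoid the division when the audio yields no mel frames; a sketch, not a tested patch:

// whisper.cpp, around line 2349 (sketch): whisper_n_len(ctx) can apparently
// be 0 for audio that produces no output, so guard the division
const int n_len = whisper_n_len(ctx);
const int progress_cur = n_len > 0 ? (100*seek)/n_len : 100;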
I am trying to compile for ARM64, and there seems to be an issue with some vector functions:
> [linux/arm64 builder 5/5] RUN gcc -pthread -O3 -march=native -c ggml.c && g++ -pthread -O3 -std=c++11 -c main.cpp && g++ -pthread -o main ggml.o main.o:
#29 3.977 ggml.c:506:14: note: called from here
#29 3.977 506 | y1 = vfmaq_f16(y1, x1, v8);
#29 3.977 | ^~~~~~~~~~~~~~~~~~~~~
#29 3.978 In file included from ggml.c:47:
#29 3.978 /usr/lib/gcc/aarch64-linux-gnu/10/include/arm_neon.h:33208:1: error: inlining failed in call to 'always_inline' 'vfmaq_f16': target specific option mismatch
#29 3.978 33208 | vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
#29 3.978 | ^~~~~~~~~
#29 3.978 ggml.c:505:14: note: called from here
#29 3.978 505 | y0 = vfmaq_f16(y0, x0, v8);
#29 3.978 | ^~~~~~~~~~~~~~~~~~~~~
------
Dockerfile:11
--------------------
10 | ADD whisper.cpp/ /build/
11 | >>> RUN gcc -pthread -O3 -march=native -c ggml.c && \
12 | >>> g++ -pthread -O3 -std=c++11 -c main.cpp && \
13 | >>> g++ -pthread -o main ggml.o main.o
14 |
--------------------
ERROR: failed to solve: process "/bin/sh -c gcc -pthread -O3 -march=native -c ggml.c && g++ -pthread -O3 -std=c++11 -c main.cpp && g++ -pthread -o main ggml.o main.o" did not complete successfully: exit code: 1
Tested on GitHub actions (logs) and on a Raspberry Pi 4.
Dockerfile:
# build image
FROM debian:bullseye-slim AS builder
WORKDIR /build/
RUN apt-get update && apt-get install --no-install-recommends -y \
make gcc g++ wget \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install Whisper.cpp
ADD whisper.cpp/ /build/
RUN gcc -pthread -O3 -march=native -c ggml.c && \
g++ -pthread -O3 -std=c++11 -c main.cpp && \
g++ -pthread -o main ggml.o main.o
Hi Georgi, I am sure this is not the right platform for such a request, but could you make a tutorial or docs on how you went about implementing ggml, and especially its design?
I am personally lacking this skill.
Thank you
I'm running Japanese audio files through whisper.cpp, and the output contains some "corrupted" characters.
Here is the output from whisper and whisper.cpp for comparison:
Command | Output |
---|---|
whisper output.wav --model large --language Japanese | さくらちゃん**神経もすっごくいいし、バトンもうまいんだけど |
./main -m models/ggml-large.bin -l ja -f output.wav | さくらちゃん**神��もすっごくいいし、バトンもうまいんだけど。 |
The expected 「神経も」 portion is the following in hex:
0xE7A59E 0xE7B58C 0xE38282
The "corrupted" 「神��も」 portion is:
0xE7A59E 0xEFBFBD 0xEEBFBD 0xE38282
Note: I had to comment out a few lines around line 2300 of whisper.cpp for "make" to compile. I do not know whether this impacts the output.
.beam_search = {
//.n_past = 0,
//.beam_width = 10,
//.n_best = 5,
},
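The bytes suggest the multibyte character is split across two BPE tokens and each token's text is emitted separately, producing invalid UTF-8 (0xEFBFBD is the encoding of U+FFFD, the replacement character). One direction for a fix, sketched below as my own guess rather than the project's approach, is to buffer token bytes and only flush prefixes that end on a complete UTF-8 sequence:

// Sketch: hold back trailing bytes of an incomplete UTF-8 sequence so that
// characters split across BPE tokens are printed only once fully assembled.
#include <cstdio>
#include <string>

static size_t utf8_complete_prefix(const std::string & s) {
    size_t i = s.size(), k = 0;
    // back up over at most 3 trailing continuation bytes (10xxxxxx)
    while (i > 0 && k < 3 && (static_cast<unsigned char>(s[i-1]) & 0xC0) == 0x80) { --i; ++k; }
    if (i == 0) return s.size(); // only continuation bytes: flush as-is
    const unsigned char lead = static_cast<unsigned char>(s[i-1]);
    const size_t need = lead < 0x80 ? 1 : lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : lead >= 0xC0 ? 2 : 1;
    return (s.size() - (i-1) == need) ? s.size() : i - 1; // keep the incomplete tail
}

struct Utf8Printer {
    std::string pending;
    void feed(const char * token_text) {
        pending += token_text;
        const size_t n = utf8_complete_prefix(pending);
        fwrite(pending.data(), 1, n, stdout);
        pending.erase(0, n);
    }
};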
Makefile
main: ggml.o main.o
g++ -pthread -o main ggml.o main.o
./main -h
ggml.o: ggml.c ggml.h
gcc -pthread -O3 -mavx -mavx2 -mfma -mf16c -c ggml.c
main.o: main.cpp ggml.h
g++ -pthread -O3 -std=c++11 -c main.cpp
Hi, I've been trying to get this to work a few times, but it always fails with an illegal hardware instruction error.
E.g. for ./main -m models/ggml-small.bin -f samples/jfk.wav
I get the following output:
whisper_model_load: loading model from 'models/ggml-small.bin'
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 3
whisper_model_load: mem_required = 1048.00 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: ggml ctx size = 533.05 MB
fish: Job 1, './main -m models/ggml-small.b...' terminated by signal SIGILL (Illegal instruction)
I've tried other models as well, but the result is always the same.
If you try to build stream with make stream, it will normally fail with:
g++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread stream.cpp ggml.o whisper.o -o stream `sdl2-config --cflags --libs`
/bin/sh: 1: sdl2-config: not found
stream.cpp:12:10: fatal error: SDL.h: No such file or directory
12 | #include <SDL.h>
| ^~~~~~~
compilation terminated.
make: *** [Makefile:76: stream] Error 1
The missing dependency here is SDL2 (https://www.libsdl.org/), which can be installed with:
sudo apt-get install libsdl2-dev
It would be nice to add this to the README; I might do that later if I have time.
Do you think that could be possible in some way? I would like to get the timestamp of each word instead of the sentence (bundle of words).
That could be useful for some kind of karaoke lyrics generator, or just for "lip syncing" text in a video clip or with a 3D character.
Cheers