abdeladim-s / pywhispercpp

Python bindings for whisper.cpp

Home Page: https://abdeladim-s.github.io/pywhispercpp/

License: MIT License

CMake 5.87% Python 26.08% C++ 67.96% C 0.01% Shell 0.08%
openai-whisper whisper-cpp

pywhispercpp's Issues

Integrating pywhispercpp as the first extension to lollms-webui

Hi Abdeladim. I have finally started writing extensions for lollms, and I was thinking that the first extension should be audio in and audio out. But I need to comply with my rule number 1: everything should be done locally; no data is sent anywhere off your PC.

To do this, I think Whisper is really cool. Even cooler is whisper.cpp, but since I use Python, I need pywhispercpp :)

Do you have an example of code that uses a direct input stream from the microphone? That would simplify the integration greatly.
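One way this could look (a hedged sketch, not code from the repository): record a short chunk with the third-party sounddevice package and pass the float32 samples to the model. It assumes sounddevice is installed and that a recent pywhispercpp `Model.transcribe` accepts a NumPy array; older versions may require writing the audio to a WAV file first.

```python
def chunk_frames(duration_s: float, rate: int = 16000) -> int:
    """Number of mono samples to record for duration_s seconds at rate Hz
    (whisper.cpp expects 16 kHz mono float32)."""
    return int(duration_s * rate)

# Hedged usage with the sounddevice package (shown as comments since it
# needs a microphone, sounddevice, pywhispercpp, and a downloaded model):
# import sounddevice as sd
# from pywhispercpp.model import Model
# audio = sd.rec(chunk_frames(5), samplerate=16000, channels=1, dtype='float32')
# sd.wait()                          # block until the 5 s recording finishes
# model = Model('base.en')
# for segment in model.transcribe(audio.flatten()):
#     print(segment.text)
```

The repository's `examples` directory (e.g. `recording.py`, `livestream.py`) appears in the build logs later on this page and is likely the authoritative place to look for microphone handling.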

Installation from source leads to non-functional installation

How to recreate:

  1. Create a clean Python virtualenv in an empty folder:
python -m venv .env
  2. Install pywhispercpp from source:
pip install git+https://github.com/abdeladim-s/pywhispercpp
  3. Run a Python interactive shell and import Model:
>>> from pywhispercpp.model import Model
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/pywhispercpp/model.py", line 13, in <module>
    import _pywhispercpp as pw
ImportError: libwhisper.so: cannot open shared object file: No such file or directory

I have not quite wrapped my head around the issue, but it seems that when installing a Python package this way, pip builds the wheel in the process and the libwhisper.so file ends up inside the Python site-packages. However, the site-packages directory is by default not included in LD_LIBRARY_PATH.

However, when I manually build the wheel after cloning the repository with python -m build --wheel . and install from the wheel file, libwhisper.so is successfully included and found on import.
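A possible stopgap consistent with that diagnosis (the exact location of libwhisper.so is an assumption; confirm it with `find` before exporting):

```shell
# Hypothetical workaround: locate where libwhisper.so actually landed inside
# the active environment, then add that directory to the loader path.
SITE_PKGS=$(python -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
find "$SITE_PKGS" -name 'libwhisper.so'          # confirm the exact directory
export LD_LIBRARY_PATH="$SITE_PKGS:$LD_LIBRARY_PATH"
python -c "from pywhispercpp.model import Model"
```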

ERROR - unable to initialize from path

Hello.

It cannot be initialized from any path, as shown below.

from pathlib import Path
from pywhispercpp.model import Model
model = Model(Path('/home/user/.local/share/pywhispercpp/models/ggml-large.bin'))

The following error is output:

:
Invoked with: PosixPath('/home/user/.local/share/pywhispercpp/models/ggml-large.bin')
Segmentation fault (core dumped)

I think it can be initialized by converting the Path object to a str.
https://github.com/abdeladim-s/pywhispercpp/blob/main/pywhispercpp/model.py#L83

"ggml-metal.metal" file couldn't be found when loading the large-v3 model for CoreML

Hello everyone,
I'm working with an M3 Max and I've tried to load the "ggml-large-v3.bin" model with the following code:

from pywhispercpp.model import Model
model = Model('/Users/my_user/Dev/Models/Whisper_large_v3/ggml-large-v3.bin', n_threads=6)
print(Model.system_info())  # and you should see COREML = 1

But it is unable to find the ggml-metal.metal file, even though the file is actually present in the whisper.cpp folder. It gives the following result:

[2024-04-30 17:38:52,675] {model.py:221} INFO - Initializing the model ...
AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 1 | OPENVINO = 0
whisper_init_from_file_with_params_no_state: loading model from '/Users/my_user/Dev/Models/Whisper_large_v3/ggml-large-v3.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
ggml_metal_init: picking default device: Apple M3 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=260 "The file “ggml-metal.metal” couldn’t be opened because there is no such file." UserInfo={NSFilePath=ggml-metal.metal, NSUnderlyingError=0x60000250f690 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
whisper_backend_init: ggml_backend_metal_init() failed
whisper_model_load:      CPU total size =  3094.36 MB
whisper_model_load: model size    = 3094.36 MB
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
ggml_metal_init: picking default device: Apple M3 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=260 "The file “ggml-metal.metal” couldn’t be opened because there is no such file." UserInfo={NSFilePath=ggml-metal.metal, NSUnderlyingError=0x600002508f00 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
whisper_backend_init: ggml_backend_metal_init() failed
whisper_init_state: kv self size  =  220.20 MB
whisper_init_state: kv cross size =  245.76 MB
whisper_init_state: loading Core ML model from '/Users/my_user/Dev/Models/Whisper_large_v3/ggml-large-v3-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
whisper_init_state: compute buffer (conv)   =   10.92 MB
whisper_init_state: compute buffer (cross)  =    9.38 MB
whisper_init_state: compute buffer (decode) =  209.26 MB

I've tried to add the path to the environment variables with:

export GGML_METAL_PATH_RESOURCES=/Users/gregoiredesauvage/Dev/Modules/pywhispercpp/whisper.cpp/ggml-metal.metal

but it didn't work.
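One thing worth double-checking (an assumption based on how ggml resolves this variable): GGML_METAL_PATH_RESOURCES is typically treated as the directory that contains ggml-metal.metal, not as the path to the file itself.

```shell
# Point the variable at the directory containing ggml-metal.metal,
# not at the file (directory layout taken from the export shown above).
export GGML_METAL_PATH_RESOURCES=/Users/gregoiredesauvage/Dev/Modules/pywhispercpp/whisper.cpp
```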

I have the "ggml-large-v3-encoder.mlmodelc" file in the same folder as the "ggml-large-v3.bin" file.

Any idea?

Model class is not supporting relative paths to files

I'm experimenting with your library, and I've noticed that the Model class does not support relative paths to files. Here is the traceback.

In [5]: asr_result = model.transcribe("../../audio1470766962.wav")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[5], line 1
----> 1 asr_result = model.transcribe("../../audio1470766962.wav")

File ~/whisper/lib/python3.10/site-packages/pywhispercpp/model.py:118, in Model.transcribe(self, media, n_processors, new_segment_callback, **params)
    116     media_path = Path(media).absolute()
    117     if not media_path.exists():
--> 118         raise FileNotFoundError(media)
    119     audio = self._load_audio(media_path)
    120 # update params if any

FileNotFoundError: ../../audio1470766962.wav

I assume this is because of the use of the absolute method from pathlib. If I'm reading the documentation correctly, using the resolve method instead of absolute will resolve (pun intended) the issue 🙂

Here is the example

In [3]: pathlib.Path('../audio1470766962.wav').absolute()
Out[3]: PosixPath('/Users/guschin/whisper/../audio1470766962.wav')

In [4]: pathlib.Path('../audio1470766962.wav').resolve()
Out[4]: PosixPath('/Users/guschin/audio1470766962.wav')

Would you consider this change, please? I can send you a PR.

Tool is super slow / runs forever

I'm trying to transcribe the audio of a 45s mp3 of the audio of a YouTube Short.
I'm doing it like this:

from pywhispercpp.model import Model
model = Model('base.en', print_realtime=False, print_progress=True, n_threads=6)
segments = model.transcribe(short_audio_file, speed_up=True, new_segment_callback=print)

It runs forever and never finishes; this is all the output I get. Then it just keeps running, seemingly doing nothing, with the CPU at 100%:

[2024-01-09 23:28:50,941] {utils.py:38} INFO - No download directory was provided, models will be downloaded to /home/marius/.local/share/pywhispercpp/models
[2024-01-09 23:28:50,943] {utils.py:46} INFO - Model base.en already exists in /home/marius/.local/share/pywhispercpp/models
[2024-01-09 23:28:50,944] {model.py:221} INFO - Initializing the model ...
whisper_init_from_file_no_state: loading model from '/home/marius/.local/share/pywhispercpp/models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  310.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
[2024-01-09 23:28:52,186] {model.py:130} INFO - Transcribing ...

Any ideas what could be wrong or how to improve the speed? Thanks for any help, I appreciate it. This is the most promising of the Python bindings for whisper.cpp, as the others don't even build anymore...

pywhispercpp/whisper.cpp/ggml-opencl.c:4:10: fatal error: 'clblast_c.h' file not found #include <clblast_c.h>

  1. My Mac is an M2;
  2. Clone the code and download the whisper.cpp repository.
  3. Run the command: cd pywhispercpp && python setup.py install
Building C object CMakeFiles/_pywhispercpp.dir/whisper.cpp/ggml-opencl.c.o
/Users/diaojunxian/Documents/github/pywhispercpp/whisper.cpp/ggml-opencl.c:4:10: fatal error: 'clblast_c.h' file not found
#include <clblast_c.h>
         ^~~~~~~~~~~~~
1 error generated.
make[2]: *** [CMakeFiles/_pywhispercpp.dir/whisper.cpp/ggml-opencl.c.o] Error 1
make[1]: *** [CMakeFiles/_pywhispercpp.dir/all] Error 2
make: *** [all] Error 2
Traceback (most recent call last):
  File "/Users/diaojunxian/Documents/github/pywhispercpp/setup.py", line 132, in <module>
    setup(
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/install.py", line 74, in run
    self.do_egg_install()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/install.py", line 123, in do_egg_install
    self.run_command('bdist_egg')
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 165, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 151, in call_command
    self.run_command(cmdname)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/command/install_lib.py", line 112, in build
    self.run_command('build_ext')
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
    self._build_extensions_serial()
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
    self.build_extension(ext)
  File "/Users/diaojunxian/Documents/github/pywhispercpp/setup.py", line 121, in build_extension
    subprocess.run(
  File "/Users/diaojunxian/anaconda3/envs/3.9/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cmake', '--build', '.']' returned non-zero exit status 2.
  4. But when I run the command cd pywhispercpp/whisper.cpp && make clean && make, it succeeds:
 make clean && make
[  8%] Building C object CMakeFiles/whisper.dir/ggml.c.o
[ 16%] Building CXX object CMakeFiles/whisper.dir/whisper.cpp.o
[ 25%] Linking CXX shared library libwhisper.dylib
[ 25%] Built target whisper
[ 33%] Building CXX object examples/CMakeFiles/common.dir/common.cpp.o
[ 41%] Building CXX object examples/CMakeFiles/common.dir/common-ggml.cpp.o
[ 50%] Linking CXX static library libcommon.a
[ 50%] Built target common
[ 58%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[ 66%] Linking CXX executable ../../bin/main
[ 66%] Built target main
[ 75%] Building CXX object examples/bench/CMakeFiles/bench.dir/bench.cpp.o
[ 83%] Linking CXX executable ../../bin/bench
[ 83%] Built target bench
[ 91%] Building CXX object examples/quantize/CMakeFiles/quantize.dir/quantize.cpp.o
/Users/diaojunxian/Documents/github/pywhispercpp/whisper.cpp/examples/quantize/quantize.cpp:112:29: warning: cast from 'const int *' to 'char *' drops const qualifier [-Wcast-qual]
        fout.write((char *) &ftype_dst,             sizeof(hparams.ftype));
                            ^
/Users/diaojunxian/Documents/github/pywhispercpp/whisper.cpp/examples/quantize/quantize.cpp:148:33: warning: cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]
            finp.read ((char *) word.data(), len);
                                ^
/Users/diaojunxian/Documents/github/pywhispercpp/whisper.cpp/examples/quantize/quantize.cpp:149:33: warning: cast from 'const char *' to 'char *' drops const qualifier [-Wcast-qual]
            fout.write((char *) word.data(), len);
                                ^
3 warnings generated.
[100%] Linking CXX executable ../../bin/quantize
[100%] Built target quantize

_pywhispercpp module could not be found

Just did a standard PyPI install in my venv as per

pip install pywhispercpp

A standard script with:

import pywhispercpp.model as m

modelPath: str = ...
filePath: str = ...
outputPath: str = ...

model = m.Model(modelPath, n_threads=6)
segments = model.transcribe(filePath, token_timestamps=True, max_len=1)

with open(outputPath, 'w') as file:
    for segment in segments:
        file.write(segment.text + '\n')

Is failing with error:

Traceback (most recent call last):
  File "...\whisper_file.py", line 1, in <module>
    import pywhispercpp.model as m
  File "...\model.py", line 13, in <module>
    import _pywhispercpp as pw
ImportError: DLL load failed while importing _pywhispercpp: The specified module could not be found.

For reference, FFMpeg is installed:

ffmpeg -version
ffmpeg version 4.4-essentials_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 10.2.0 (Rev6, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil      56. 70.100 / 56. 70.100
libavcodec     58.134.100 / 58.134.100
libavformat    58. 76.100 / 58. 76.100
libavdevice    58. 13.100 / 58. 13.100
libavfilter     7.110.100 /  7.110.100
libswscale      5.  9.100 /  5.  9.100
libswresample   3.  9.100 /  3.  9.100
libpostproc    55.  9.100 / 55.  9.100

failed to compute log mel spectrogram

Hi,

I'm using an M3 Max, and I built with CoreML support. When I run transcribe, it throws the error: "failed to compute log mel spectrogram."
I'm including the log below.
I'd appreciate any help! Thanks so much!

>>> from pywhispercpp.model import Model
>>> model = Model('models/ggml-medium.bin', n_threads=6)
[2024-05-20 07:48:47,691] {model.py:221} INFO - Initializing the model ...
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M3 Max
ggml_metal_init: picking default device: Apple M3 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
         ^~~~~~~~~~~~~~~
" UserInfo={NSLocalizedDescription=program_source:3:10: fatal error: 'ggml-common.h' file not found
#include "ggml-common.h"
         ^~~~~~~~~~~~~~~
}
whisper_backend_init: ggml_backend_metal_init() failed
whisper_model_load:      CPU total size =  1533.14 MB
whisper_model_load: model size    = 1533.14 MB
whisper_init_state: kv self size  =  150.99 MB
whisper_init_state: kv cross size =  150.99 MB
whisper_init_state: kv pad  size  =    6.29 MB
whisper_init_state: loading Core ML model from 'models/ggml-medium-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
whisper_init_state: compute buffer (conv)   =    8.81 MB
whisper_init_state: compute buffer (cross)  =    7.85 MB
whisper_init_state: compute buffer (decode) =  142.09 MB
>>> print(Model.system_info())
AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 1 | OPENVINO = 0
>>> segments = model.transcribe(file, speed_up=True)
[2024-05-20 07:49:11,958] {model.py:130} INFO - Transcribing ...
whisper_full_with_state: failed to compute log mel spectrogram
[2024-05-20 07:49:11,958] {model.py:133} INFO - Inference time: 0.000 s

How to make transcription and speaker diarization using pywhispercpp

Hello,

I am interested in using pywhispercpp for speech recognition and speaker diarization.

I have installed the library and followed the instructions in the README file, but I am not sure how to use it for my use case.

Could you please provide some guidance or examples on how to make transcription and speaker diarization using pywhispercpp?

Note: I'm using google colab.

Thank you.

Using the agent for interacting with ollama models

Thank you for simplifying programmatic access to whisper.cpp; I really appreciate your kind gift to the community. Please forgive my question, however: I can't seem to figure out how to call ollama from your agent module. I assume I need to modify the callback parameter and use langchain's LLM module's ollama function, but I can't find any example code. Will you be publishing example code? The documentation seems to suggest you will. I would very much appreciate some direction. Thank you again for sharing your wonderful code.

"Cannot find source file: ggml.h" when trying to install on Ubuntu 22.04 on aarch64

I would appreciate some help on this.
I'm not sure why some files are missing when trying to build the wheel.

This is an Oracle Cloud Free tier instance.
VM.Standard.A1.Flex (Arm processor from Ampere) - 4 CPU, 24 GB RAM.

ubuntu@server1:~$ pip install pywhispercpp
Defaulting to user installation because normal site-packages is not writeable
Collecting pywhispercpp
  Using cached pywhispercpp-1.0.8.tar.gz (229 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pydub
  Using cached pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Requirement already satisfied: platformdirs in /usr/lib/python3/dist-packages (from pywhispercpp) (2.5.1)
Requirement already satisfied: tqdm in ./.local/lib/python3.10/site-packages (from pywhispercpp) (4.64.1)
Requirement already satisfied: numpy in ./.local/lib/python3.10/site-packages (from pywhispercpp) (1.24.2)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from pywhispercpp) (2.25.1)
Building wheels for collected packages: pywhispercpp
  Building wheel for pywhispercpp (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for pywhispercpp (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [109 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-aarch64-3.10
      creating build/lib.linux-aarch64-3.10/pywhispercpp
      copying ./pywhispercpp/_logger.py -> build/lib.linux-aarch64-3.10/pywhispercpp
      copying ./pywhispercpp/__init__.py -> build/lib.linux-aarch64-3.10/pywhispercpp
      copying ./pywhispercpp/constants.py -> build/lib.linux-aarch64-3.10/pywhispercpp
      copying ./pywhispercpp/utils.py -> build/lib.linux-aarch64-3.10/pywhispercpp
      copying ./pywhispercpp/model.py -> build/lib.linux-aarch64-3.10/pywhispercpp
      creating build/lib.linux-aarch64-3.10/pywhispercpp/examples
      copying ./pywhispercpp/examples/main.py -> build/lib.linux-aarch64-3.10/pywhispercpp/examples
      copying ./pywhispercpp/examples/assistant.py -> build/lib.linux-aarch64-3.10/pywhispercpp/examples
      copying ./pywhispercpp/examples/__init__.py -> build/lib.linux-aarch64-3.10/pywhispercpp/examples
      copying ./pywhispercpp/examples/recording.py -> build/lib.linux-aarch64-3.10/pywhispercpp/examples
      copying ./pywhispercpp/examples/livestream.py -> build/lib.linux-aarch64-3.10/pywhispercpp/examples
      running build_ext
      -- The C compiler identification is GNU 11.3.0
      -- The CXX compiler identification is GNU 11.3.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/cc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- pybind11 v2.9.2
      -- Found PythonInterp: /usr/bin/python3 (found version "3.10.6")
      -- Found PythonLibs: /usr/lib/aarch64-linux-gnu/libpython3.10.so
      -- Performing Test HAS_FLTO
      -- Performing Test HAS_FLTO - Success
      -- Looking for pthread.h
      -- Looking for pthread.h - found
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- CMAKE_SYSTEM_PROCESSOR: aarch64
      -- ARM detected
      -- CMAKE_SYSTEM_PROCESSOR: aarch64
      -- ARM detected
      -- Configuring done
      CMake Error at whisper.cpp/CMakeLists.txt:190 (add_library):
        Cannot find source file:

          ggml.h

        Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h
        .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc


      CMake Error at whisper.cpp/CMakeLists.txt:190 (add_library):
        No SOURCES given to target: whisper


      CMake Generate step failed.  Build files cannot be regenerated correctly.
      Traceback (most recent call last):
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/lib/python3/dist-packages/pip/_vendor/pep517/in_process/_in_process.py", line 261, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 230, in build_wheel
          return self._build_with_temp_dir(['bdist_wheel'], '.whl',
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 215, in _build_with_temp_dir
          self.run_setup()
        File "/usr/lib/python3/dist-packages/setuptools/build_meta.py", line 158, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 132, in <module>
          setup(
        File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 148, in setup
          return run_commands(dist)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 163, in run_commands
          dist.run_commands()
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 967, in run_commands
          self.run_command(cmd)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 986, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
          self.run_command('build')
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 986, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 986, in run_command
          cmd_obj.run()
        File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
          _build_ext.run(self)
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
          self.build_extensions()
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/command/build_ext.py", line 448, in build_extensions
          self._build_extensions_serial()
        File "/usr/lib/python3/dist-packages/setuptools/_distutils/command/build_ext.py", line 473, in _build_extensions_serial
          self.build_extension(ext)
        File "setup.py", line 118, in build_extension
          subprocess.run(
        File "/usr/lib/python3.10/subprocess.py", line 524, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-install-0xv4591a/pywhispercpp_c5e26fcae91046c186dddac942177d54', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/tmp/pip-install-0xv4591a/pywhispercpp_c5e26fcae91046c186dddac942177d54/build/lib.linux-aarch64-3.10/', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-DCMAKE_BUILD_TYPE=Release', '-DEXAMPLE_VERSION_INFO=1.0.8', '-GNinja', '-DCMAKE_MAKE_PROGRAM:FILEPATH=/tmp/pip-build-env-mt4f3311/overlay/local/lib/python3.10/dist-packages/ninja/data/bin/ninja']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pywhispercpp
Failed to build pywhispercpp
ERROR: Could not build wheels for pywhispercpp, which is required to install pyproject.toml-based projects

word-level timestamps?

Hi - thanks for making this. I was trying to get word-level timestamps, but haven't been able to figure out how to. Any tips? Thanks again!
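whisper.cpp itself exposes token-level timestamps through its `token_timestamps` and `max_len` parameters in `whisper_full_params`, and pywhispercpp forwards whisper.cpp parameters as keyword arguments to `Model`. A sketch along those lines (not verified against every version — the parameter names come from whisper.cpp, and `t0`/`t1`/`text` are the segment fields pywhispercpp returns):

```python
def transcribe_with_word_timestamps(model_name: str, audio_path: str):
    """Sketch: enable whisper.cpp token-level timestamps via pywhispercpp.

    Assumes Model forwards `token_timestamps` and `max_len` to
    whisper.cpp's whisper_full_params; with max_len=1 each returned
    segment is at most one token long and carries its own t0/t1.
    """
    # Imported lazily so the sketch can be read without pywhispercpp installed.
    from pywhispercpp.model import Model

    model = Model(model_name, token_timestamps=True, max_len=1)
    segments = model.transcribe(audio_path)
    # t0/t1 are in whisper.cpp time units (centiseconds).
    return [(seg.t0, seg.t1, seg.text) for seg in segments]
```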

Unable to load `quantized` models

I am trying to load the whisper tiny quantized version (ggml-tiny-q5_1.bin) using a Jupyter kernel in Visual Studio Code.
But when I try to run

model = Model('./models/quantized/ggml-tiny-q5_1.bin', print_progress=False)

the kernel dies (unable to load the quantized model). The unquantized models work fine, so the issue seems specific to the quantized version.

image

Any help would be appreciated!

Unable to install on raspberry pi 4

Hello,

The original code by ggerganov works on the Raspberry Pi as well, so I was hoping the Python wrapper would work there too.

Currently when I run pip install pywhispercpp I get a build error:

exit status 1.

ERROR: Failed building wheel for pywhispercpp
Failed to build pywhispercpp
ERROR: Could not build wheels for pywhispercpp, which is required to install pyproject.toml-based projects

How to add space between subtitles?

Hello. There is no gap between two sentences when using this model: when the speaker finishes a sentence, the subtitle stays on screen. I want the subtitle to be displayed only while the speaker is speaking, but subtitles always appear. What setting should I change?

ERROR - Invalid model name `./model.bin`

Hello.
I want to create my own assistant, so I downloaded the assistant example as assistant.py:

from assistant import Assistant

def file(text):
    with open("text.txt","a") as f:
        f.write(text+"\n")

hope = Assistant(commands_callback=file, model="./model.bin")
hope.start()

I want to use my own model, but when I give the Assistant class its path, I get this error:

[2023-08-17 14:48:31,033] {utils.py:34} ERROR - Invalid model name `./model.bin`, available models are: ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large']
[2023-08-17 14:48:31,034] {model.py:221} INFO - Initializing the model ...
whisper_init_from_file_no_state: loading model from '(null)'

Why this error? And how can I solve it?

Thanks in advance.
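The log above hints at what happens: when the relative path cannot be found, pywhispercpp falls back to validating the string as a model name, and the model is then loaded from '(null)'. A defensive check before constructing the Assistant surfaces the real problem earlier. `resolve_model` below is a hypothetical helper, not part of pywhispercpp:

```python
import os

def resolve_model(model_path: str) -> str:
    """Return an absolute path to a ggml model file, raising early with a
    clear message instead of letting a missing path be misread as an
    (invalid) model name."""
    resolved = os.path.abspath(os.path.expanduser(model_path))
    if not os.path.isfile(resolved):
        raise FileNotFoundError(f"ggml model not found at {resolved}")
    return resolved
```

Usage would be `hope = Assistant(commands_callback=file, model=resolve_model("./model.bin"))`. Note that `./model.bin` is resolved against the current working directory, so also make sure the script is run from the directory that actually contains the file.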

Unknown language error

When passing in a language parameter, the language string doesn't seem to be passed through correctly.

Example:

  • run pwcpp ./some_audio.wav --language "es" -m base --print_realtime true

Error output:
whisper_lang_id: unknown language '\�'

image
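The garbage bytes in `whisper_lang_id: unknown language` suggest the CLI is mangling the `--language` argument. As a workaround, the Python API can be used directly, passing the language code as a keyword argument (a sketch assuming `Model` forwards whisper.cpp's `language` and `print_realtime` parameters, as pywhispercpp does for `whisper_full_params` fields):

```python
def transcribe_spanish(audio_path: str):
    """Workaround sketch: bypass the CLI and set the language through
    the Python API instead. Assumes Model forwards `language` to
    whisper.cpp's whisper_full_params."""
    # Imported lazily so the sketch can be read without pywhispercpp installed.
    from pywhispercpp.model import Model

    model = Model('base', language='es', print_realtime=True)
    return model.transcribe(audio_path)
```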

Nothing happens

Hello, I'm using version 1.1.1 of the pywhispercpp library, and when I try to run the code, nothing happens. I've tried using the CLI and the same problem persists. Also, the library breaks my other whisper installations, and I need to uninstall it to get them working again.

Is there anything I can do to resolve it?

I'll attach a video showing what appears when I run the code.

2023-07-16.20-43-28.mp4
