davabase / whisper_real_time
Real time transcription with OpenAI Whisper.
Hello,
First of all: nice work! Your code has been very useful to me 💯
There is just one little problem I think: the sleep instruction only executes if the data queue is not empty.
while True:
    ...
    if not data_queue.empty():
        ...
        sleep(0.25)
I may be wrong, but it seems to me that an indentation level should be removed on the sleep()
call to prevent the infinite loop spam when the data queue is empty?
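A minimal sketch of the suggested fix, assuming the main loop only needs to poll the queue: the sleep is dedented so it runs on every pass, whether or not audio arrived. The names `drain`, `poll`, `handle` and `max_iterations` are hypothetical and exist only to make the loop easy to try out; the demo itself loops forever.

```python
from queue import Queue
from time import sleep


def drain(data_queue: Queue) -> bytes:
    """Join every chunk currently waiting in the queue."""
    chunks = []
    while not data_queue.empty():
        chunks.append(data_queue.get())
    return b"".join(chunks)


def poll(data_queue: Queue, handle, max_iterations: int, interval: float = 0.25) -> None:
    """Poll the queue a fixed number of times (hypothetical helper)."""
    for _ in range(max_iterations):
        if not data_queue.empty():
            handle(drain(data_queue))
        # Dedented: the loop now yields the CPU even when the queue is empty.
        sleep(interval)
```

With the sleep inside the `if`, an empty queue would make the loop spin without pausing; here every iteration waits for `interval` seconds.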
For example, if I say "I am XYZ", it takes almost the above-mentioned time to deliver the transcription. How can I speed this up?
Also, why does the model take an unexpectedly long time to load?
First I installed pyenv by running brew install pyenv,
then pyenv install 3.8.
Python 3.7 didn't work for me :(
I created a venv and installed ffmpeg and portaudio (required for pyaudio):
brew install ffmpeg
and
brew install portaudio
and finally
pip install -r requirements.txt
The code then worked! :)
Hi all,
Thank you for this implementation.
I would like to transcribe from the soundcard, so I would need to specify here a different source.
This is the list of my mic devices:
Microphone with name "MacBook Pro Microphone" found for `Microphone(device_index=1)`
Microphone with name "MacBook Pro Speakers" found for `Microphone(device_index=2)`
So I am adding:
source = sr.Microphone(sample_rate=16000, device_index=2)
but I get the following error:
Traceback (most recent call last):
File "transcribe_demo_soundcard.py", line 79, in main
recorder.adjust_for_ambient_noise(source)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/speech_recognition/__init__.py", line 383, in adjust_for_ambient_noise
assert source.stream is not None, "Audio source must be entered before adjusting, see documentation for ``AudioSource``; are you using ``source`` outside of a ``with`` statement?"
AssertionError: Audio source must be entered before adjusting, see documentation for ``AudioSource``; are you using ``source`` outside of a ``with`` statement?
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "transcribe_demo_soundcard.py", line 155, in <module>
main()
File "transcribe_demo_soundcard.py", line 79, in main
recorder.adjust_for_ambient_noise(source)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/speech_recognition/__init__.py", line 189, in __exit__
self.stream.close()
AttributeError: 'NoneType' object has no attribute 'close'
Any clue why?
Thanks!
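One possible reading of the traceback, offered as an assumption rather than a diagnosis: device 2 is the speakers, an output device, so it probably cannot be opened for recording and the stream stays `None`. A rough way to check is to list only the devices that report input channels. The sketch below relies on PyAudio's `get_device_info_by_index` and the `maxInputChannels` field of the returned dictionary; `input_capable` and `list_input_devices` are hypothetical helper names.

```python
def input_capable(devices):
    """Return (index, name) pairs for devices that report input channels."""
    return [
        (info["index"], info["name"])
        for info in devices
        if info.get("maxInputChannels", 0) > 0
    ]


def list_input_devices():
    """Query PyAudio for every device and keep the recordable ones."""
    import pyaudio  # imported lazily so the filter above works without it

    audio = pyaudio.PyAudio()
    try:
        devices = [
            audio.get_device_info_by_index(i)
            for i in range(audio.get_device_count())
        ]
    finally:
        audio.terminate()
    return input_capable(devices)
```

Capturing what the soundcard plays usually needs a loopback or virtual input device; the speakers entry itself is unlikely to work as a `Microphone` source.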
Did anybody manage to get it working on Windows? I couldn't, no matter what I tried.
The source is loaded correctly, but when debugging I can see that no data is ever put on the queue:
data = audio.get_raw_data()
data_queue.put(data)
So there is no result at all while listening.
How did you solve it?
Maybe I'm missing the obvious, but is there a way to export the transcription to a text file or log? Also, are there other arguments that can be used to date/time-stamp each entry?
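A small sketch of one way to do this, assuming the script exposes each finished line of text as a string. `append_entry` and the log path are hypothetical names, not options of the demo.

```python
from datetime import datetime
from pathlib import Path


def append_entry(log_path, text, now=None):
    """Append one transcribed line to a log file, prefixed with a timestamp."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    line = f"[{stamp}] {text.strip()}"
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(line + "\n")
    return line
```

Calling `append_entry("transcript.log", text)` wherever the script prints a completed line would build up a timestamped log alongside the console output.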
delete this issue
Traceback (most recent call last):
File "C:\Users\UsernameHere\Desktop\PythonProjects\real-time-whisper\whisper_real_time-master\transcribe_demo3.py", line 158, in <module>
main()
File "C:\Users\UsernameHere\Desktop\PythonProjects\real-time-whisper\whisper_real_time-master\transcribe_demo3.py", line 130, in main
result = audio_model.transcribe(temp_file, fp16=torch.cuda.is_available())
File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisper\transcribe.py", line 121, in transcribe
mel = log_mel_spectrogram(audio, padding=N_SAMPLES)
File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisper\audio.py", line 140, in log_mel_spectrogram
audio = load_audio(audio)
File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisper\audio.py", line 59, in load_audio
out = run(cmd, capture_output=True, check=True).stdout
File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 548, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1024, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\UsernameHere\AppData\Local\Programs\Python\Python311\Lib\subprocess.py", line 1510, in _execute_child
# no special security
FileNotFoundError: [WinError 2] The system cannot find the file specified
After troubleshooting, this indicates that ffmpeg is not found or not installed correctly. I expected requirements.txt to take care of it, but the ffmpeg executable itself has to be installed separately.
A quick and dirty mitigation for Windows users is to download ffmpeg.exe and add its folder to the "PATH" environment variable.
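To confirm this before loading the model, a sketch like the following checks whether the executable is visible on PATH. `require_executable` is a hypothetical helper; `shutil.which` is the standard-library lookup.

```python
import shutil


def require_executable(name="ffmpeg"):
    """Return the full path of an executable on PATH, or raise with a hint."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"'{name}' was not found on PATH; install it or add its folder to PATH."
        )
    return path
```

Calling `require_executable()` at the top of the script turns the opaque `[WinError 2]` into a message that names the missing program.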
I don't have a GPU, so I am running this on the CPU. For testing, I said these words:
"Hello,hello,hello
this. is just testing. Please give me everything that is said.
Thank you."
It prints either only "hello, hello, hello" or "this is testing".
When I run the app, it doesn't seem to work at all. Did I miss some arguments or something else?
The command I ran was "python transcribe_demo.py", but the app seemed to be waiting for arguments or something else when I pressed the Enter key.
python == 3.9
torch == 2.0.1
cuda == 11.8
using a virtual environment
To allow individual speakers' dialogue to be partitioned.
Hi,
whisper_real_time works for English and Hindi, but I couldn't get it to work for Malayalam.
Even Whisper itself is not working for Malayalam.
Here's the code section:
model = whisper.load_model("medium")
result = model.transcribe("/home/ajay/pcs/whisper_real_time/stackoverflow.wav",language='ml')
I am fairly sure that this model is capable of translating non-English speech into English text. Maybe we are missing a parameter? How can we make it translate non-English speech into English text?
I can't find any clues on how to use a language-specific model.
In the VOSK API or Google API, all model files are language-specific, but in the Whisper API there are only tiny, small, or large models without any language specified.
Is there any way to control which model is used based on the specified language?
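As a rough sketch of how model choice and language relate in Whisper: the tiny, base, small and medium sizes also ship as English-only checkpoints with an `.en` suffix, while the plain names are multilingual. `pick_model_name` is a hypothetical helper, not part of Whisper or of this repo.

```python
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def pick_model_name(size, english_only):
    """Map a size and a language preference to a Whisper checkpoint name."""
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"  # English-only checkpoint
    return size  # multilingual checkpoint; the language is chosen per call
```

With a multilingual checkpoint the language is passed per call, for example `model.transcribe(path, language="ml")`, or left out so that Whisper tries to detect it.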
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
Model loaded.
ALSA lib pcm_dsnoop.c:641:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
Traceback (most recent call last):
File "c:\Users\laugh\OneDrive\Documents\GitHub\Dobby\base2.py", line 152, in <module>
main()
File "c:\Users\laugh\OneDrive\Documents\GitHub\Dobby\base2.py", line 69, in main
audio_model = whisper.load_model(model)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\whisper\__init__.py", line 154, in load_model
return model.to(device)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1149, in to
return self._apply(convert)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 801, in _apply
module._apply(fn)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 801, in _apply
module._apply(fn)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 801, in _apply
module._apply(fn)
[Previous line repeated 2 more times]
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 824, in _apply
param_applied = fn(param)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1147, in convert
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacty of 4.00 GiB of which 0 bytes is free. Of the allocated memory 3.44 GiB is allocated by PyTorch,
and 15.11 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
PS C:\Users\laugh\OneDrive\Documents\GitHub\Dobby> & C:/Users/laugh/AppData/Local/Programs/Python/Python39/python.exe c:/Users/laugh/OneDrive/Documents/GitHub/Dobby/base2.py
Traceback (most recent call last):
File "c:\Users\laugh\OneDrive\Documents\GitHub\Dobby\base2.py", line 152, in <module>
main()
File "c:\Users\laugh\OneDrive\Documents\GitHub\Dobby\base2.py", line 69, in main
audio_model = whisper.load_model(model)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\whisper\__init__.py", line 154, in load_model
return model.to(device)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1149, in to
return self._apply(convert)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 801, in _apply
module._apply(fn)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 801, in _apply
module._apply(fn)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 801, in _apply
module._apply(fn)
[Previous line repeated 2 more times]
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 824, in _apply
param_applied = fn(param)
File "C:\Users\laugh\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py", line 1147, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacty of 4.00 GiB of which 0 bytes is free. Of the allocated memory 3.44 GiB is allocated by PyTorch,
and 15.11 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
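The log shows a 4 GiB GPU that is already full while the weights are being moved to it, so the chosen model may simply not fit. One way to cope, sketched under that assumption, is to try smaller models or the CPU in turn. `load_with_fallback` is a hypothetical helper; it catches `RuntimeError`, which `torch.cuda.OutOfMemoryError` derives from as far as I know.

```python
def load_with_fallback(load, candidates):
    """Try (model_name, device) pairs in order until one loads."""
    last_error = None
    for name, device in candidates:
        try:
            return load(name, device), name, device
        except RuntimeError as error:  # includes CUDA out-of-memory errors
            last_error = error
    raise RuntimeError("no candidate could be loaded") from last_error
```

With Whisper this could be called as `load_with_fallback(lambda n, d: whisper.load_model(n, device=d), [("medium", "cuda"), ("small", "cuda"), ("small", "cpu")])`; the candidate list is only an example.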
I followed your code and got the following errors.
Code:
from transformers import pipeline
import sys
import time
from tempfile import NamedTemporaryFile
transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-large",chunk_length_s = 30, device=0)
starttime = time.time()
audiopath = sys.argv[1]
wf = open(audiopath, "rb")
#wf.read(44) # skip header
temp_file = NamedTemporaryFile().name
while True:
    data = wf.read(16000)
    if len(data) == 0:
        break
    with open(temp_file+".wav", 'w+b') as f:
        f.write(data)
    text = transcriber(temp_file+".wav")['text']
    print(text)
    #print(text)
endtime=time.time()
print("it takes {}".format(endtime-starttime))
error:
Traceback (most recent call last):
File "test2_stream.py", line 18, in <module>
text = transcriber(temp_file+".wav")['text']
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 378, in __call__
return super().__call__(inputs, **kwargs)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1076, in __call__
return next(
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
item = next(self.iterator)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 266, in __next__
processed = self.infer(next(self.iterator), **self.params)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
data.append(next(self.dataset_iter))
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 183, in __next__
processed = next(self.subiterator)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 437, in preprocess
inputs = ffmpeg_read(inputs, self.feature_extractor.sampling_rate)
File "/home/ybZhang/miniconda3/envs/whister/lib/python3.8/site-packages/transformers/pipelines/audio_utils.py", line 41, in ffmpeg_read
raise ValueError("Malformed soundfile")
ValueError: Malformed soundfile
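A likely cause, stated as an assumption: each `wf.read(16000)` returns raw bytes, and chunks after the first carry no WAV header, so most of the temporary files are not valid WAV files and ffmpeg rejects them. A sketch of splitting a WAV file into chunks that each get their own header, using the standard `wave` module (`split_wav` is a hypothetical helper and expects a positive `chunk_seconds`):

```python
import wave


def split_wav(source_path, chunk_seconds, out_prefix):
    """Write consecutive chunks of a WAV file as standalone WAV files."""
    paths = []
    with wave.open(source_path, "rb") as source:
        params = source.getparams()
        frames_per_chunk = int(source.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = source.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = f"{out_prefix}_{index:04d}.wav"
            with wave.open(chunk_path, "wb") as chunk:
                chunk.setparams(params)  # copy channels, sample width and rate
                chunk.writeframes(frames)  # the header length is fixed up on write
            paths.append(chunk_path)
            index += 1
    return paths
```

Each returned path can then be passed to the transcriber in place of the raw temporary file.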
Traceback (most recent call last):
File "transcribe_demo.py", line 151, in <module>
main()
File "transcribe_demo.py", line 69, in main
audio_model = whisper.load_model(model)
File "D:\anaconda3\envs\whisperTime\lib\site-packages\whisper\__init__.py", line 122, in load_model
return model.to(device)
File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 989, in to
return self._apply(convert)
File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
module._apply(fn)
File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
module._apply(fn)
File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
module._apply(fn)
[Previous line repeated 2 more times]
File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 664, in _apply
param_applied = fn(param)
File "D:\anaconda3\envs\whisperTime\lib\site-packages\torch\nn\modules\module.py", line 987, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 8.00 GiB total capacity; 6.50 GiB already allocated; 0 bytes free; 6.83 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Some text is replaced by later parts of the sentence,
generally where there should be a comma.
I said: "please let me know where I can find green apples".
After different text had appeared, the only text left was:
green apples
Great work, btw!
As the title says, did you find any optimal values for the VAD filter?
Hi, I wrote a hook that allows sound data from any application with active sound (or several applications simultaneously), together with the mic data, to be streamed to Whisper. I'm using it for an application-agnostic live-transcription / LLM "real time" assistance application.
I implemented the sr.AudioSource abstract class with an audio stream from PulseAudio, which let me connect to Whisper easily and enjoy all the sr features like background listening, sound adjustment, etc. Now I'm ready to push the code, and I wonder whether (and how) I should contact you, or whether I should create a PR to add my hook after I prettify and test it.
Thanks for sharing your code, it's the best I tried for real time Whisper usage.
I used this code in my project. When "1" is selected in the menu, this code runs. How do I release the microphone from speech recognition? I tried this:
except KeyboardInterrupt:
    source.stream.pyaudio_stream.stop_stream()
    source.stream.pyaudio_stream.close()
    break
But these lines close the whole app, not just the function where I used this code.
Just wanted to say thank you for the cool repo!
I'm trying to run this app as you described
python transcriber.py
but I came across a few issues:
python cx_freeze_setup.py build
but I haven't dug into that too much yet. Any support with debugging would be appreciated.
Thanks for all that you've done!
getting this error when running the script
Model loaded.
Traceback (most recent call last):
File "C:\Users\ibrah\Desktop\demo.py", line 152, in <module>
main()
File "C:\Users\ibrah\Desktop\demo.py", line 124, in main
result = audio_model.transcribe(temp_file, fp16=torch.cuda.is_available())
File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 121, in transcribe
mel = log_mel_spectrogram(audio, padding=N_SAMPLES)
File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\audio.py", line 130, in log_mel_spectrogram
audio = load_audio(audio)
File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\audio.py", line 46, in load_audio
ffmpeg.input(file, threads=0)
File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\site-packages\ffmpeg\_run.py", line 313, in run
process = run_async(
File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
return subprocess.Popen(
File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\ibrah\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1456, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
Heyo, so I ran this on my 2023 M2 MacBook and got some results. It uses the GPU but doesn't quite get it right.
What I said into the microphone was:
"hi hows it going"
"whats up"
"what it do"
Anyway, here is my report:
(whisper_real_time) cameron@M2 whisper_real_time % pip freeze
certifi==2022.12.7
charset-normalizer==3.0.1
ffmpeg-python==0.2.0
filelock==3.9.0
future==0.18.3
huggingface-hub==0.12.1
idna==3.4
more-itertools==9.0.0
mpmath==1.2.1
networkx==3.0rc1
numpy==1.24.2
openai-whisper @ git+https://github.com/openai/whisper.git@51c785f7c91b8c032a1fa79c0e8f862dea81b860
packaging==23.0
PyAudio==0.2.13
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
SpeechRecognition==3.9.0
sympy==1.11.1
tokenizers==0.13.2
torch==2.0.0.dev20230121
torchaudio==2.0.0.dev20230223
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.5.0
urllib3==1.26.14
(whisper_real_time) cameron@M2 whisper_real_time % python transcribe_demo.py --model large --non_english
Model loaded.
/Users/cameron/.local/share/virtualenvs/whisper_real_time-Iw30K9az/lib/python3.9/site-packages/whisper/decoding.py:633: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
audio_features = audio_features.repeat_interleave(self.n_group, dim=0)
Hi<|en|><|en|> Hi Hi Hi Hi Hi Hi Hi
Hi<|en|><|en|> Hi Hi Hi Hi Hi Hi Hi
What<|en|><|en|><|en|>
Hi<|en|><|en|> Hi Hi Hi Hi Hi Hi Hi
What<|en|><|en|><|en|>
What<|en|><|en|><|en|> What What
^C
Transcription:
Hi<|en|><|en|> Hi Hi Hi Hi Hi Hi Hi
What<|en|><|en|><|en|>
What<|en|><|en|><|en|> What What
Hey, so I've tried to modify the script to select a specific microphone, but I'm just getting a lot of errors and I'm not able to make it work.
I've also checked the other issues and found that this was fixed for macOS users, but not for Windows.
Does anyone know how to select a microphone on a Windows machine?
➜ whisper_real_time git:(master) python3.8 transcribe_demo.py
Could not import the PyAudio C module '_portaudio'.
Traceback (most recent call last):
File "/home/samuel/.local/lib/python3.8/site-packages/speech_recognition/__init__.py", line 120, in get_pyaudio
import pyaudio
File "/usr/lib/python3/dist-packages/pyaudio.py", line 116, in <module>
import _portaudio as pa
ModuleNotFoundError: No module named '_portaudio'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "transcribe_demo.py", line 152, in <module>
main()
File "transcribe_demo.py", line 58, in main
for index, name in enumerate(sr.Microphone.list_microphone_names()):
File "/home/samuel/.local/lib/python3.8/site-packages/speech_recognition/__init__.py", line 135, in list_microphone_names
audio = Microphone.get_pyaudio().PyAudio()
File "/home/samuel/.local/lib/python3.8/site-packages/speech_recognition/__init__.py", line 122, in get_pyaudio
raise AttributeError("Could not find PyAudio; check installation")
AttributeError: Could not find PyAudio; check installation
Hi there :)
Any idea why I'm getting this?
python3.11 transcribe_demo.py
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1303, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1349, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1298, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1058, in _send_output
self.send(msg)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 996, in send
self.connect()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1475, in connect
self.sock = self._context.wrap_socket(self.sock,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 517, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1104, in _create
self.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1382, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/magic/wholesomegarden/magicllight/transparwnt-web-app/whisper_real_time/transcribe_demo.py", line 143, in <module>
main()
File "/Users/magic/wholesomegarden/magicllight/transparwnt-web-app/whisper_real_time/transcribe_demo.py", line 66, in main
audio_model = whisper.load_model(model)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/whisper/__init__.py", line 133, in load_model
checkpoint_file = _download(_MODELS[name], download_root, in_memory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/whisper/__init__.py", line 69, in _download
with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 519, in open
response = self._open(req, data)
^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 496, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)>
And maybe how to fix it?
I'm trying to run this on a Mac.
Thanks
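The error is raised while `whisper.load_model` downloads the checkpoint, before any audio code runs, so it points at certificate verification on this machine (a proxy that re-signs traffic is one possibility, which is only a guess). A small diagnostic sketch using the standard `ssl` module shows where this Python build looks for trusted certificates; `describe_trust_store` is a hypothetical helper.

```python
import os
import ssl


def describe_trust_store():
    """Report where this Python build looks for trusted CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Environment variables that OpenSSL consults to override the defaults.
        paths.openssl_cafile_env: os.environ.get(paths.openssl_cafile_env),
        paths.openssl_capath_env: os.environ.get(paths.openssl_capath_env),
    }
```

If the network side cannot be fixed, `whisper.load_model` also accepts a path to a checkpoint file that was downloaded by other means.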
What would I need to add to the code to get the transcription to auto-translate to English? I have another command-line tool that uses Whisper; it accepts audio files and a bunch of different arguments for what Whisper should do with them, including translation in the form of the "--task translate" argument. In the main transcribe_demo.py file I can see where a couple of arguments are set for the program, so I simply tried adding a similar line containing that argument, but I couldn't get it to work.
Thoughts?
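In the Python API the equivalent of "--task translate" is a `task="translate"` keyword on the `transcribe` call itself, not a command-line flag of the demo. A sketch, assuming the call site looks like the `audio_model.transcribe(temp_file, fp16=...)` line seen in the tracebacks on this page; `transcribe_options` is a hypothetical helper.

```python
def transcribe_options(translate=False, language=None, fp16=False):
    """Build keyword arguments for Whisper's `transcribe` call."""
    options = {"fp16": fp16, "task": "translate" if translate else "transcribe"}
    if language is not None:
        options["language"] = language  # e.g. "ml"; omit to let Whisper detect it
    return options
```

Usage would be along the lines of `audio_model.transcribe(temp_file, **transcribe_options(translate=True))`. Translation needs a multilingual checkpoint, so an `.en` model would have to be swapped for its plain-named counterpart.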
Installing the requirements with Python 2.7 did not work on Windows 10.
Python 3.12 fails with:
import setuptools.version
File "C:\Users\Administrator\AppData\Local\Temp\pip-build-env-zv_92dg2\overlay\Lib\site-packages\setuptools\version.py", line 1, in <module>
import pkg_resources
File "C:\Users\Administrator\AppData\Local\Temp\pip-build-env-zv_92dg2\overlay\Lib\site-packages\pkg_resources\__init__.py", line 2191, in <module>
register_finder(pkgutil.ImpImporter, find_on_path)
^^^^^^^^^^^^^^^^^^^
AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
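`pkgutil.ImpImporter` was removed in Python 3.12, so an older setuptools pulled into the build environment fails as shown. A quick check, as a sketch (`build_env_warning` is a hypothetical helper):

```python
import pkgutil
import sys


def build_env_warning():
    """Return a hint when this interpreter lacks pkgutil.ImpImporter."""
    if hasattr(pkgutil, "ImpImporter"):
        return None
    version = ".".join(map(str, sys.version_info[:3]))
    return (
        f"Python {version} has no pkgutil.ImpImporter; packages that build "
        "with an old setuptools may fail to install."
    )
```

Using an older Python (another post on this page reports success with 3.8) or upgrading pip and setuptools before installing are the usual ways around it, though neither is guaranteed for every pinned dependency.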
For some reason I can't figure out exactly how and where to specify that Whisper should use my graphics card, or better yet, combine it with the CPU. What's the point of using only the CPU? If anyone knows, please share.
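A minimal sketch for the GPU question, assuming PyTorch is the backend: Whisper's `load_model` takes a `device` argument and, when none is given, already picks CUDA if PyTorch reports it as available. If the result below is "cpu" on a machine with an NVIDIA card, a CPU-only PyTorch build is a common reason. `pick_device` is a hypothetical helper.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is nothing to run on the GPU
    return "cuda" if torch.cuda.is_available() else "cpu"
```

It could be used as `audio_model = whisper.load_model("small", device=pick_device())`. A single model runs on one device at a time; I am not aware of an option to split it between GPU and CPU.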
Error:
Traceback (most recent call last):
File "C:\Users\MSI\PycharmProjects\Jarvis\test_2.py", line 131, in <module>
main()
File "C:\Users\MSI\PycharmProjects\Jarvis\test_2.py", line 103, in main
result = audio_model.transcribe(temp_file, fp16=torch.cuda.is_available())
File "C:\Users\MSI\PycharmProjects\Jarvis\venv\lib\site-packages\whisper\transcribe.py", line 121, in transcribe
mel = log_mel_spectrogram(audio, padding=N_SAMPLES)
File "C:\Users\MSI\PycharmProjects\Jarvis\venv\lib\site-packages\whisper\audio.py", line 130, in log_mel_spectrogram
audio = load_audio(audio)
File "C:\Users\MSI\PycharmProjects\Jarvis\venv\lib\site-packages\whisper\audio.py", line 46, in load_audio
ffmpeg.input(file, threads=0)
File "C:\Users\MSI\PycharmProjects\Jarvis\venv\lib\site-packages\ffmpeg\_run.py", line 313, in run
process = run_async(
File "C:\Users\MSI\PycharmProjects\Jarvis\venv\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
return subprocess.Popen(
File "C:\Users\MSI\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\MSI\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1440, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
Process finished with exit code 1
I get this error and do not know the reason. It appears after the "Model loaded." message, so I think the problem is in the second half of the code. I used the original code and just changed the model to a small one. Can you help me?
Hi, excellent work on this repo.
Is there any way to handle multi-party conversations, e.g. two or more people talking?
Is there a way to differentiate the speakers?
Is there documentation for how to use the demo?
For example, how would you adjust the size of the model, how do you know if it's working, what is the default sound device it is picking up audio from, and can it work on a basic Intel GPU when using the smaller options?
Users on Mac may receive an error where the requirements.txt install fails on pyaudio. To fix this, first install portaudio through Homebrew. According to their documentation, you need to run the commands in this order:
brew install portaudio
pip install pyaudio
Hat tip: Stack Overflow
For anyone facing this error when trying to run the demo after installing the requirements:
Check whether torch is installed.
You can check by opening the Python CLI and trying to import torch manually.
I'm not sure why the issue occurs, but a quick Google search suggests the error is with torch 2.3, so installing an older version should help.
You can use the official instructions on pytorch.org to generate the install command, but that will default to the latest version, 2.3.
Simply specify version 2.2.2 or lower and update the index from cu116 to cu118.
An example you can use is below; add the version pin described above. (This will also install other PyTorch libraries, which can come in handy in the future.)
pip3 install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
One of the solutions found via Google is here:
https://stackoverflow.com/questions/74594256/pytorch-error-loading-lib-site-packages-torch-lib-shm-dll-or-one-of-its-depen
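The manual import check described above can also be scripted. This is a minimal sketch, not part of the demo; `try_import` is a hypothetical helper. It catches `OSError` as well as `ImportError` because a DLL that fails to load on Windows is reported as an `OSError`.

```python
import importlib
from typing import Tuple


def try_import(module_name: str) -> Tuple[bool, str]:
    """Try to import a module and report success plus a short detail string."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(getattr(module, "__version__", "unknown version"))


if __name__ == "__main__":
    ok, detail = try_import("torch")
    print(("torch imported, version " if ok else "torch failed to import: ") + detail)
```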
When I try to run the demo program on Linux, I get these errors.
python3 transcribe_demo.py --model tiny
ALSA lib pcm_dsnoop.c:566:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_a52.c:1001:(_snd_pcm_a52_open) a52 is only for playback
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dsnoop.c:566:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_a52.c:1001:(_snd_pcm_a52_open) a52 is only for playback
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dsnoop.c:566:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_a52.c:1001:(_snd_pcm_a52_open) a52 is only for playback
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
Model loaded.
ALSA lib pcm_dsnoop.c:566:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_a52.c:1001:(_snd_pcm_a52_open) a52 is only for playback
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
I would like to use the computer's line-in to transcribe sound coming from the computer. Is there a way to switch the sound source from the mic to line-in with a command-line argument? Good project, by the way. Thanks.
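One way this could be wired up, as a sketch only: add a command-line flag and pass its value to `sr.Microphone`. The `--device_index` flag and the helper names below are hypothetical and are not part of the original demo script. Whether line-in or system playback shows up as a selectable input device depends on the operating system and its audio configuration.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for choosing the model and the input device."""
    parser = argparse.ArgumentParser(description="Pick the audio input device.")
    parser.add_argument("--model", default="tiny", help="Whisper model size.")
    # Hypothetical flag: the original demo does not define it.
    parser.add_argument("--device_index", type=int, default=None,
                        help="Input device index; omit to use the system default.")
    return parser


def open_source(device_index: Optional[int]):
    # Imported lazily so the parser can be used without audio libraries.
    import speech_recognition as sr

    # device_index=None selects the system default input device.
    return sr.Microphone(sample_rate=16000, device_index=device_index)
```

With this in place, a call such as `open_source(build_parser().parse_args().device_index)` would select the device given on the command line, for example `--device_index 2`.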
I want to compile your wonderful code so that the output is an .exe file with all dependencies. Can you tell me how to specify the location where the Whisper model will be downloaded?
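For the download location, `whisper.load_model` accepts a `download_root` argument. The sketch below assumes the `openai-whisper` package is installed; `resolve_model_dir` and `load_model_from` are hypothetical helper names. When no folder is given, it mirrors the library's default of a `whisper` folder under the user's cache directory.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_model_dir(custom_dir: Optional[str] = None) -> Path:
    """Return the folder that will hold the downloaded model files."""
    if custom_dir:
        return Path(custom_dir).expanduser().resolve()
    # Default used by Whisper when download_root is not given.
    default_cache = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(os.getenv("XDG_CACHE_HOME", default_cache)) / "whisper"


def load_model_from(name: str, custom_dir: Optional[str] = None):
    # Imported lazily so the path helper works without Whisper installed.
    import whisper

    target = resolve_model_dir(custom_dir)
    target.mkdir(parents=True, exist_ok=True)
    return whisper.load_model(name, download_root=str(target))
```

For a packaged .exe, a folder next to the executable could be passed as `custom_dir`, for example `load_model_from("small", "models")`.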
The script says Python 3.7, so I used 3.7 in my Conda env, but when I run pip install -r requirements.txt, I get errors related to Python 3.7. I believe the issue is with PyAudio:
Collecting SpeechRecognition
Using cached SpeechRecognition-3.8.1-py2.py3-none-any.whl (32.8 MB)
INFO: pip is looking at multiple versions of pyaudio to determine which version is compatible with other requirements. This could take a while.
Collecting pyaudio
Using cached PyAudio-0.2.12.tar.gz (42 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
ERROR: Ignored the following versions that require a different python version: 0.1.1 Requires-Python >=3.9; 0.1.2 Requires-Python >=3.8; 0.2.0 Requires-Python >=3.8; 0.3.0 Requires-Python >=3.8; 0.3.1 Requires-Python >=3.8; 0.3.2 Requires-Python >=3.8; 0.3.3 Requires-Python >=3.8; 3.10.0 Requires-Python >=3.8
ERROR: Could not find a version that satisfies the requirement tiktoken==0.3.1 (from openai-whisper) (from versions: none)
ERROR: No matching distribution found for tiktoken==0.3.1
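Reading the log, the failing requirement appears to be `tiktoken==0.3.1` rather than PyAudio: pip ignored that version because it requires Python 3.8 or newer. A minimal pre-install check under that assumption (the names `MIN_PYTHON` and `python_is_supported` are hypothetical):

```python
import sys
from typing import Tuple

# Taken from the "Requires-Python >=3.8" entries in the pip log above.
MIN_PYTHON = (3, 8)


def python_is_supported(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the given (major, minor) version meets the minimum."""
    return tuple(version) >= MIN_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:2])
    if python_is_supported():
        print(f"Python {current} meets the minimum for tiktoken.")
    else:
        print(f"Python {current} is too old; create the environment with 3.8 or newer.")
```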
So far I have installed everything and tried to run the code in the demo. I am not sure how to run it so that the terminal starts transcribing.
Thanks!
parameter logits (Tensor of shape (1, 51864)) of distribution Categorical(logits: torch.Size([1, 51864])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values: tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')
I have tried this (openai/whisper#1068), but it did not work.
I got !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 1.5mm 1.5mm 1.5mm 1.5mm 1.5mm 1.5mm 1.5mm 1.5mm 1.5mm 1.5mm 1.5mm 1.5mm 1.5mm 1.5mm
Hello.
Thanks to your whisper_real_time project, I was able to try STT on my computer.
I want to use this package on my Jetson Nano, but when I run it there, CPU and memory usage are very high and the screen freezes.
Someone then suggested using the OpenAI API: just as GPT can be run from Python code, Whisper can be used through the API.
So I am wondering whether I can use the STT function in this code by entering an API key, without downloading the model or running it locally.
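A rough sketch of what that could look like, assuming the `openai` Python package (version 1.x), an `OPENAI_API_KEY` environment variable, and raw 16-bit mono PCM at 16 kHz. This is not how the demo works today, and both helper names are hypothetical. The recorded bytes are wrapped in a WAV container in memory and then sent to the hosted `whisper-1` model.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000, sample_width: int = 2) -> bytes:
    """Wrap raw mono PCM samples in a WAV container held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def transcribe_remote(pcm: bytes) -> str:
    # Imported lazily; requires the `openai` package and network access.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    wav_file = io.BytesIO(pcm_to_wav_bytes(pcm))
    wav_file.name = "audio.wav"  # gives the upload a file name with an audio extension
    result = client.audio.transcriptions.create(model="whisper-1", file=wav_file)
    return result.text
```

This moves the model off the Jetson Nano, at the cost of network latency and API usage fees per request.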
How do I run this? The basics are missing from the readme.