mallorbc / whisper_mic
Project that allows one to use a microphone with OpenAI Whisper.
License: MIT License
Great work, Blake. This has helped me no end in getting off of Google Docs voice typing.
I've got one issue: specifying --english generates the following error:
Traceback (most recent call last):
File "/home/sunny/Developer/whisper_mic/mic.py", line 48, in <module>
main()
File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/sunny/Developer/whisper_mic/mic.py", line 42, in main
result = audio_model.transcribe(save_path)
File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/whisper/transcribe.py", line 82, in transcribe
_, probs = model.detect_language(segment)
File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/whisper/decoding.py", line 35, in detect_language
raise ValueError(f"This model doesn't have language tokens so it can't perform lang id")
ValueError: This model doesn't have language tokens so it can't perform lang id
It works fine without this flag.
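The error above happens because the English-only models (tiny.en, base.en, etc.) have no language tokens, so Whisper's detect_language() cannot run on them; passing the language explicitly to transcribe() skips detection entirely. A minimal sketch of that fix, assuming mic.py builds the kwargs for transcribe() (the helper name below is hypothetical, not part of the repo):

```python
def build_transcribe_kwargs(english: bool) -> dict:
    """Pick kwargs for whisper's transcribe() so *.en models never run language detection."""
    # Passing language explicitly makes whisper skip detect_language(),
    # which *.en models cannot perform (they lack language tokens).
    return {"language": "en"} if english else {}

# usage sketch: result = audio_model.transcribe(save_path, **build_transcribe_kwargs(english))
```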
Hi. I tried using mic following your video tutorial but got this error.
File "C:\whisper_mic\whisper\lib\site-packages\pydub\utils.py", line 274, in mediainfo_json
res = Popen(command, stdin=stdin_parameter, stdout=PIPE, stderr=PIPE)
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 966, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1435, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
I found this, went ahead and changed the shell parameter in Popen's __init__ to True, and ended up with another error.
File "C:\winni\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
It seems to me like no audio is captured. I also tried using a headset but it still doesn't work. Would you be able to help?
I'm using Windows 10 and Python 3.10.4
After trying to install with pip install -r requirements.txt, it errored out saying that gcc wasn't installed. After that, it failed to build the wheel for pyaudio until I installed portaudio19-dev.
I suggest documenting those two system dependencies alongside the requirements.txt file.
Is there any command for troubleshooting the mic.py issue on Windows?
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
100%|████████████████████████████████████████| 139M/139M [00:01<00:00, 110MiB/s]
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
Running this on Ubuntu Linux.
Hi, I just wanted to say Thanks for posting this sample app, I used it as part of a voice coding system (speech recognition for software developers) that I'm currently using to control my computer without a keyboard due to RSI issues: daanzu/kaldi-active-grammar#73
It's not working and shows this error when I try running whisper_mic.py. Please give me a solution for it:
RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
You might want to add this line after with sr.Microphone() as source::
r.adjust_for_ambient_noise(source, duration=1)
It wasn't recording for me until I added this.
(Windows, by the way.)
The current implementation of the listen() method has a mandatory "timeout" parameter whose default value is 3. It can be adjusted, but making "timeout" mandatory makes the method less flexible for the user. Every time, the user has to estimate how long it will take to say the query and pass that value as "timeout" before using it.
It would make more sense if the listen() method could auto-detect the duration of the user's voice input, with "timeout" as an optional parameter.
After using the whisper_mic package for my project, I noticed the program starts to capture the audio input just after creating the WhisperMic object. After exploring the code, I found the init() method calls the setup_mic() method, which immediately starts recording after setting up the mic properties.
I think the setup_mic() method should only handle the task of setting the mic properties. The listening for audio input should be initiated once the user calls for the listen() or listen_loop() method.
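The deferred-start design suggested above could be sketched like this (illustrative only; MicWrapper and _start_recording are hypothetical names, not whisper_mic's actual API):

```python
class MicWrapper:
    """Sketch: configure the mic in __init__, but only start recording on first listen()."""

    def __init__(self, mic_index=None):
        self.mic_index = mic_index   # property setup only; no audio is captured yet
        self._recording = False

    def _start_recording(self):
        # In the real package this is where the mic stream would be opened
        # (e.g. via speech_recognition's background listener).
        self._recording = True

    def listen(self):
        if not self._recording:      # lazily begin capture on first use
            self._start_recording()
        return "<audio frame>"       # placeholder for captured audio
```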
While using the listen_loop() method with "dictate = False", which will print the transcriptions in the output console, I noticed the output had many line-breaks even if I didn't pause for a second while giving the voice input.
Hello friend! Thank you for your great initiative with this repository, it's very useful!
However, I noticed that because the call to the model is a blocking call, the voice input during the time the AI model is working is lost.
In a private repo I fixed this with threading and queues, would you be interested in a Pull Request for this repo as well?
Please add the biggest model, large-v3.
Hello, firstly, thank you for the development!
This could be a hardware issue rather than code, but I would like to ask you a question as no matter how many solutions I have tried, it does not resolve the issue.
Sometimes the voice is not recognized for about 2 hours.
In cli.py, line 51, audio = r.listen(source) in record_audio() does not work.
MacBook Air M2 (I tried Python 3.9, 3.10, and 3.11)
After the first smooth use of whisper-mic, it stopped working about 24 hours later.
Occasionally, running pip install whisper-mic again fixes it.
The following methods did not work:
- simply restarting
- shutting down the MacBook and waiting 1 minute (for PRAM things), then starting it
- reducing the noise reduction: r.adjust_for_ambient_noise(source, duration=1)
- specifying device_index with sr.Microphone(device_index=None) (also 3, after checking)
- python3 -m speech_recognition (the speech recognition used in the sr part doesn't work either)
- creating an environment with venv and trying again -> it stops at "say something..."
- creating an environment with pipenv and trying again -> stops at "say something..."
- creating an environment with docker and trying again -> stops at "say something..."
I've been stuck on this problem for 3 weeks.
I'd appreciate any help!
Hello!
How does the new version of your program handle stopping the live transcription? I haven't been able to find where the terminating key stroke is defined, apologies if I have overlooked this. I have been killing the process manually in my cli.
Thank you
Running on Windows via Miniconda3. I am using the large model, and it is not touching my VRAM, just spiking my CPU to about 50%.
I don't see any flags to specify CPU vs. GPU in the help output, nor anywhere in the code that specifies it.
For my mic configuration it seems that the default 16k sample rate is invalid. Needed to increase it to 48k in mic.py:
with sr.Microphone(sample_rate=48000) as source:
That solved that issue. Hopefully it helps others.
[02/27/24 18:25:02] INFO No mic index provided, using default    whisper_mic.py:84
Traceback (most recent call last):
File "D:\programing\Python\LearnAI\mic_to_text.py", line 3, in <module>
mic = WhisperMic()
^^^^^^^^^^^^
File "D:\programing\Python\LearnAI\.venv\Lib\site-packages\whisper_mic\whisper_mic.py", line 79, in __init__
self.__setup_mic(mic_index)
File "D:\programing\Python\LearnAI\.venv\Lib\site-packages\whisper_mic\whisper_mic.py", line 85, in __setup_mic
self.source = sr.Microphone(sample_rate=16000, device_index=mic_index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\programing\Python\LearnAI\.venv\Lib\site-packages\speech_recognition\__init__.py", line 80, in __init__
self.pyaudio_module = self.get_pyaudio()
^^^^^^^^^^^^^^^^^^
File "D:\programing\Python\LearnAI\.venv\Lib\site-packages\speech_recognition\__init__.py", line 111, in get_pyaudio
from distutils.version import LooseVersion
ModuleNotFoundError: No module named 'distutils'
[EDIT] fixed with
pip install setuptools
Hello,
It may be useful to have an option to save the transcribed speech to a text file.
For example, all of the text that is printed to the console would be appended to a text file (or another format).
Are there any plans to implement a feature similar to this?
Thank you
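In the meantime, a thin wrapper around the listen loop can append each result to a file. A minimal sketch (log_transcript is a hypothetical helper, not an existing whisper_mic option):

```python
def log_transcript(text: str, path: str = "transcript.txt") -> None:
    """Append one transcription result to a text file, one line per utterance."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")

# usage sketch: for each result returned by the listen loop, call log_transcript(result)
```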
Hi, thanks for the amazing repository. It has worked great. But is there a way to reduce latency? I am using the base model.
Or is it possible to run inference on a downloaded .h5 model?
Thanks
Can the code be accessed from a web-based tool?
Our students would love that: to record from a web page and have their pronunciation transcribed and rated.
Thanks!
Running the following code results in the process getting stuck when not speaking loudly enough:
mic = WhisperMic()
result = mic.record(duration=2)
print(result)
The issue seems to come from the function __transcribe, where self.result_queue is filled only when is_audio_loud_enough is true:
if is_audio_loud_enough:
    # faster_whisper returns an iterable object rather than a string
    if self.faster:
        segments, info = self.audio_model.transcribe(audio_data)
        predicted_text = ''
        for segment in segments:
            predicted_text += segment.text
    else:
        if self.english:
            result = self.audio_model.transcribe(audio_data, language='english', suppress_tokens="")
        else:
            result = self.audio_model.transcribe(audio_data, suppress_tokens="")
        predicted_text = result["text"]
    if not self.verbose:
        if predicted_text not in self.banned_results:
            self.result_queue.put_nowait(predicted_text)
    else:
        if predicted_text not in self.banned_results:
            self.result_queue.put_nowait(result)
As a result, the functions listen and record get stuck in the following loop, because self.result_queue is always empty:
while True:
    if not self.result_queue.empty():
        return self.result_queue.get()
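One defensive fix on the consumer side is to wait on the queue with a deadline instead of spinning forever, so the caller gets None back when the audio was never loud enough. A sketch under that assumption (get_result is a hypothetical helper, not whisper_mic's actual method):

```python
import queue

def get_result(result_queue: "queue.Queue", timeout: float = 5.0):
    """Wait up to `timeout` seconds for a transcription; return None instead of blocking forever."""
    try:
        return result_queue.get(timeout=timeout)   # blocking get, but with a deadline
    except queue.Empty:
        return None                                # nothing was transcribed in time
```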
pip install whisper_mic not working
Environment : MacOS
Python version 3.8.17
pip version 23.2.1
On Windows 11, it seems "ctrl+c" doesn't stop the process completely.
[Reproduction process]
## TODO: Export logic to a separate file. This could allow this to be a pip package.
Hello,
I'm trying the code on my Linux laptop (Ubuntu-based, Linux Mint 20). The installation went well, but when it arrives at "Say something" there are some error messages related to microphone detection:
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
I've installed the additional packages mentioned in the README, but still no dice.
How would we get this sorted?
Thanks!
This works really well for transcribing standalone sentences. However, if you read from a book, for example, and don't leave a substantial amount of silent space after finishing a sentence, it won't process the results of what you say until after you finish speaking and will then print a massive block of text.
Is there something that can be done to push whisper to output smaller blocks of text, or to slice the audio going into the transcriber, in order to encourage it to stream out smaller chunks of text (at least at approximately the sentence level, since word-level might be asking too much)? I notice the GPU idling until I finish talking, so I don't think it's doing most of the transcription work until that point, and I'd like to better utilize my GPU as I speak.
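One way to push out smaller chunks is to slice the captured samples into fixed-length windows before handing them to the model, which is roughly what passing phrase_time_limit to listen_loop does elsewhere in this repo. A minimal slicing sketch (chunk_samples is a made-up helper name):

```python
def chunk_samples(samples: list, sample_rate: int, seconds: float) -> list:
    """Split raw audio samples into windows of `seconds` each,
    so the transcriber sees short segments instead of one long utterance."""
    step = max(1, int(sample_rate * seconds))
    return [samples[i:i + step] for i in range(0, len(samples), step)]
```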
Hi Blake,
Here is my environment:
MacOS Sonoma 14.4.1
XCode Command Line Tools 15.3.0.0.1.1708646388
Python 3.12.2
portaudio 19.7.0 (installed with brew)
whisper_mic 1.4.2
Here is the output I get when I run the application:
% ~/opt/dictee/bin/whisper_mic --model medium
[04/03/24 17:45:47] INFO No mic index provided, using default whisper_mic.py:84
Traceback (most recent call last):
File "/Users/pro/opt/dictee/bin/whisper_mic", line 8, in <module>
sys.exit(main())
^^^^^^
File "/Users/pro/opt/dictee/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pro/opt/dictee/lib/python3.12/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/Users/pro/opt/dictee/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pro/opt/dictee/lib/python3.12/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pro/opt/dictee/lib/python3.12/site-packages/whisper_mic/cli.py", line 29, in main
mic = WhisperMic(model=model, english=english, verbose=verbose, energy=energy, pause=pause, dynamic_energy=dynamic_energy, save_file=save_file, device=device,mic_index=mic_index,implementation=("faster_whisper" if faster else "whisper"),hallucinate_threshold=hallucinate_threshold)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pro/opt/dictee/lib/python3.12/site-packages/whisper_mic/whisper_mic.py", line 79, in __init__
self.__setup_mic(mic_index)
File "/Users/pro/opt/dictee/lib/python3.12/site-packages/whisper_mic/whisper_mic.py", line 85, in __setup_mic
self.source = sr.Microphone(sample_rate=16000, device_index=mic_index)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pro/opt/dictee/lib/python3.12/site-packages/speech_recognition/__init__.py", line 80, in __init__
self.pyaudio_module = self.get_pyaudio()
^^^^^^^^^^^^^^^^^^
File "/Users/pro/opt/dictee/lib/python3.12/site-packages/speech_recognition/__init__.py", line 111, in get_pyaudio
from distutils.version import LooseVersion
ModuleNotFoundError: No module named 'distutils'
Maybe it is just that your code is not yet ready for Python 3.12?
Thanks a lot for providing us with whisper-mic.
I made transcribe-anything, which will install the CUDA version of whisper when it detects nvidia-smi.
I wanted to point this out. The motivation was installing whisper without messing up the user's global install of PyTorch packages.
@mallorbc
Hello,
I am using the --save_file flag, and I see that it saves the output to a temporary directory, but I am not able to find the file or the directory.
How do you access your saved file when you are using this feature?
Thank you
It takes about 8-10 seconds for the program to print "Mic setup". Just wondering why there is a delay. Is it because of the whisper model or because of the mic setup itself? Is there any way to improve the load time? Doing so would enhance the project greatly. I am willing to look into it, so any leads are welcome.
I find that whisper is not that accurate for my English. I wish to have the capability to save the audio so that I can play it back and correct the transcription (this is the practice of otter.ai). I don't really need real-time transcription. It's fine for me to record it and get the transcription later, hopefully with the audio recording of the original.
It would be even cooler to use my corrections to improve whisper for my own situation. I'm not sure if that is currently possible.
Could you please consider using a less restrictive license, such as MIT? I mean, the whisper repo itself is MIT licensed. As it is, I'm not willing to reuse your code. I would have to rewrite it, which frankly is trivial these days using GPT.
Can you offer MIT as an alternative license please, so that I can use your code directly and give you credit for it?
Thank you for writing this package. Very useful!
I am running this on my Macbook and using the built-in mic works fine, but the quality is not that great.
I purchased a USB C conference mic which I can see in System Preferences > Input but it seems the script still wants to use the built-in mic.
Is it possible to override the mic that is used and point it towards my USB mic?
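It should be: the SpeechRecognition library exposes sr.Microphone.list_microphone_names(), and the list index can be passed as device_index when constructing the microphone. A small helper to pick the USB device by name could look like this (pick_mic_index is a hypothetical helper; the names list would come from list_microphone_names()):

```python
def pick_mic_index(names, keyword: str):
    """Return the index of the first device whose name contains `keyword`, else None.
    `names` is expected to come from sr.Microphone.list_microphone_names()."""
    for i, name in enumerate(names):
        if keyword.lower() in name.lower():
            return i
    return None

# usage sketch: sr.Microphone(device_index=pick_mic_index(names, "USB"), sample_rate=16000)
```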
I've found this script to be amazingly effective; it gives good real-time performance and accuracy, although I do wish the latency was a little lower (or that it could simply spit out smaller chunks of text more continuously).
It seems like faster-whisper is able to process the same audio much faster, and it might give even better real-time low-latency results than using stock whisper. I've been toying around with the code, but I'm very novice and clearly CTranslate2 uses different systems and parameters, so it keeps throwing errors when I try to point at it instead of whisper. Would you consider including support for faster-whisper to benefit from the massive performance improvements and lower memory usage?
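For reference, faster-whisper's transcribe() returns (segments, info), where segments is an iterable of objects carrying a .text attribute rather than a single string, so the results need to be joined. A sketch of that collection step (mirroring the loop the repo itself uses for its faster_whisper path):

```python
def collect_segments(segments) -> str:
    """Join faster-whisper segment objects (each with a .text attribute) into one string."""
    return "".join(segment.text for segment in segments)

# usage sketch: segments, info = model.transcribe(audio); text = collect_segments(segments)
```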
Got some shenanigans with ALSA happening. I'm on Pop!_OS, latest update, with a mic plugged in through a Focusrite interface.
ALSA lib pcm_dsnoop.c:566:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dsnoop.c:566:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
Which leads to a NoneType error
Traceback (most recent call last):
File "/home/raskoll/.local/lib/python3.10/site-packages/pynput/keyboard/_xorg.py", line 209, in __del__
File "/usr/lib/python3/dist-packages/Xlib/display.py", line 161, in close
File "/usr/lib/python3/dist-packages/Xlib/protocol/display.py", line 258, in close
File "/usr/lib/python3/dist-packages/Xlib/protocol/display.py", line 255, in flush
File "/usr/lib/python3/dist-packages/Xlib/protocol/display.py", line 565, in send_and_recv
AttributeError: 'NoneType' object has no attribute 'error'
My mic works in other apps, so I don't really know what the problem is.
Also under __transcribe:
if self.english:
    result = self.audio_model.transcribe(audio_data, language='english', suppress_tokens="")
predicted_text = result["text"]  # This was missing
I tried running python whisper_mic.py --help
on the CLI and was met with an error.
C:\Users\[___]\anaconda3\envs\whisper\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Traceback (most recent call last):
File "C:\Users\[___]\pythonprojects\Amadeus\whisper_mic\whisper_mic\whisper_mic.py", line 14, in <module>
from whisper_mic.utils import get_logger
File "C:\Users\[___]\pythonprojects\Amadeus\whisper_mic\whisper_mic\whisper_mic.py", line 14, in <module>
from whisper_mic.utils import get_logger
ModuleNotFoundError: No module named 'whisper_mic.utils'; 'whisper_mic' is not a package
Thanks for the cool repo. But I found that when there is no audio, the model gives junk transcriptions.
Any suggestions on how to improve this?
Hey! Thanks a lot for the implementation! It's simple and super useful. Can you explain the following conversion:
torch_audio = torch.from_numpy(np.frombuffer(audio.get_raw_data(), np.int16).flatten().astype(np.float32) / 32768.0)
Why do we need to convert it to float32?
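Whisper's model expects mono float PCM in the range [-1.0, 1.0], while the microphone delivers signed 16-bit integers in [-32768, 32767]; dividing by 32768.0 rescales the int16 range into that interval (and float32 is the dtype the model weights use). The same conversion with only the standard library, for illustration:

```python
import array

def int16_bytes_to_float(raw: bytes) -> list:
    """Convert signed 16-bit PCM bytes (native byte order) to floats in [-1.0, 1.0),
    the amplitude range whisper expects."""
    samples = array.array("h", raw)            # 'h' = signed 16-bit integers
    return [s / 32768.0 for s in samples]
```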
I've been trying to get this to recognize speech from the audio output of my headphone speakers. I found the correct device index and set it, but it gives me this error whenever I try using a speaker instead of a microphone:
line 465, in listen
assert source.stream is not None, "Audio source must be entered before listening, see documentation for ``AudioSource``; are you using ``source`` outside of a ``with`` statement?"
AssertionError: Audio source must be entered before listening, see documentation for ``AudioSource``; are you using ``source`` outside of a ``with`` statement?
here is a snippet of the code:
with sr.Microphone(sample_rate=16000, device_index=6) as source:
    print("Say something")
    i = 0
    while True:
        audio = r.listen(source)
I didn't change anything big aside from adding comments and setting a device index for my speakers. The error points to the last line:
audio = r.listen(source)
First off, I want to say thank you for making this. It's been a lifesaver so far.
Second, I'm very new to this kind of project and python in general, so I apologize if this question is obvious or nonsensical. The CLI commands are great, but I'm trying to do the same setup in python (specifying the device, the model, the mic, etc.). I know that the init function sets everything to a default value, but I was wondering if there was a way to set these qualities manually in a separate python file so that any user can download my code and have it work with your whisper_mic.py file out of the box. I also wondered about how to find the mic index that I need and how to set the FP16/FP32/INT8 options. I keep getting a warning that FP16 isn't supported on my cpu, which causes it to default to FP32. I'd like to set it to FP32 from the start. If I have to modify the whisper_mic.py file itself, I understand, but I just wanted to make sure there wasn't any other way.
Hello,
I have run everything according to your tutorial, but when I run the command:
whisper-mic --model small --loop
nothing is transcribed from my voice. I use the built-in mic of my laptop.
If possible, provide examples, the same as for the CLI,
like the device number etc. to set up in mic = WhisperMic(),
e.g.: mic = WhisperMic(mic_index=2)
I'm not sure how to fix this, but the issue lies with this line:
audio_clip = AudioSegment.from_file(data)
I used this code:
print("test1")
audio = r.listen(source)
print("test2")
data = io.BytesIO(audio.get_wav_data())
print(data)
print("\n\n\n")
audio_clip = AudioSegment.from_file(data)
print("test4")
I got this printout (excluding the error):
Say something!
test1
test2
<_io.BytesIO object at 0x000002889399A0C0>
Not sure if this is related:
RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
I've installed ffmpeg, so I'm not sure why I'm getting that warning.
Error:
File "C:\Users\Dhruv\Desktop\py\AudioProcessorWhisper\whisper_mic_test.py", line 5, in <module>
result = mic.listen()
File "C:\Users\Dhruv\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper_mic\whisper_mic.py", line 215, in listen
self.__listen_handler(timeout, phrase_time_limit)
File "C:\Users\Dhruv\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper_mic\whisper_mic.py", line 132, in __listen_handler
self.__transcribe(data=audio_data)
File "C:\Users\Dhruv\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper_mic\whisper_mic.py", line 184, in __transcribe
if predicted_text not in self.banned_results:
UnboundLocalError: local variable 'predicted_text' referenced before assignment
Code:
from whisper_mic import WhisperMic
mic = WhisperMic(mic_index=2, save_file=True, english=True)
result = mic.listen()
print(result)
I solved it with this:
def __transcribe(self, data=None, realtime: bool = False) -> None:
    if data is None:
        audio_data = self.__get_all_audio()
    else:
        audio_data = data
    audio_data, is_audio_loud_enough = self.__preprocess(audio_data)
    if is_audio_loud_enough:
        # faster_whisper returns an iterable object rather than a string
        predicted_text = ''  # I MOVED THIS HERE, ON TOP <-------
        if self.faster:
            segments, info = self.audio_model.transcribe(audio_data)
            for segment in segments:
                predicted_text += segment.text
        else:
            if self.english:
                result = self.audio_model.transcribe(audio_data, language='english', suppress_tokens="")  # This is what I need, but it doesn't return?
            else:
                result = self.audio_model.transcribe(audio_data, suppress_tokens="")
            predicted_text = result["text"]
        if not self.verbose:
            if predicted_text not in self.banned_results:
                self.result_queue.put_nowait(predicted_text)
        else:
            if predicted_text not in self.banned_results:
                self.result_queue.put_nowait(result)
        print('predicted_text ' + predicted_text)
        if self.save_file:
            # os.remove(audio_data)
            self.file.write(predicted_text)
I decided to modify the code to actually return something and put something in the predicted text.
Ubuntu 24.04, conda env with Python 3.11.9.
Launching from the terminal, it starts hearing me, but after using it for a few seconds it crashes (even though I'm using only English words while speaking):
$ whisper_mic --loop --dictate --model=tiny
[05/28/24 23:54:51] INFO No mic index provided, using default whisper_mic.py:84
ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
[05/28/24 23:54:52] INFO Mic setup complete whisper_mic.py:95
INFO Listening... whisper_mic.py:213
ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
Can we hear? See you next time! Okay See you next time [] What? I haven't said it. [BLANK_AUDIO] routine What a strange his food. -maya Traceback (most recent call last):
File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/pynput/keyboard/_base.py", line 492, in type
self.release(key)
File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/pynput/keyboard/_base.py", line 427, in release
self._handle(resolved, False)
File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/pynput/keyboard/_xorg.py", line 235, in _handle
raise self.InvalidKeyException(key)
pynput.keyboard._base.Controller.InvalidKeyException: '겸'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pupadupa/anaconda3/envs/maya/bin/whisper_mic", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/whisper_mic/cli.py", line 42, in main
mic.listen_loop(dictate=dictate,phrase_time_limit=2)
File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/whisper_mic/whisper_mic.py", line 206, in listen_loop
self.keyboard.type(result)
File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/pynput/keyboard/_base.py", line 495, in type
raise self.InvalidCharacterException(i, character)
pynput.keyboard._base.Controller.InvalidCharacterException: (7, '겸')
Please add large-v3 :)
If it's listening, for example result = mic.listen(timeout=6),
make it so it automatically stops after a silence timeout
(default 1.5 s).
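The requested silence timeout can be sketched as counting trailing quiet chunks and stopping once they span the chosen duration (illustrative only; should_stop and its parameters are hypothetical, and whisper_mic itself relies on speech_recognition's pause handling):

```python
def should_stop(chunk_loudness, threshold: float, chunk_seconds: float,
                silence_timeout: float = 1.5) -> bool:
    """Return True once the trailing run of quiet chunks covers `silence_timeout` seconds.

    chunk_loudness: per-chunk loudness values, oldest first.
    threshold: below this value a chunk counts as silence.
    chunk_seconds: duration of each chunk.
    """
    quiet = 0
    for loud in reversed(chunk_loudness):   # count consecutive quiet chunks at the end
        if loud >= threshold:
            break
        quiet += 1
    return quiet * chunk_seconds >= silence_timeout
```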
Not sure if it is a system issue. @mallorbc I do not see the code to save the transcribed file anywhere. Am I missing something?