mallorbc / whisper_mic

Project that allows one to use a microphone with OpenAI whisper.

License: MIT License

Python 100.00%
microphone speech-recognition speech-to-text whisper whisper-ai whisper-api

whisper_mic's Introduction

Whisper Mic

This repo is based on the work done here by OpenAI. It allows you to use a microphone with Whisper as a live demo, and it copies some of the README from the original project.

Video Tutorial

The latest video tutorial for this repo can be seen here

An older video tutorial for this repo can be seen here

Professional Assistance

If you are in need of paid professional help, it is available through this email

Setup

Now a pip package!

  1. Create a venv of your choice.
  2. Run pip install whisper-mic

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

Size Parameters English-only model Multilingual model Required VRAM Relative speed
tiny 39 M tiny.en tiny ~1 GB ~32x
base 74 M base.en base ~1 GB ~16x
small 244 M small.en small ~2 GB ~6x
medium 769 M medium.en medium ~5 GB ~2x
large 1550 M N/A large ~10 GB 1x

For English-only applications, the .en models tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.

Microphone Demo

You can use the model with a microphone using the whisper_mic program. Use -h to see flag options.

Some of the more important flags are the --model and --english flags.

Transcribing To A File

Running the command whisper_mic --loop --dictate will type the words you say at your active cursor.

Usage In Other Projects

You can use this code in other projects rather than just as a demo. You can do this with the listen method.

from whisper_mic import WhisperMic

mic = WhisperMic()
result = mic.listen()
print(result)

Check out the possible arguments by looking at the cli.py file.
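As a sketch of how one might build on the listen method for a continuous loop (the helper name is_speech and the bracketed-noise filter are our own illustration, not part of the package; Whisper sometimes emits tags like [BLANK_AUDIO], as seen in the issues below):

```python
import re

def is_speech(text: str) -> bool:
    # Skip empty results and bracketed noise tags such as "[BLANK_AUDIO]".
    stripped = text.strip()
    return bool(stripped) and not re.fullmatch(r"\[.*\]", stripped)

def dictation_loop():
    # Requires a working microphone and the whisper-mic package installed.
    from whisper_mic import WhisperMic
    mic = WhisperMic()
    while True:
        result = mic.listen()
        if is_speech(result):
            print(result)
```

The filter keeps the loop from printing blank lines when the model transcribes silence.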

Troubleshooting

If you are having issues, try the following:

sudo apt install portaudio19-dev python3-pyaudio

Contributing

Some ideas that you can add are:

  1. Supporting different implementations of Whisper
  2. Adding additional optional functionality
  3. Adding tests

License

The model weights of Whisper are released under the MIT License. See their repo for more information.

This code under this repo is under the MIT license. See LICENSE for further details.

Thanks

Until recently, access to high-performing speech-to-text models was only available through paid services. With this release, I am excited for the many applications that will come.

whisper_mic's People

Contributors

antoniosarosi, crazexd, eren-nevin, evranch, joelmmm, mallorbc, overlordiam, sankhadeepdutta, shervinemami, tatellos, tonymajordev


whisper_mic's Issues

Error when attempting to use speaker audio

I've been trying to get this to recognise speech from the audio output of my headphone speakers. I found the correct device index and set it, but it gives me this error whenever I try using a speaker instead of a microphone:

line 465, in listen
    assert source.stream is not None, "Audio source must be entered before listening, see documentation for ``AudioSource``; are you using ``source`` outside of a ``with`` statement?"
AssertionError: Audio source must be entered before listening, see documentation for ``AudioSource``; are you using ``source`` outside of a ``with`` statement?

here is a snippet of the code:

with sr.Microphone(sample_rate=16000, device_index=6) as source:
    print("Say something")
    i = 0
    while True:
        audio = r.listen(source)

I didn't change anything big aside from adding comments and setting a device index for my speakers. The error points to the last line, audio = r.listen(source).

Mic not automatically recognized on laptop

Hello,

I'm trying the code on my Linux laptop (Ubuntu-based, Linux Mint 20). The installation went well, but when it reaches "Say something" there are some error messages related to microphone detection:

ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2642:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_route.c:869:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card

I've installed the additional portions as mentioned on the README but still no dice.

How would we get this sorted?

Thanks!

ALSA lib error, invalid card

ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device [/dev/dsp](https://file+.vscode-resource.vscode-cdn.net/dev/dsp)
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device [/dev/dsp](https://file+.vscode-resource.vscode-cdn.net/dev/dsp)
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
100%|████████████████████████████████████████| 139M/139M [00:01<00:00, 110MiB/s]
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_route.c:877:(find_matching_chmap) Found no matching channel map
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device [/dev/dsp](https://file+.vscode-resource.vscode-cdn.net/dev/dsp)
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device [/dev/dsp](https://file+.vscode-resource.vscode-cdn.net/dev/dsp)
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'

Running this on linux ubuntu

[Fix] Keyboard interrupt for listen_loop

On Windows 11, it seems Ctrl+C doesn't stop the process completely.

[Reproduction steps]

  1. pip install whisper-mic
  2. Run whisper_mic --loop
  3. Wait for the initialization
  4. Press Ctrl+C
  5. "Aborted!" shows in the command line, but the command line doesn't start accepting new commands again.
    (ChatGPT says "Aborted!" is related to the operating system or the C runtime library, btw.)

Takes considerable time to actually setup the mic and start transcribing

It takes about 8-10 seconds for the program to print "Mic setup". Just wondering why there is a delay. Is it because of the Whisper model or because of the mic setup itself? Is there any way to improve the load time? If so, it could enhance the project greatly. I am willing to look into it, so any leads are welcome.

Feature requests

  1. A flag to disable the audio descriptions like [TYPING], [CLOCK TICKING], etc.
  2. A flag to issue a READY message when the model is loaded and ready to listen
  3. Is spoken punctuation a tricky one? I suppose it would be like watch words. It would be very useful.


Adjusting for ambient noise is missing

You might want to add this line after the with sr.Microphone() as source line:

r.adjust_for_ambient_noise(source, duration = 1)

It wasn't recording for me until I added this.
(Windows, btw.)
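The suggestion above can be sketched as a small wrapper (a minimal illustration assuming the speech_recognition package; the function name calibrated_listen is ours):

```python
def calibrated_listen(device_index=None, calibration_secs=1):
    # Sample background noise for `calibration_secs` before listening so the
    # recognizer's energy threshold is set above the room's noise floor;
    # without this, some setups never detect the start of speech.
    import speech_recognition as sr
    r = sr.Recognizer()
    with sr.Microphone(sample_rate=16000, device_index=device_index) as source:
        r.adjust_for_ambient_noise(source, duration=calibration_secs)
        return r.listen(source)
```

Calibration adds about one second of startup delay, which is usually a fair trade for reliable voice detection.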

Thanks for this project

Hi, I just wanted to say Thanks for posting this sample app, I used it as part of a voice coding system (speech recognition for software developers) that I'm currently using to control my computer without a keyboard due to RSI issues: daanzu/kaldi-active-grammar#73

[Speech not recognized] it stops at `audio = r.listen(source)`

Hello, firstly, thank you for the development!

This could be a hardware issue rather than code, but I would like to ask you a question as no matter how many solutions I have tried, it does not resolve the issue.

Issue

Sometimes the voice is not recognized for about 2 hours.

Where the problem occurs

In cli.py, line 51, audio = r.listen(source) in record_audio() does not work.

Environment

MacBook Air (M2); I tried Python 3.9, 3.10, and 3.11.

Steps so far

After the first smooth use of whisper-mic, it stopped working about 24 hours later.
For that, the following method works occasionally.

  • pip install whisper-mic again
  • turn off permission for microphone once from control panel on macbook and turn it on and restart
    However, even if it works occasionally with these methods, it stops after about 2 hours.

The following methods did not work.

  • simply restart

  • shut down the MacBook and wait for 1 minute, then start (for PRAM things)

  • reduce noise reduction (r.adjust_for_ambient_noise(source,duration=1))

  • Specify device_index, with sr.Microphone(device_index = None / also in 3 after checking)

  • For Speech recognition, python3 -m speech_recognition used in sr part doesn't work either

  • Create an environment with venv and try again -> it stops with "say something..."

  • Create an environment with pipenv and try again -> it stops with "say something..."

  • Create an environment with docker and try again -> it stops with "say something..."

I've been stuck on this problem for 3 weeks.
I'd appreciate any help!!

ModuleNotFoundError when trying to use whisper_mic.py

I tried running python whisper_mic.py --help on the CLI and was met with an error.

C:\Users\[___]\anaconda3\envs\whisper\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Traceback (most recent call last):
  File "C:\Users\[___]\pythonprojects\Amadeus\whisper_mic\whisper_mic\whisper_mic.py", line 14, in <module>
    from whisper_mic.utils import get_logger
  File "C:\Users\[___]\pythonprojects\Amadeus\whisper_mic\whisper_mic\whisper_mic.py", line 14, in <module>
    from whisper_mic.utils import get_logger
ModuleNotFoundError: No module named 'whisper_mic.utils'; 'whisper_mic' is not a package

Sonoma 14.4.1 - Python 3.12 - Running whisper_mic returns errors

Hi Blake,
Here is my environment:

MacOS Sonoma 14.4.1
XCode Command Line Tools 15.3.0.0.1.1708646388
Python 3.12.2
portaudio 19.7.0 (installed with brew)
whisper_mic 1.4.2

Here is the output I get when I run the application:

% ~/opt/dictee/bin/whisper_mic --model medium                 
[04/03/24 17:45:47] INFO     No mic index provided, using default                                                                                             whisper_mic.py:84
Traceback (most recent call last):
  File "/Users/pro/opt/dictee/bin/whisper_mic", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/pro/opt/dictee/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pro/opt/dictee/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/pro/opt/dictee/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pro/opt/dictee/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pro/opt/dictee/lib/python3.12/site-packages/whisper_mic/cli.py", line 29, in main
    mic = WhisperMic(model=model, english=english, verbose=verbose, energy=energy, pause=pause, dynamic_energy=dynamic_energy, save_file=save_file, device=device,mic_index=mic_index,implementation=("faster_whisper" if faster else "whisper"),hallucinate_threshold=hallucinate_threshold)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pro/opt/dictee/lib/python3.12/site-packages/whisper_mic/whisper_mic.py", line 79, in __init__
    self.__setup_mic(mic_index)
  File "/Users/pro/opt/dictee/lib/python3.12/site-packages/whisper_mic/whisper_mic.py", line 85, in __setup_mic
    self.source = sr.Microphone(sample_rate=16000, device_index=mic_index)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pro/opt/dictee/lib/python3.12/site-packages/speech_recognition/__init__.py", line 80, in __init__
    self.pyaudio_module = self.get_pyaudio()
                          ^^^^^^^^^^^^^^^^^^
  File "/Users/pro/opt/dictee/lib/python3.12/site-packages/speech_recognition/__init__.py", line 111, in get_pyaudio
    from distutils.version import LooseVersion
ModuleNotFoundError: No module named 'distutils'

Maybe it is just that your code is not yet ready for Python 3.12?
Thanks a lot for providing us with whisper-mic.

Save file location

@mallorbc
Hello,

I am using the --save_file flag and I see that it saves the output to a temporary directory but I am not able to find the file or the directory.
How do you access your saved file when you are using this feature?

Thank you

Issues with Python Setup

First off, I want to say thank you for making this. It's been a lifesaver so far.

Second, I'm very new to this kind of project and Python in general, so I apologize if this question is obvious or nonsensical. The CLI commands are great, but I'm trying to do the same setup in Python (specifying the device, the model, the mic, etc.). I know that the init function sets everything to a default value, but I was wondering if there is a way to set these values manually in a separate Python file so that any user can download my code and have it work with your whisper_mic.py file out of the box.

I also wondered how to find the mic index that I need and how to set the FP16/FP32/INT8 options. I keep getting a warning that FP16 isn't supported on my CPU, which causes it to default to FP32; I'd like to set it to FP32 from the start. If I have to modify the whisper_mic.py file itself, I understand, but I just wanted to make sure there wasn't any other way.

Massive blocks of text – can this stream out smaller/consistent chunks?

This works really well for transcribing standalone sentences. However, if you read from a book, for example, and don't leave a substantial amount of silent space after finishing a sentence, it won't process the results of what you say until after you finish speaking and will then print a massive block of text.

Is there something that can be done to push whisper to output smaller blocks of text, or to slice the audio going into the transcriber, in order to encourage it to stream out smaller chunks of text (at least at approximately the sentence level, since word-level might be asking too much)? I notice the GPU idling until I finish talking, so I don't think it's doing most of the transcription work until that point, and I'd like to better utilize my GPU as I speak.
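One workaround, sketched under the assumption that listen_loop accepts a phrase_time_limit parameter (it is passed that way in the project's cli.py, as seen in a traceback elsewhere on this page):

```python
def stream_chunks(max_phrase_secs=5):
    # Cap each captured phrase at `max_phrase_secs` seconds so long readings
    # are transcribed in smaller pieces instead of one massive block emitted
    # only after you stop speaking.
    from whisper_mic import WhisperMic
    mic = WhisperMic()
    mic.listen_loop(dictate=False, phrase_time_limit=max_phrase_secs)
```

Shorter limits give lower perceived latency but risk splitting words at chunk boundaries.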

[Feature request][Bug] Improvement needed on the listen() method. Issues related to setup_mic() and listen_loop() method.

Proposal related to the listen() method

The current implementation of the listen() method has a mandatory "timeout" parameter whose default value is 3. It can be adjusted, but making the "timeout" parameter mandatory makes the method less flexible. Every time, the user has to estimate how long it will take to say the query and pass that value to the "timeout" parameter before using it.

Solution

It would make more sense if the listen() method had the ability to auto-detect the duration for which a user gives voice input. It can have the "timeout" parameter as optional.

Issue related to the setup_mic() method

After using the whisper_mic package for my project, I noticed the program starts to capture the audio input just after creating the WhisperMic object. After exploring the code, I found the init() method calls the setup_mic() method, which immediately starts recording after setting up the mic properties.

Solution

I think the setup_mic() method should only handle the task of setting the mic properties. The listening for audio input should be initiated once the user calls for the listen() or listen_loop() method.

Issue related to the listen_loop() method

While using the listen_loop() method with "dictate = False", which will print the transcriptions in the output console, I noticed the output had many line-breaks even if I didn't pause for a second while giving the voice input.

Wish: save audio recording in a file to accompany the corresponding transcription

I find that whisper is not that accurate for my English. I wish to have the capability to save the audio so that I can play back and correct the transcription. (This is the practice of otter.ai). I don't really need a real-time transcription. It's fine for me to record it, and get the transcription later, hopefully with the audio recording of the original.

I wish it would be even cooler to use my correction to improve whisper for my own situation. I'm not sure if it is possible currently.

Examples of mic = WhisperMic()

If possible, provide examples (same as for the CLI) of settings like the device number to set up in mic = WhisperMic()

like: mic = WhisperMic(mic_index=2)

How to terminate program

Hello!

How does the new version of your program handle stopping the live transcription? I haven't been able to find where the terminating key stroke is defined, apologies if I have overlooked this. I have been killing the process manually in my cli.

Thank you

Set Microphone to Use

Thank you for writing this package. Very useful!

I am running this on my Macbook and using the built-in mic works fine, but the quality is not that great.

I purchased a USB C conference mic which I can see in System Preferences > Input but it seems the script still wants to use the built-in mic.

Is it possible to override the mic that is used and point it towards my USB mic?
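A sketch of how one might find and select the USB device (pick_mic_index is our own helper; speech_recognition's Microphone.list_microphone_names() returns device names in index order, and WhisperMic accepts a mic_index argument, as the tracebacks elsewhere on this page show):

```python
def pick_mic_index(names, keyword):
    # Return the index of the first device whose name contains `keyword`,
    # or None (the default device) if nothing matches.
    for i, name in enumerate(names):
        if keyword.lower() in name.lower():
            return i
    return None

def usb_mic():
    # Requires audio hardware and the packages installed; not exercised here.
    import speech_recognition as sr
    from whisper_mic import WhisperMic
    index = pick_mic_index(sr.Microphone.list_microphone_names(), "usb")
    return WhisperMic(mic_index=index)
```

Printing the list from list_microphone_names() first is an easy way to confirm the exact device name to match on.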

Latency Reduction

Hi, thanks for the amazing repository. It works amazingly well. But is there a way to reduce latency? I am using the base model.

Or is it possible to infer on a downloaded .h5 model?
Thanks

--english option is generating an error

Great work, Blake. This has helped me no end in getting off of Google Docs voice typing.

Got one issue. Specifying --english generates the following error:

Traceback (most recent call last):
  File "/home/sunny/Developer/whisper_mic/mic.py", line 48, in <module>
    main()
  File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/sunny/Developer/whisper_mic/mic.py", line 42, in main
    result = audio_model.transcribe(save_path)
  File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/whisper/transcribe.py", line 82, in transcribe
    _, probs = model.detect_language(segment)
  File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/sunny/.conda/envs/whisper/lib/python3.9/site-packages/whisper/decoding.py", line 35, in detect_language
    raise ValueError(f"This model doesn't have language tokens so it can't perform lang id")
ValueError: This model doesn't have language tokens so it can't perform lang id

Works fine without this flag

OSError: [Errno -9997] Invalid sample rate

For my mic configuration it seems that the default 16k sample rate is invalid. I needed to increase it to 48k in mic.py:

with sr.Microphone(sample_rate=48000) as source:

That solved that issue. Hopefully it helps others.

ModuleNotFoundError: No module named 'distutils'

[02/27/24 18:25:02] INFO     No mic index provided, using default    whisper_mic.py:84
Traceback (most recent call last):
  File "D:\programing\Python\LearnAI\mic_to_text.py", line 3, in <module>
    mic = WhisperMic()
          ^^^^^^^^^^^^
  File "D:\programing\Python\LearnAI\.venv\Lib\site-packages\whisper_mic\whisper_mic.py", line 79, in __init__
    self.__setup_mic(mic_index)
  File "D:\programing\Python\LearnAI\.venv\Lib\site-packages\whisper_mic\whisper_mic.py", line 85, in __setup_mic
    self.source = sr.Microphone(sample_rate=16000, device_index=mic_index)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\programing\Python\LearnAI\.venv\Lib\site-packages\speech_recognition\__init__.py", line 80, in __init__
    self.pyaudio_module = self.get_pyaudio()
                          ^^^^^^^^^^^^^^^^^^
  File "D:\programing\Python\LearnAI\.venv\Lib\site-packages\speech_recognition\__init__.py", line 111, in get_pyaudio
    from distutils.version import LooseVersion
ModuleNotFoundError: No module named 'distutils'

[EDIT] fixed with
pip install setuptools

Feature inquiry, save txt

Hello,

It may be useful to have an option to save the transcribed speech to a text file.
Like all of the text that is printed to the console is appended to a text file (or other format).
Are there any plans to implement a feature similar to this?

Thank you

Crashes soon after start

Ubuntu 24.04, conda env with python 3.11.9,

Launching from the terminal, it starts hearing me, but after using it for a few seconds it crashes (even though I'm using only English words while speaking):

$ whisper_mic  --loop --dictate --model=tiny
[05/28/24 23:54:51] INFO     No mic index provided, using default                                                                                                      whisper_mic.py:84
ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
[05/28/24 23:54:52] INFO     Mic setup complete                                                                                                                        whisper_mic.py:95
                    INFO     Listening...                                                                                                                             whisper_mic.py:213
ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
 Can we hear? See you next time! Okay See you next time [] What? I haven't said it. [BLANK_AUDIO] routine What a strange his food. -maya Traceback (most recent call last):
  File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/pynput/keyboard/_base.py", line 492, in type
    self.release(key)
  File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/pynput/keyboard/_base.py", line 427, in release
    self._handle(resolved, False)
  File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/pynput/keyboard/_xorg.py", line 235, in _handle
    raise self.InvalidKeyException(key)
pynput.keyboard._base.Controller.InvalidKeyException: '겸'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pupadupa/anaconda3/envs/maya/bin/whisper_mic", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/whisper_mic/cli.py", line 42, in main
    mic.listen_loop(dictate=dictate,phrase_time_limit=2)
  File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/whisper_mic/whisper_mic.py", line 206, in listen_loop
    self.keyboard.type(result)
  File "/home/pupadupa/anaconda3/envs/maya/lib/python3.11/site-packages/pynput/keyboard/_base.py", line 495, in type
    raise self.InvalidCharacterException(i, character)
pynput.keyboard._base.Controller.InvalidCharacterException: (7, '겸')

Audio is lost

Hello friend! Thank you for your great initiative with this repository, it's very useful!
However, I noticed that because the call to the model is a blocking call, the voice input during the time the AI model is working is lost.
In a private repo I fixed this with threading and queues, would you be interested in a Pull Request for this repo as well?
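The pattern the author describes can be sketched generically (our own minimal illustration, not the private fix): one thread keeps feeding audio chunks into a queue while a worker drains it, so capture never blocks on the model.

```python
import queue
import threading

def start_transcriber(audio_queue, result_queue, transcribe):
    # Worker thread: drain audio chunks from `audio_queue`, run the (slow,
    # blocking) `transcribe` callable on each, and push transcripts to
    # `result_queue`. Putting None on the audio queue stops the worker.
    def worker():
        while True:
            chunk = audio_queue.get()
            if chunk is None:
                break
            result_queue.put(transcribe(chunk))
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

Because the capture loop only does audio_queue.put(), speech recorded while the model is busy is buffered rather than lost.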

MIT license?

Could you please consider using a less restrictive license, such as MIT? I mean, the whisper repo itself is MIT licensed. As it is, I'm not willing to reuse your code. I would have to rewrite it, which frankly is trivial these days using GPT.

Can you offer MIT as an alternative license please, so that I can use your code directly and give you credit for it?

Not Using GPU (Windows)

Running on windows via miniconda3. I am using the large model and it is not touching my VRAM, just spiking my CPU to about 50%.

Not seeing any flags to specify CPU vs GPU via help and not seeing anywhere in the code that specifies it.

Need help understanding the inputs

Hey! Thanks a lot for the implementation! It's simple and super useful. Can you explain the following conversion:

torch_audio = torch.from_numpy(np.frombuffer(audio.get_raw_data(), np.int16).flatten().astype(np.float32) / 32768.0)

Why do we need to convert it to float32?
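Whisper operates on a float32 waveform normalized to [-1.0, 1.0), while the mic delivers signed 16-bit PCM spanning [-32768, 32767]; dividing by 32768.0 performs that normalization. A stdlib-only sketch of the same scaling (our own illustration of the conversion, without numpy):

```python
import array

def pcm16_to_float(raw: bytes):
    # Interpret the raw bytes as signed 16-bit samples ('h'), then scale
    # each sample by 1/32768 so the waveform lies in [-1.0, 1.0).
    samples = array.array("h", raw)
    return [s / 32768.0 for s in samples]
```

The numpy one-liner in the question does exactly this, just vectorized: reinterpret the buffer as int16, cast to float32, divide by 32768.0.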

Process hanging in an infinite loop when input audio is not loud enough

Running the following code results in the process getting stuck when not speaking loudly enough:

mic = WhisperMic()
result = mic.record(duration=2)
print(result)

The issue seems to come from the function __transcribe, where self.result_queue is filled only when is_audio_loud_enough is true:

    if is_audio_loud_enough:
        # faster_whisper returns an iterable object rather than a string
        if self.faster:
            segments, info = self.audio_model.transcribe(audio_data)
            predicted_text = ''
            for segment in segments:
                predicted_text += segment.text
        else:
            if self.english:
                result = self.audio_model.transcribe(audio_data,language='english',suppress_tokens="")
            else:
                result = self.audio_model.transcribe(audio_data,suppress_tokens="")
                predicted_text = result["text"]

        if not self.verbose:
            if predicted_text not in self.banned_results:
                self.result_queue.put_nowait(predicted_text)
        else:
            if predicted_text not in self.banned_results:
                self.result_queue.put_nowait(result)

As a result, the functions listen and record get stuck in the following loop, because self.result_queue is always empty:

    while True:
        if not self.result_queue.empty():
            return self.result_queue.get()
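One way to avoid the hang (a sketch, not the project's code): read the queue with a timeout instead of spinning, and fail loudly when nothing ever arrives.

```python
import queue

def get_result(result_queue, timeout=5.0):
    # Block for at most `timeout` seconds; if no transcript ever arrives
    # (e.g. the audio was never loud enough to be transcribed), raise a
    # clear error instead of busy-waiting forever.
    try:
        return result_queue.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError("no transcription produced; audio may be too quiet") from None
```

queue.Queue.get already supports blocking with a timeout, so no polling loop is needed at all.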

Unable to capture audio

Hi. I tried using mic following your video tutorial but got this error.
File "C:\whisper_mic\whisper\lib\site-packages\pydub\utils.py", line 274, in mediainfo_json
    res = Popen(command, stdin=stdin_parameter, stdout=PIPE, stderr=PIPE)
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 966, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1435, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

I found this and went ahead and changed the shell parameter in Popen's __init__ to True, and ended up with another error.

File "C:\winni\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

It seems to me like no audio is captured. I also tried using a headset but it still doesn't work. Would you be able to help?
I'm using Windows 10 and Python 3.10.4
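WinError 2 from Popen usually means the ffmpeg executable itself could not be found on PATH, not that no audio was captured; pydub shells out to ffmpeg to decode the WAV bytes. A quick stdlib check (a diagnostic sketch, not part of whisper_mic):

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is resolvable on PATH.

    pydub launches ffmpeg/avconv via subprocess; when neither is
    resolvable, subprocess raises FileNotFoundError (WinError 2 on
    Windows) before any audio is processed.
    """
    return shutil.which("ffmpeg") is not None
```

If this prints False in the same environment where whisper_mic runs, the fix is to install ffmpeg or add its folder to PATH for that environment.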

Can't use mic in Linux - ALSA errors

Got some ALSA shenanigans happening. I'm on the latest Pop!_OS update, with a mic plugged in through a Focusrite interface.

ALSA lib pcm_dsnoop.c:566:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm_dsnoop.c:566:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2666:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib confmisc.c:160:(snd_config_get_card) Invalid field card
ALSA lib pcm_usb_stream.c:482:(_snd_pcm_usb_stream_open) Invalid card 'card'
ALSA lib pcm_dmix.c:999:(snd_pcm_dmix_open) unable to open slave

Which leads to a NoneType error

Traceback (most recent call last):
  File "/home/raskoll/.local/lib/python3.10/site-packages/pynput/keyboard/_xorg.py", line 209, in __del__
  File "/usr/lib/python3/dist-packages/Xlib/display.py", line 161, in close
  File "/usr/lib/python3/dist-packages/Xlib/protocol/display.py", line 258, in close
  File "/usr/lib/python3/dist-packages/Xlib/protocol/display.py", line 255, in flush
  File "/usr/lib/python3/dist-packages/Xlib/protocol/display.py", line 565, in send_and_recv
AttributeError: 'NoneType' object has no attribute 'error'

My mic works in other apps, so I don't really know what the problem is.

Also, under __transcribe:

if self.english:
    result = self.audio_model.transcribe(audio_data, language='english', suppress_tokens="")
    predicted_text = result["text"]  # This line was missing

Whisper_mic for faster-whisper/CTranslate2?

I've found this script to be amazingly effective; it gives good real-time performance and accuracy, although I do wish the latency was a little lower (or that it could simply spit out smaller chunks of text more continuously).

It seems like faster-whisper is able to process the same audio much faster, and it might give even better real-time low-latency results than using stock whisper. I've been toying around with the code, but I'm very novice and clearly CTranslate2 uses different systems and parameters, so it keeps throwing errors when I try to point at it instead of whisper. Would you consider including support for faster-whisper to benefit from the massive performance improvements and lower memory usage?
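For reference, the main API difference is that faster-whisper's transcribe returns an iterator of segments plus an info object, rather than whisper's result dict, so any drop-in support has to join the segment texts. A minimal adapter sketch (model here is any object with a faster-whisper-style transcribe; the helper name is illustrative, not whisper_mic's API):

```python
def join_segments(model, audio) -> str:
    """Collect faster-whisper style output into a single string.

    faster-whisper's transcribe returns (segments, info), where
    segments is a lazy iterator of objects with a .text attribute,
    instead of whisper's result["text"] dict entry. Consuming the
    iterator is what actually runs the transcription.
    """
    segments, _info = model.transcribe(audio)
    return "".join(segment.text for segment in segments)
```

The same pattern appears in the fix posted further down this page, which is presumably why the predicted_text accumulator was introduced.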

FileNotFoundError: [WinError 2] The system cannot find the file specified

I'm not sure how to fix this, but the issue lies with this line:
audio_clip = AudioSegment.from_file(data)
I used this debug code (imports and the Recognizer/Microphone setup added here for context; the prints are as I ran them):

import io

import speech_recognition as sr
from pydub import AudioSegment

r = sr.Recognizer()
with sr.Microphone() as source:
    print("test1")
    audio = r.listen(source)
    print("test2")
    data = io.BytesIO(audio.get_wav_data())
    print(data)
    print("\n\n\n")
    audio_clip = AudioSegment.from_file(data)  # raises FileNotFoundError here
    print("test4")

I got this printout (excluding the error):

Say something!
test1
test2
<_io.BytesIO object at 0x000002889399A0C0>

not sure if this is related:

RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

I've installed ffmpeg, so I'm not sure why I'm still getting that warning.
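When ffmpeg is installed but not on the PATH that the Python process sees (common when it was installed for a different shell, user, or environment), pydub can be pointed at the binary explicitly via its AudioSegment.converter attribute. A hedged resolver sketch, with the candidate paths obviously machine-specific:

```python
import os
import shutil


def find_ffmpeg(extra_candidates=()) -> "str | None":
    """Resolve an ffmpeg binary: PATH first, then explicit candidates.

    The returned path can be assigned to pydub's AudioSegment.converter
    (e.g. AudioSegment.converter = find_ffmpeg((r"C:\ffmpeg\bin\ffmpeg.exe",)))
    so pydub stops guessing. The candidate paths are examples, not
    defaults that pydub knows about.
    """
    found = shutil.which("ffmpeg")
    if found:
        return found
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

If this returns None for every location you try, the install did not put a usable binary where this process can see it, which matches the warning above.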

Many incomplete segments, what is it even returning? predicted_text referenced before assignment.

Error:

  File "C:\Users\Dhruv\Desktop\py\AudioProcessorWhisper\whisper_mic_test.py", line 5, in <module>
    result = mic.listen()
  File "C:\Users\Dhruv\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper_mic\whisper_mic.py", line 215, in listen
    self.__listen_handler(timeout, phrase_time_limit)
  File "C:\Users\Dhruv\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper_mic\whisper_mic.py", line 132, in __listen_handler
    self.__transcribe(data=audio_data)
  File "C:\Users\Dhruv\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper_mic\whisper_mic.py", line 184, in __transcribe
    if predicted_text not in self.banned_results:
UnboundLocalError: local variable 'predicted_text' referenced before assignment

Code:

from whisper_mic import WhisperMic

mic = WhisperMic( mic_index=2, save_file=True, english=True )

result = mic.listen()
print(result)

I solved it with this:

    def __transcribe(self, data=None, realtime: bool = False) -> None:
        if data is None:
            audio_data = self.__get_all_audio()
        else:
            audio_data = data
        audio_data, is_audio_loud_enough = self.__preprocess(audio_data)

        if is_audio_loud_enough:
            # faster_whisper returns an iterable object rather than a string
            predicted_text = ''  # I MOVED THIS HERE, ON TOP <-------

            if self.faster:
                segments, info = self.audio_model.transcribe(audio_data)
                for segment in segments:
                    predicted_text += segment.text
            else:
                if self.english:
                    result = self.audio_model.transcribe(audio_data, language='english', suppress_tokens="")
                else:
                    result = self.audio_model.transcribe(audio_data, suppress_tokens="")
                predicted_text = result["text"]  # moved out of the else so both branches assign it

            if not self.verbose:
                if predicted_text not in self.banned_results:
                    self.result_queue.put_nowait(predicted_text)
            else:
                if predicted_text not in self.banned_results:
                    self.result_queue.put_nowait(result)

            print('predicted_text ' + predicted_text)

            if self.save_file:
                # os.remove(audio_data)
                self.file.write(predicted_text)

I decided to modify the code so it actually returns something and populates predicted_text.

No code in mic.py.

## TODO: Export logic to a separate file.  This could allow this to be a pip package.

no transcript output

Hello,

I have run everything according to your tutorial, but when I run the command:
whisper_mic --model small --loop

nothing is transcribed from my voice. I'm using the built-in mic of my laptop.

Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work

It's not working and shows this error when I try to run whisper_mic.py; please suggest a solution.

RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
