Comments (29)
Nevermind. ;)

```python
from pywhispercpp.examples.assistant import Assistant
import subprocess
from langchain.llms import Ollama

llm = Ollama(model="llama2")

def chatter(inputter):
    res = llm.predict(inputter)
    print(res)
    newres = "Umm" + "umm" + res
    subprocess.run(["say", newres])

my_assistant = Assistant(commands_callback=chatter, n_threads=4)
my_assistant.start()
```
You are welcome @MikeyBeez, glad you found the project useful :)
I saw you've already found your way through the assistant module, good job 👍
You define a function that takes a string as input (whatever whisper has transcribed), and the Assistant will execute that function whenever some speech is detected. It's a simple example based on pywhispercpp and a VAD (voice activity detector); I just made it easy for people to create their own assistants out of the box.
A quick additional tip: you can use bigger whisper models (if you have a powerful machine) to get better transcription results, because by default the tiny model is used. For example:

```python
import logging

# You can turn off the logs (e.g. if you only want to see the results
# from `ollama`) by setting `model_log_level` to `logging.ERROR`.
my_assistant = Assistant(model='small', commands_callback=chatter, n_threads=4, model_log_level=logging.ERROR)
my_assistant.start()
```
This way the small model will be used instead of the tiny one.
The idea I had in mind when I created this module is similar to what you are trying to achieve now: live transcription with whisper, feed the transcription to an LLM, and use a TTS model to convert the result to speech. Unfortunately I didn't find enough time to do it yet, that's why I didn't share any code, sorry for that!
However, let me know if you need any help, I'll be more than happy to do so.
I made a change:

```python
from pywhispercpp.examples.assistant import Assistant
from pywhispercpp.model import Model
import subprocess
from langchain.llms import Ollama

llm = Ollama(model="llama2")
whisper = Model('base.en', n_threads=6)

def chatter(inputter):
    res = llm.predict(inputter)
    print(res)
    newres = "Umm" + "umm" + res
    subprocess.run(["say", newres])

my_assistant = Assistant(commands_callback=chatter, n_threads=4)
my_assistant.start()
```
As you can see, I started using the base model. It's much faster. Still, this program doesn't use streaming, so I need to wait a long time for a reply; it's not like a natural conversation. I've been trying quantized GGUF models, but so far they're pretty bad compared to un-quantized llama2. It would be great if the model could stream results through your callback. I'll work on that. After all, Ollama streams when run from the command line. The OS's `say` command sucks anyway, so I can do without that. Apple is just awful for any API for STT or TTS. Even their MLX stuff can't use the ANE. I think all their macOS programmers are on drugs. ;) Cheers!
This seems better:

```python
from pywhispercpp.examples.assistant import Assistant
from pywhispercpp.model import Model
from langchain.llms import Ollama

llm = Ollama(model="llama2")
whisper = Model('base.en', n_threads=6)

def chatter(inputter):
    print(inputter)
    res = llm.predict(inputter)
    print(res)

responses = []

def process_responses():
    for response in responses:
        print(response)

my_assistant = Assistant(commands_callback=chatter, n_threads=4)
my_assistant.start()

while True:
    response = responses.pop(0)
    if response is None:
        break
    process_responses()
```
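A thread-safe variant of the same responses-queue idea can be sketched with `queue.Queue` (names and the echo stand-in here are illustrative, not from the script above):

```python
# Thread-safe sketch of the responses-queue idea: the assistant's
# callback thread produces, a consumer loop drains, and None is a
# sentinel that tells the consumer to stop.
import queue
import threading

responses = queue.Queue()
printed = []  # collected so the consumer's work is observable

def chatter(inputter):
    # In the real script this would be llm.predict(inputter).
    responses.put("echo: " + inputter)

def consume():
    while True:
        response = responses.get()
        if response is None:  # sentinel: shut down
            break
        printed.append(response)
        print(response)

consumer = threading.Thread(target=consume)
consumer.start()
chatter("hello")
responses.put(None)  # signal shutdown
consumer.join()
```

Unlike a plain list, `queue.Queue.get()` blocks until an item arrives, so the consumer never pops from an empty container.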
Is there a switch to turn down verbosity? I only want to see my print statements. This is really very good!
> I made a change: […] As you can see, I started using the base model. It's much faster.
You have to use the base model inside the `Assistant` class, because that class creates a whisper model internally, so you have to remove this line:

```python
whisper = Model('base.en', n_threads=6)
```

and use the code I provided above; otherwise you are creating two model instances. And if you use bigger models and your machine is not powerful, the processing will take longer; if you want fast inference, use smaller models like `tiny.en`.
> This program doesn't use streaming, so I need to wait a long time for a reply. It's not like a natural conversation. I've been trying quantized GGUF models, but so far, they're pretty bad compared to un-quantized llama2. It would be great if the model could stream results through your callback. I'll work on that.
Quantized GGUF models should give decent results, and they are much faster compared to the original llama2 models, unless you have a powerful GPU that can run the originals without a problem.
For streaming, you need to use a streaming callback in langchain to get the tokens as they are generated; you can then run the TTS on each chunk. But I don't know how close we can get to a natural conversation. I'll try to reserve some time and create a script if you like.
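As a rough illustration of that idea: langchain's streaming callback handlers receive tokens via `on_llm_new_token`, and the buffering below is the kind of logic that would live inside such a handler (this is a standalone sketch, not code from the project):

```python
# Sketch: buffer streamed LLM tokens into whole sentences before
# handing each one to TTS. In langchain this logic would sit inside a
# callback handler's on_llm_new_token method; here it is standalone.
class SentenceBuffer:
    def __init__(self):
        self.buffer = ""

    def feed(self, token):
        """Add one streamed token; return a finished sentence or None."""
        self.buffer += token
        if self.buffer.rstrip().endswith((".", "!", "?")):
            sentence, self.buffer = self.buffer.strip(), ""
            return sentence
        return None

# Feeding tokens as they might arrive from the model:
buf = SentenceBuffer()
for tok in ["Hello", " there", ".", " How", " are", " you", "?"]:
    sentence = buf.feed(tok)
    if sentence:
        print(sentence)  # this is where the TTS call would go
```

Speaking sentence by sentence means the first audio can start long before the full reply has finished generating.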
> After all, Ollama streams when run from the command line. The OS's `say` command sucks anyway. So I can do without that. Apple is just awful for any API for STT or TTS. Even their MLX stuff can't use the ANE. I think all their MacOS programmers are on drugs. ;) Cheers!
"MacOS programmers are on drugs" You made my day 😂
I am not a Mac user so I cannot tell, if the say command sucks, just use TTS AI models directly from python.
> Is there a switch to turn down verbosity? I only want to see my print statements. This is really very good!
Yes, set `model_log_level` to `logging.ERROR`, for example, to reduce the verbosity:

```python
import logging

my_assistant = Assistant(model_log_level=logging.ERROR, commands_callback=chatter, n_threads=4)
my_assistant.start()
```
LOL! I'm glad I made you laugh. I did try setting the log level to error. It helped a bit, but I'm still getting lots of messages that I'd rather not see. I did discover that it's Bluetooth that is causing problems with the `say` command. If I use my monitor's crappy speakers, the `say` command works okay, so I will probably write an AppleScript to switch to my HDMI speakers when the assistant runs. Quantized models run much faster, but their responses are moronic. I'm using Ollama's create switch to convert Hugging Face quantized models to Ollama's format. The conversion may be lossy. https://www.youtube.com/watch?v=7BH4C6-HP14
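For reference, that `ollama create` flow looks roughly like this (the GGUF file name is just an example of a downloaded Hugging Face quantization, not a recommendation):

```shell
# Point a Modelfile at the downloaded GGUF weights (example file name).
cat > Modelfile <<'EOF'
FROM ./llama-2-7b-chat.Q4_K_M.gguf
EOF

# Build a local Ollama model from it, then run it.
ollama create llama2-q4 -f Modelfile
ollama run llama2-q4
```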
I also sent error to /dev/null . . .
Oops! I forgot. I'm also finding that I only have a few seconds to create a prompt. If I have a complex thought, I can't express it fast enough. Is there a way to increase the time for speaking?
Here's the current version:

```python
from pywhispercpp.examples.assistant import Assistant
from pywhispercpp.model import Model
from langchain.llms import Ollama
from colorama import init, Fore, Style
from gtts import gTTS
import os
import time
import logging

# Initialize colorama
init()

llm = Ollama(model="llama2")
#whisper = Model('base.en', n_threads=6, speed_up=True)
#whisper = Model('base.en', n_threads=6, speed_up=True, print_realtime=False, print_progress=False, print_timestamps=False)

def chatter(inputter):
    print(Fore.CYAN + inputter + Style.RESET_ALL)
    res = llm.predict(inputter)
    print(Fore.YELLOW + res + Style.RESET_ALL)
    tts = gTTS(res)
    tts.save("output.mp3")
    time.sleep(2)
    #os.system("play -n -c1 synth sin %-12 sin %-9 sin %-5 sin %-2 fade h 0.1 1 0.1")
    os.system("play output.mp3")

#my_assistant = Assistant(commands_callback=chatter, n_threads=4, model_log_level=logging.ERROR)
my_assistant = Assistant(model='tiny', commands_callback=chatter, n_threads=4, model_log_level=logging.ERROR)
my_assistant.start()
```
With logging set to error, `2>/dev/null` gives me the quiet experience. So that's solved too. I run it from a shell script:

```shell
#!/usr/bin/env zsh
/Users/bard/miniforge3/envs/whisper/bin/python /Users/bard/Code/whisper.cpp/mikey/test6.py 2>/dev/null
```
> I also sent error to /dev/null . . .
Yes, just send the output to `/dev/null` to get rid of the remaining whisper.cpp logs.
> Oops! I forgot. I'm also finding that I only have a few seconds to create a prompt. If I have a complex thought, I can't express it fast enough. Is there a way to increase the time for speaking?
Yes, I thought about that at the time as well. You can increase the silence threshold so you get more time before the inference kicks in; use it as follows:

```python
my_assistant = Assistant(model='tiny', commands_callback=chatter, n_threads=4, silence_threshold=120)
```

`silence_threshold` takes an `int`; the larger the number, the more time you will have before the inference.
> Here's the current version: […]
Yes, this version is much better than the previous one; combined with the small shell script it will be great. Great work 👍
Hello @MikeyBeez,
here is a quick, more elaborate example script that uses the Assistant module with langchain; with a bit of dirty multi-threading it supports streaming and talking at the same time, once a sentence has been generated.
You can find the source code here if you want to try it out.
You can follow the instructions on the readme page.
Of course it's not perfect, but you can take it as a starting point.
Hope it helps.
This is fantastic. I'll need to work a bit on the TTS as it's a bit strange on my funny little Mac. Thank you so much! May I share it in my repo? This way more people will have access. I'm looking into something called semantic router that can choose from different models. I hope to find something that uses LoRA adapters with a base model so that the semantic router can choose the right adapter. The semantic router works with langchain, which we're already using.
Sure, feel free to fork it, share or modify it :)
BTW, I just figured out that I am a fan of your YouTube channel; I watched you develop Julie-Julie a long time ago when I was doing some research about how voice assistants are made. Life is mysterious lol. Keep uploading your awesome videos.
> BTW, what version of python did you use?
I am using Python 3.10
I'll need to think about this. Two things are happening: the TTS is stuttering, and it's being picked up and processed by the microphone for STT. I suspect we need to use asyncio to serialize the speaking thread and also to serialize the listening and speaking methods. It will take me a while to figure this out. You're obviously a much better programmer; my background is mostly system and DB administration. Cheers!
No, I am not a much better programmer; you have tons more experience than I do, I am just doing my best.
Yeah, I noticed this issue as well. I think there is a problem with `sounddevice` (the package I used for both input and output), but because there is so much threading I didn't dig deeper to see what's wrong; that's why I am checking inside the callback whether the `play_back` event is set before running the LLM again.
Also, try to use a good external microphone and some headsets; it helped with that issue.
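That guard can be sketched with a `threading.Event` (a minimal illustration; the stand-in function bodies are assumptions based on the description above, not the actual source):

```python
# Sketch: gate the LLM callback on a playback event so the assistant
# ignores speech it picked up from its own TTS output. `play_back` is
# the event named above; the function bodies are illustrative stand-ins.
import threading

play_back = threading.Event()
log = []  # stands in for real side effects, to keep this observable

def speak(text):
    play_back.set()                   # mark playback in progress
    try:
        log.append("tts: " + text)    # real code would play audio here
    finally:
        play_back.clear()             # playback finished

def chatter(inputter):
    if play_back.is_set():            # mic caught our own TTS: skip it
        return
    log.append("llm: " + inputter)    # real code would call the LLM

speak("hello")
chatter("hello")          # processed: playback already finished
play_back.set()
chatter("echoed speech")  # ignored while the playback flag is set
play_back.clear()
```

In the real script `speak` would run in its own thread, so the flag stays set for the whole duration of the audio playback.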
No, is this a new TTS model? How fast is it? Can it replace gtts?
I tried VITS and it was good, but because running ollama took all my GPU, running inference on CPU takes a long time.
Wow, that's interesting.
I don't like this Meta policy of hiding the models behind an application, even though I really respect the fact that the weights are released open source.
But I think I should apply this time.
If you ever get the chance to try it, let me know how it goes.