Comments (29)
Nevermind. ;)

```python
from pywhispercpp.examples.assistant import Assistant
import subprocess
from langchain.llms import Ollama

llm = Ollama(model="llama2")

def chatter(inputter):
    res = llm.predict(inputter)
    print(res)
    newres = "Umm" + "umm" + res
    subprocess.run(["say", newres])

my_assistant = Assistant(commands_callback=chatter, n_threads=4)
my_assistant.start()
```
You are welcome @MikeyBeez, glad you found the project useful :)
I saw you've already found your way through the assistant module, good job 👍
You define a function that takes a string as input (whatever whisper has transcribed), and the Assistant will execute that function whenever some speech is detected. It's a simple example based on pywhispercpp and a VAD (voice activity detector); I just made it easy for people to create their own assistants out of the box.
A quick additional tip: you can use bigger whisper models (if you have a powerful machine) to get better transcription results, because by default the tiny model is used. For example:

```python
import logging

# You can turn off the logs (e.g. if you only want to see the results
# from `ollama`) by setting `model_log_level` to `logging.ERROR`.
my_assistant = Assistant(model='small', commands_callback=chatter, n_threads=4, model_log_level=logging.ERROR)
my_assistant.start()
```
This way the small model will be used instead of the tiny one.
The idea I had in mind when I created this module is similar to what you are trying to achieve now: live transcription with whisper, feed the transcription to an LLM, and use a TTS model to convert the result to speech. Unfortunately I didn't find enough time to do it yet, that's why I didn't share any code, sorry for that!
However, let me know if you need any help, I'll be more than happy to do so.
I made a change:

```python
from pywhispercpp.examples.assistant import Assistant
from pywhispercpp.model import Model
import subprocess
from langchain.llms import Ollama

llm = Ollama(model="llama2")
whisper = Model('base.en', n_threads=6)

def chatter(inputter):
    res = llm.predict(inputter)
    print(res)
    newres = "Umm" + "umm" + res
    subprocess.run(["say", newres])

my_assistant = Assistant(commands_callback=chatter, n_threads=4)
my_assistant.start()
```
As you can see, I started using the base model. It's much faster. Still, this program doesn't use streaming, so I need to wait a long time for a reply; it's not like a natural conversation. I've been trying quantized GGUF models, but so far they're pretty bad compared to un-quantized llama2. It would be great if the model could stream results through your callback. I'll work on that. After all, Ollama streams when run from the command line. The OS's `say` command sucks anyway, so I can do without that. Apple is just awful for any API for STT or TTS. Even their MLX stuff can't use the ANE. I think all their macOS programmers are on drugs. ;) Cheers!
This seems better:

```python
from pywhispercpp.examples.assistant import Assistant
from pywhispercpp.model import Model
from langchain.llms import Ollama

llm = Ollama(model="llama2")
whisper = Model('base.en', n_threads=6)

def chatter(inputter):
    print(inputter)
    res = llm.predict(inputter)
    print(res)

responses = []

def process_responses():
    for response in responses:
        print(response)

my_assistant = Assistant(commands_callback=chatter, n_threads=4)
my_assistant.start()

while True:
    response = responses.pop(0)
    if response is None:
        break
    process_responses()
```
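A thread-safe variant of the same responses-queue idea can be sketched with `queue.Queue` (names and the echo stand-in here are illustrative, not from the script above):

```python
# Thread-safe sketch of the responses-queue idea: the assistant's
# callback thread produces, a consumer loop drains, and None is a
# sentinel that tells the consumer to stop.
import queue
import threading

responses = queue.Queue()
printed = []  # collected so the consumer's work is observable

def chatter(inputter):
    # In the real script this would be llm.predict(inputter).
    responses.put("echo: " + inputter)

def consume():
    while True:
        response = responses.get()
        if response is None:  # sentinel: shut down
            break
        printed.append(response)
        print(response)

consumer = threading.Thread(target=consume)
consumer.start()
chatter("hello")
responses.put(None)  # signal shutdown
consumer.join()
```

Unlike a plain list, `queue.Queue.get()` blocks until an item arrives, so the consumer never pops from an empty container.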
Is there a switch to turn down verbosity? I only want to see my print statements. This is really very good!
> I made a change: […] As you can see, I started using the base model. It's much faster.
You have to use the base model inside the `Assistant` class, because that class creates a whisper model internally, so you have to remove this line:

```python
whisper = Model('base.en', n_threads=6)
```

and use the code I provided above; otherwise you are creating two model instances. And if you use bigger models and your machine is not powerful, the processing will take longer; if you want fast inference, use smaller models like `tiny.en`.
> This program doesn't use streaming, so I need to wait a long time for a reply. It's not like a natural conversation. I've been trying quantized GGUF models, but so far, they're pretty bad compared to un-quantized llama2. It would be great if the model could stream results through your callback. I'll work on that.
Quantized GGUF models should give decent results, and they are much faster compared to the original llama2 models, unless you have a powerful GPU that can run the originals without a problem.
For streaming, you need to use a streaming callback in langchain to get the tokens as they are generated; you can then run the TTS on each chunk. But I don't know how close we can get to a natural conversation. I'll try to reserve some time and create a script if you like.
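As a rough illustration of that idea: langchain's streaming callback handlers receive tokens via `on_llm_new_token`, and the buffering below is the kind of logic that would live inside such a handler (this is a standalone sketch, not code from the project):

```python
# Sketch: buffer streamed LLM tokens into whole sentences before
# handing each one to TTS. In langchain this logic would sit inside a
# callback handler's on_llm_new_token method; here it is standalone.
class SentenceBuffer:
    def __init__(self):
        self.buffer = ""

    def feed(self, token):
        """Add one streamed token; return a finished sentence or None."""
        self.buffer += token
        if self.buffer.rstrip().endswith((".", "!", "?")):
            sentence, self.buffer = self.buffer.strip(), ""
            return sentence
        return None

# Feeding tokens as they might arrive from the model:
buf = SentenceBuffer()
for tok in ["Hello", " there", ".", " How", " are", " you", "?"]:
    sentence = buf.feed(tok)
    if sentence:
        print(sentence)  # this is where the TTS call would go
```

Speaking sentence by sentence means the first audio can start long before the full reply has finished generating.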
> After all, Ollama streams when run from the command line. The OS's `say` command sucks anyway. So I can do without that. Apple is just awful for any API for STT or TTS. Even their MLX stuff can't use the ANE. I think all their MacOS programmers are on drugs. ;) Cheers!
"MacOS programmers are on drugs" You made my day 😂
I am not a Mac user so I cannot tell, if the say command sucks, just use TTS AI models directly from python.
> Is there a switch to turn down verbosity? I only want to see my print statements. This is really very good!
Yes, set `model_log_level` to `logging.ERROR`, for example, to reduce the verbosity:

```python
import logging

my_assistant = Assistant(model_log_level=logging.ERROR, commands_callback=chatter, n_threads=4)
my_assistant.start()
```
LOL! I'm glad I made you laugh. I did try setting the log level to error. It helped a bit, but I'm still getting lots of messages that I'd rather not see. I did discover that it's Bluetooth that is causing problems with the `say` command. If I use my monitor's crappy speakers, the `say` command works okay, so I will probably write an AppleScript to switch to my HDMI speakers when the assistant runs. Quantized models run much faster, but their responses are moronic. I'm using Ollama's create switch to convert Hugging Face quantized models to Ollama's format. The conversion may be lossy. https://www.youtube.com/watch?v=7BH4C6-HP14
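For reference, that `ollama create` flow looks roughly like this (the GGUF file name is just an example of a downloaded Hugging Face quantization, not a recommendation):

```shell
# Point a Modelfile at the downloaded GGUF weights (example file name).
cat > Modelfile <<'EOF'
FROM ./llama-2-7b-chat.Q4_K_M.gguf
EOF

# Build a local Ollama model from it, then run it.
ollama create llama2-q4 -f Modelfile
ollama run llama2-q4
```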
I also sent error to /dev/null . . .
Oops! I forgot. I'm also finding that I only have a few seconds to create a prompt. If I have a complex thought, I can't express it fast enough. Is there a way to increase the time for speaking?
Here's the current version:

```python
from pywhispercpp.examples.assistant import Assistant
from pywhispercpp.model import Model
from langchain.llms import Ollama
from colorama import init, Fore, Style
from gtts import gTTS
import os
import time
import logging

# Initialize colorama
init()

llm = Ollama(model="llama2")
#whisper = Model('base.en', n_threads=6, speed_up=True)
#whisper = Model('base.en', n_threads=6, speed_up=True, print_realtime=False, print_progress=False, print_timestamps=False)

def chatter(inputter):
    print(Fore.CYAN + inputter + Style.RESET_ALL)
    res = llm.predict(inputter)
    print(Fore.YELLOW + res + Style.RESET_ALL)
    tts = gTTS(res)
    tts.save("output.mp3")
    time.sleep(2)
    #os.system("play -n -c1 synth sin %-12 sin %-9 sin %-5 sin %-2 fade h 0.1 1 0.1")
    os.system("play output.mp3")

#my_assistant = Assistant(commands_callback=chatter, n_threads=4, model_log_level=logging.ERROR)
my_assistant = Assistant(model='tiny', commands_callback=chatter, n_threads=4, model_log_level=logging.ERROR)
my_assistant.start()
```
With logging set to error, `2>/dev/null` gives me the quiet experience. So that's solved too. I run it from a shell script:

```shell
#!/usr/bin/env zsh
/Users/bard/miniforge3/envs/whisper/bin/python /Users/bard/Code/whisper.cpp/mikey/test6.py 2>/dev/null
```
> I also sent error to /dev/null . . .
Yes, just send the output to `/dev/null` to get rid of the remaining whisper.cpp logs.
> Oops! I forgot. I'm also finding that I only have a few seconds to create a prompt. If I have a complex thought, I can't express it fast enough. Is there a way to increase the time for speaking?
Yes, I thought about that at the time as well. You can increase the silence threshold so you get more time before the inference kicks in; use it as follows:

```python
my_assistant = Assistant(model='tiny', commands_callback=chatter, n_threads=4, silence_threshold=120)
```

`silence_threshold` takes an `int`; the larger the number, the more time you will have before the inference.
> Here's the current version: […]
Yes, this version is much better than the previous one; combined with the small shell script it will be great. Great work 👍
Hello @MikeyBeez,
here is a quick, more elaborate example script that uses the Assistant module with langchain; with a bit of dirty multi-threading it supports streaming and talking at the same time, once a sentence has been generated.
You can find the source code here if you want to try it out.
You can follow the instructions on the readme page.
Of course it's not perfect, but you can take it as a starting point.
Hope it helps.
This is fantastic. I'll need to work a bit on the TTS as it's a bit strange on my funny little Mac. Thank you so much! May I share it in my repo? This way more people will have access. I'm looking into something called semantic router that can choose from different models. I hope to find something that uses LoRA adapters with a base model so that the semantic router can choose the right adapter. The semantic router works with langchain, which we're already using.
Sure, feel free to fork it, share or modify it :)
BTW, I just figured out that I am a fan of your YouTube channel; I watched you develop Julie-Julie a long time ago when I was doing some research about how voice assistants are made. Life is mysterious lol. Keep uploading your awesome videos.
> BTW, what version of python did you use?
I am using Python 3.10
I'll need to think about this. Two things are happening: the TTS is stuttering, and it's being picked up and processed by the microphone for STT. I suspect we need to use asyncio to serialize the speaking thread and also to serialize the listening and speaking methods. It will take me a while to figure this out. You're obviously a much better programmer; my background is mostly system and DB administration. Cheers!
No, I am not a much better programmer; you have tons more experience than I do, I am just doing my best.
Yeah, I noticed this issue as well. I think there is a problem with `sounddevice` (the package I used for both input and output), but because there is so much threading I didn't dig deeper to see what's wrong; that's why I am checking inside the callback whether the `play_back` event is set before running the LLM again.
Also, try to use a good external microphone and some headsets; it helped with that issue.
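That guard can be sketched with a `threading.Event` (a minimal illustration; the stand-in function bodies are assumptions based on the description above, not the actual source):

```python
# Sketch: gate the LLM callback on a playback event so the assistant
# ignores speech it picked up from its own TTS output. `play_back` is
# the event named above; the function bodies are illustrative stand-ins.
import threading

play_back = threading.Event()
log = []  # stands in for real side effects, to keep this observable

def speak(text):
    play_back.set()                   # mark playback in progress
    try:
        log.append("tts: " + text)    # real code would play audio here
    finally:
        play_back.clear()             # playback finished

def chatter(inputter):
    if play_back.is_set():            # mic caught our own TTS: skip it
        return
    log.append("llm: " + inputter)    # real code would call the LLM

speak("hello")
chatter("hello")          # processed: playback already finished
play_back.set()
chatter("echoed speech")  # ignored while the playback flag is set
play_back.clear()
```

In the real script `speak` would run in its own thread, so the flag stays set for the whole duration of the audio playback.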
No, is this a new TTS model? How fast is it? Can it replace gtts?
I tried VITS and it was good, but because running ollama took all my GPU, running inference on CPU takes a long time.
Wow, that's interesting.
I don't like this Meta policy of hiding the models behind an application, even though I really respect the fact that the weights are released open source.
But I think I should apply this time.
If you ever get the chance to try it, let me know how it goes.