A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech models. Supports OpenAI, Groq, ElevenLabs, CartesiaAI, and Deepgram APIs, plus local models via Ollama. Ideal for research and development in voice technology.

License: MIT License


VERBI - Voice Assistant πŸŽ™οΈ


Motivation ✨✨✨

Welcome to the Voice Assistant project! πŸŽ™οΈ Our goal is to create a modular voice assistant application that allows you to experiment with state-of-the-art (SOTA) models for various components. The modular structure provides flexibility, enabling you to pick and choose between different SOTA models for transcription, response generation, and text-to-speech (TTS). This approach facilitates easy testing and comparison of different models, making it an ideal platform for research and development in voice assistant technologies. Whether you're a developer, researcher, or enthusiast, this project is for you!

Features 🧰

  • Modular Design: Easily switch between different models for transcription, response generation, and TTS.
  • Support for Multiple APIs: Integrates with the OpenAI, Groq, ElevenLabs, and Deepgram APIs, along with placeholders for local models.
  • Audio Recording and Playback: Record audio from the microphone and play generated speech.
  • Configuration Management: Centralized configuration in config.py for easy setup and management.

Project Structure πŸ“‚

voice_assistant/
β”œβ”€β”€ voice_assistant/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ audio.py
β”‚   β”œβ”€β”€ api_key_manager.py
β”‚   β”œβ”€β”€ config.py
β”‚   β”œβ”€β”€ transcription.py
β”‚   β”œβ”€β”€ response_generation.py
β”‚   β”œβ”€β”€ text_to_speech.py
β”‚   β”œβ”€β”€ utils.py
β”‚   β”œβ”€β”€ local_tts_api.py
β”‚   β”œβ”€β”€ local_tts_generation.py
β”œβ”€β”€ .env
β”œβ”€β”€ run_voice_assistant.py
β”œβ”€β”€ setup.py
β”œβ”€β”€ requirements.txt
└── README.md

Setup Instructions πŸ“‹

Prerequisites βœ…

  • Python 3.10 or higher
  • Virtual environment (recommended)

Step-by-Step Instructions πŸ”’

  1. πŸ“₯ Clone the repository
   git clone https://github.com/PromtEngineer/Verbi.git
   cd Verbi
  2. 🐍 Set up a virtual environment

Using venv:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Using conda:

    conda create --name verbi python=3.10
    conda activate verbi
  3. πŸ“¦ Install the required packages
   pip install -r requirements.txt
  4. πŸ› οΈ Set up the environment variables

Create a .env file in the root directory and add your API keys:

    OPENAI_API_KEY=your_openai_api_key
    GROQ_API_KEY=your_groq_api_key
    DEEPGRAM_API_KEY=your_deepgram_api_key
    LOCAL_MODEL_PATH=path/to/local/model
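These keys are read from the environment at startup (the project presumably uses a library such as python-dotenv for this). As a purely illustrative sketch of what a .env loader does:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal .env loader: copies KEY=value lines into os.environ.

    Illustrative only -- a real loader like python-dotenv also handles
    quoting, comments at end of line, and variable interpolation.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            # setdefault: real environment variables win over .env values
            os.environ.setdefault(key.strip(), value.strip())
```

After loading, `os.getenv("OPENAI_API_KEY")` returns the value from the file (or None if the key is missing in both the file and the environment).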
  5. 🧩 Configure the models

Edit config.py to select the models you want to use:

    class Config:
        # Model selection
        TRANSCRIPTION_MODEL = 'groq'  # Options: 'openai', 'groq', 'deepgram', 'fastwhisperapi', 'local'
        RESPONSE_MODEL = 'groq'       # Options: 'openai', 'groq', 'ollama', 'local'
        TTS_MODEL = 'deepgram'        # Options: 'openai', 'deepgram', 'elevenlabs', 'local', 'melotts'

        # API keys and paths
        OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
        GROQ_API_KEY = os.getenv("GROQ_API_KEY")
        DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
        LOCAL_MODEL_PATH = os.getenv("LOCAL_MODEL_PATH")

If you are running an LLM locally via Ollama, make sure the Ollama server is running before starting Verbi.
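A quick way to check that the server is up before launching Verbi: Ollama listens on port 11434 by default, and GET /api/tags (which lists installed models) makes a cheap liveness probe. A small sketch:

```python
import urllib.request
import urllib.error

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def ollama_endpoint(base: str, path: str) -> str:
    """Join the Ollama base URL with an API path, avoiding duplicate slashes."""
    return base.rstrip("/") + "/" + path.lstrip("/")

def ollama_is_running(base: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on the given base URL."""
    try:
        with urllib.request.urlopen(ollama_endpoint(base, "/api/tags"), timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```

Calling `ollama_is_running()` returns False when no server is listening, so Verbi could fail fast with a helpful message instead of a mid-conversation error.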

  6. πŸ”Š Configure ElevenLabs Jarvis' Voice
  • Voice samples here.
  • Follow this link to add the Jarvis voice to your ElevenLabs account.
  • Name the voice 'Paul J.' or, if you prefer a different name, ensure it matches the ELEVENLABS_VOICE_ID variable in the text_to_speech.py file.
  1. πŸƒ Run the voice assistant
   python run_voice_assistant.py
  8. 🎀 Install FastWhisperAPI

    Optional step if you need a local transcription model

    Clone the repository

       cd ..
       git clone https://github.com/3choff/FastWhisperAPI.git
       cd FastWhisperAPI

    Install the required packages:

       pip install -r requirements.txt

    Run the API

       fastapi run main.py

    Alternative Setup and Run Methods

    The API can also run directly on a Docker container or in Google Colab.

    Docker:

    Build a Docker container:

       docker build -t fastwhisperapi .

    Run the container

       docker run -p 8000:8000 fastwhisperapi

    Refer to the repository documentation for the Google Colab method: https://github.com/3choff/FastWhisperAPI/blob/main/README.md

  9. 🎀 Install Local TTS - MeloTTS

    Optional step if you need a local Text to Speech model

    Install MeloTTS from Github

    Use the following link to install MeloTTS for your operating system.

    Once the package is installed in your local virtual environment, you can start the API server with the following command.

       python voice_assistant/local_tts_api.py

    The local_tts_api.py file implements a FastAPI server that listens for incoming text and generates audio using the MeloTTS model. To use the local TTS model, update the config.py file by setting:

       TTS_MODEL = 'melotts'        # Options: 'openai', 'deepgram', 'elevenlabs', 'local', 'melotts'

    You can then run the main file to start using Verbi with local models.
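Once the server is up, you can also exercise it directly. The sketch below assumes the server listens on port 8888 and exposes a /generate-audio/ endpoint accepting JSON; the field names `text` and `file_name` are hypothetical, so check local_tts_api.py for the actual schema:

```python
import json
import urllib.request

# Port and path are assumptions; verify against local_tts_api.py
TTS_URL = "http://localhost:8888/generate-audio/"

def build_tts_payload(text: str, filename: str = "output.mp3") -> dict:
    """JSON body for the local TTS endpoint (field names are hypothetical)."""
    return {"text": text, "file_name": filename}

def synthesize(text: str) -> bytes:
    """POST text to the local MeloTTS server and return the raw response."""
    req = urllib.request.Request(
        TTS_URL,
        data=json.dumps(build_tts_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

A 404 from this call (see the issues section below) usually means the server at that port is not the Verbi-provided local_tts_api.py, or the endpoint path differs.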

Model Options βš™οΈ

Transcription Models 🎀

  • OpenAI: Uses OpenAI's Whisper model.
  • Groq: Uses Groq's Whisper-large-v3 model.
  • Deepgram: Uses Deepgram's transcription model.
  • FastWhisperAPI: Uses FastWhisperAPI, a local transcription API powered by Faster Whisper.
  • Local: Placeholder for a local speech-to-text (STT) model.

Response Generation Models πŸ’¬

  • OpenAI: Uses OpenAI's GPT-4 model.
  • Groq: Uses Groq's LLaMA model.
  • Ollama: Uses any model served via Ollama.
  • Local: Placeholder for a local language model.
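The Ollama option talks to the local server's /api/chat endpoint. A minimal sketch of a non-streaming chat turn (the model name "llama3" is just an example):

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

def build_ollama_chat_request(model: str, user_text: str) -> dict:
    """JSON body for Ollama's /api/chat endpoint (non-streaming)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
    }

def chat(model: str, user_text: str) -> str:
    """Send one chat turn to a locally running Ollama server."""
    body = json.dumps(build_ollama_chat_request(model, user_text)).encode()
    req = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

For multi-turn conversations, the `messages` list would accumulate prior user and assistant turns; this sketch sends a single turn only.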

Text-to-Speech (TTS) Models πŸ”Š

  • OpenAI: Uses OpenAI's TTS model with the 'fable' voice.
  • Deepgram: Uses Deepgram's TTS model with the 'aura-angus-en' voice.
  • ElevenLabs: Uses ElevenLabs' TTS model with the 'Paul J.' voice.
  • Local: Placeholder for a local TTS model.
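The modular design boils down to a dispatch on the configured model name. A simplified sketch (handler names and return values are hypothetical; the real routing lives in text_to_speech.py):

```python
def _openai_tts(text: str) -> str:
    return f"[openai tts] {text}"    # stand-in for the real API call

def _deepgram_tts(text: str) -> str:
    return f"[deepgram tts] {text}"  # stand-in for the real API call

# One entry per supported backend; adding a model means adding one entry
TTS_BACKENDS = {
    "openai": _openai_tts,
    "deepgram": _deepgram_tts,
}

def text_to_speech(model: str, text: str) -> str:
    """Route a TTS request to the backend selected by Config.TTS_MODEL."""
    try:
        backend = TTS_BACKENDS[model]
    except KeyError:
        raise ValueError(f"Unknown TTS_MODEL: {model!r}")
    return backend(text)
```

The same table-driven pattern applies to transcription and response generation, which is what makes swapping models a one-line config change.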

Detailed Module Descriptions πŸ“˜

  • run_voice_assistant.py: Main script to run the voice assistant.
  • voice_assistant/config.py: Manages configuration settings and API keys.
  • voice_assistant/api_key_manager.py: Handles retrieval of API keys based on configured models.
  • voice_assistant/audio.py: Functions for recording and playing audio.
  • voice_assistant/transcription.py: Manages audio transcription using various APIs.
  • voice_assistant/response_generation.py: Handles generating responses using various language models.
  • voice_assistant/text_to_speech.py: Manages converting text responses into speech.
  • voice_assistant/utils.py: Contains utility functions like deleting files.
  • voice_assistant/local_tts_api.py: Contains the FastAPI server implementation that runs the MeloTTS model.
  • voice_assistant/local_tts_generation.py: Contains the code that calls the MeloTTS API to generate audio.
  • voice_assistant/__init__.py: Initializes the voice_assistant package.

Roadmap πŸ›€οΈπŸ›€οΈπŸ›€οΈ

Here's what's next for the Voice Assistant project:

  1. Add Support for Streaming: Enable real-time streaming of audio input and output.
  2. Add Support for ElevenLabs and Enhanced Deepgram for TTS: Integrate additional TTS options for higher quality and variety.
  3. Add Filler Audios: Include background or filler audios while waiting for model responses to enhance user experience.
  4. Add Support for Local Models Across the Board: Expand support for local models in transcription, response generation, and TTS.

Contributing 🀝

We welcome contributions from the community! If you'd like to help improve this project, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes and commit them (git commit -m 'Add new feature').
  4. Push to the branch (git push origin feature-branch).
  5. Open a pull request detailing your changes.


verbi's Issues

MeloTTS /generate-audio/ 404 Not Found

MeloTTS running locally from docker, WebUI generates audio files, but the /generate-audio endpoint called by Verbi to MeloTTS results in a 404.

I've looked through the MeloTTS project and can't see any references to /generate-audio

$ python3 run_voice_assistant.py
pygame 2.5.2 (SDL 2.28.3, Python 3.12.3)
Hello from the pygame community. https://www.pygame.org/contribute.html
2024-06-06 15:00:08,387 - INFO - Calibrating for ambient noise...
2024-06-06 15:00:09,369 - INFO - Recording started
2024-06-06 15:00:12,251 - INFO - Recording complete
2024-06-06 15:00:12,917 - INFO - No transcription was returned. Starting recording again.
2024-06-06 15:00:13,094 - INFO - Calibrating for ambient noise...
2024-06-06 15:00:14,079 - INFO - Recording started
2024-06-06 15:00:19,776 - INFO - Recording complete
2024-06-06 15:00:21,105 - INFO - You said: Hello Prompt Engineer, this is a recording, but for some reason, there's no response.
2024-06-06 15:00:23,435 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"
2024-06-06 15:00:23,435 - INFO - Response: Greetings! I'm here to assist you. Is there anything I can help you with today?
2024-06-06 15:00:23,439 - ERROR - Failed to convert text to speech: 404 Client Error: Not Found for url: http://localhost:8888/generate-audio/
2024-06-06 15:00:23,491 - ERROR - Failed to play audio: No file 'output.mp3' found in working directory '/Users/dev/Verbi'.
2024-06-06 15:00:23,667 - INFO - Calibrating for ambient noise...
2024-06-06 15:00:24,646 - INFO - Recording started

Groq API does not work.

No idea where it gets that URL-path, but it's not correct. I can't find it in the code.

INFO - HTTP Request: POST https://api.groq.com/openai/v1/openai/v1/chat/completions "HTTP/1.1 404 Not Found"
ERROR - Failed to generate response: Error code: 404 - {'error': {'message': 'Unknown request URL: POST /openai/v1/openai/v1/chat/completions. Please check the URL for typos, or see the docs at https://console.groq.com/docs/', 'type': 'invalid_request_error', 'code': 'unknown_url'}}
INFO - Response: Error in generating response
  1. The Groq module is being used, so it does not need to use the alternative URL for OpenAI compatibility.
  2. And even if it did, the URL should be https://api.groq.com/openai/v1.

Documentation:
https://console.groq.com/docs/openai

Something is fishy here.
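The duplicated segment suggests a base URL that already ends in /openai/v1 being joined with a request path that repeats the same prefix. An illustration of how such a 404 URL could be produced (the actual cause in Verbi's code is not confirmed):

```python
def join_api_url(base_url: str, path: str) -> str:
    """Naive join of an API base URL and a request path."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")

# If the path already repeats the /openai/v1 prefix, the join duplicates it:
buggy = join_api_url("https://api.groq.com/openai/v1", "openai/v1/chat/completions")
# buggy == "https://api.groq.com/openai/v1/openai/v1/chat/completions"  -> 404

# With a path relative to the base, the URL comes out right:
fixed = join_api_url("https://api.groq.com/openai/v1", "chat/completions")
# fixed == "https://api.groq.com/openai/v1/chat/completions"
```

In practice this means the Groq client should either be given no base_url override at all, or a path that does not repeat the prefix.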

get error

Failed to record audio: [WinError 2] The system cannot find the file specified

(install error) ERROR: Could not build wheels for PyAudio, which is required to install pyproject.toml-based projects

OS: macOS 14.5 Sonoma
Hardware: Intel
Python: 3.12.3
shell: bash

This error appears after the pip3 install -r requirements.txt step.

The solution was to start over, but first install portaudio via Homebrew: brew install portaudio

...
Downloading shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)
Downloading iniconfig-2.0.0-py3-none-any.whl (5.9 kB)
Downloading pure_eval-0.2.2-py3-none-any.whl (11 kB)
Downloading wcwidth-0.2.13-py2.py3-none-any.whl (34 kB)
Downloading markdown_it_py-3.0.0-py3-none-any.whl (87 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87.5/87.5 kB 2.7 MB/s eta 0:00:00
Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Building wheels for collected packages: PyAudio
  Building wheel for PyAudio (pyproject.toml) ... error
  error: subprocess-exited-with-error

  Γ— Building wheel for PyAudio (pyproject.toml) did not run successfully.
  β”‚ exit code: 1
  ╰─> [18 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-13.0-x86_64-cpython-312
      creating build/lib.macosx-13.0-x86_64-cpython-312/pyaudio
      copying src/pyaudio/__init__.py -> build/lib.macosx-13.0-x86_64-cpython-312/pyaudio
      running build_ext
      building 'pyaudio._portaudio' extension
      creating build/temp.macosx-13.0-x86_64-cpython-312
      creating build/temp.macosx-13.0-x86_64-cpython-312/src
      creating build/temp.macosx-13.0-x86_64-cpython-312/src/pyaudio
      clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/phil/dev/Verbi/venv/include -I/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12 -c src/pyaudio/device_api.c -o build/temp.macosx-13.0-x86_64-cpython-312/src/pyaudio/device_api.o
      src/pyaudio/device_api.c:9:10: fatal error: 'portaudio.h' file not found
      #include "portaudio.h"
               ^~~~~~~~~~~~~
      1 error generated.
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for PyAudio
Failed to build PyAudio
ERROR: Could not build wheels for PyAudio, which is required to install pyproject.toml-based projects
(venv)

Errors on "TRANSCRIPTION_MODEL"

With TRANSCRIPTION_MODEL = 'deepgram' I am getting:

pygame 2.5.2 (SDL 2.28.3, Python 3.11.6)
Hello from the pygame community. https://www.pygame.org/contribute.html
2024-05-20 14:38:03,651 - INFO - Recording started
2024-05-20 14:38:09,461 - INFO - Recording complete
2024-05-20 14:38:09,592 - ERROR - An error occurred: can only concatenate str (not "NoneType") to str
2024-05-20 14:38:09,592 - INFO - Deleted file: test.wav
2024-05-20 14:38:10,733 - INFO - Recording started
2024-05-20 14:38:16,905 - INFO - Recording complete
2024-05-20 14:38:17,038 - ERROR - An error occurred: can only concatenate str (not "NoneType") to str
2024-05-20 14:38:17,039 - INFO - Deleted file: test.wav
2024-05-20 14:38:18,176 - INFO - Recording started
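The "can only concatenate str (not "NoneType") to str" error is consistent with a missing DEEPGRAM_API_KEY: concatenating a None key into the auth header raises exactly this TypeError. A sketch of a guard (Deepgram uses an Authorization: Token <key> header):

```python
import os

def deepgram_auth_header() -> dict:
    """Build Deepgram's auth header, failing loudly if the key is missing."""
    key = os.getenv("DEEPGRAM_API_KEY")
    if key is None:
        # Without this guard, "Token " + None raises:
        # TypeError: can only concatenate str (not "NoneType") to str
        raise RuntimeError("DEEPGRAM_API_KEY is not set; check your .env file")
    return {"Authorization": "Token " + key}
```

So the first thing to verify for this error is that the .env file exists in the working directory and actually defines DEEPGRAM_API_KEY.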

And with TRANSCRIPTION_MODEL = 'groq':

pygame 2.5.2 (SDL 2.28.3, Python 3.11.6)
Hello from the pygame community. https://www.pygame.org/contribute.html
2024-05-20 14:37:08,207 - INFO - Recording started
2024-05-20 14:37:13,265 - INFO - Recording complete
2024-05-20 14:37:13,432 - ERROR - Failed to transcribe audio: 'Groq' object has no attribute 'audio'
2024-05-20 14:37:13,432 - INFO - You said: Error in transcribing audio
2024-05-20 14:37:13,997 - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-20 14:37:14,006 - INFO - Response: Sorry to hear that! If you're experiencing issues transcribing audio, can you please provide more context or details about the error, such as:
