A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech models. Supports OpenAI, Groq, ElevenLabs, CartesiaAI, and Deepgram APIs, plus local models via Ollama. Ideal for research and development in voice technology.

License: MIT License


VERBI - Voice Assistant πŸŽ™οΈ


Motivation ✨✨✨

Welcome to the Voice Assistant project! πŸŽ™οΈ Our goal is to create a modular voice assistant application that allows you to experiment with state-of-the-art (SOTA) models for various components. The modular structure provides flexibility, enabling you to pick and choose between different SOTA models for transcription, response generation, and text-to-speech (TTS). This approach facilitates easy testing and comparison of different models, making it an ideal platform for research and development in voice assistant technologies. Whether you're a developer, researcher, or enthusiast, this project is for you!

Features 🧰

  • Modular Design: Easily switch between different models for transcription, response generation, and TTS.
  • Support for Multiple APIs: Integrates with the OpenAI, Groq, ElevenLabs, and Deepgram APIs, along with placeholders for local models.
  • Audio Recording and Playback: Record audio from the microphone and play generated speech.
  • Configuration Management: Centralized configuration in config.py for easy setup and management.

Project Structure πŸ“‚

voice_assistant/
β”œβ”€β”€ voice_assistant/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ audio.py
β”‚   β”œβ”€β”€ api_key_manager.py
β”‚   β”œβ”€β”€ config.py
β”‚   β”œβ”€β”€ transcription.py
β”‚   β”œβ”€β”€ response_generation.py
β”‚   β”œβ”€β”€ text_to_speech.py
β”‚   β”œβ”€β”€ utils.py
β”‚   β”œβ”€β”€ local_tts_api.py
β”‚   β”œβ”€β”€ local_tts_generation.py
β”œβ”€β”€ .env
β”œβ”€β”€ run_voice_assistant.py
β”œβ”€β”€ setup.py
β”œβ”€β”€ requirements.txt
└── README.md

Setup Instructions πŸ“‹

Prerequisites βœ…

  • Python 3.10 or higher
  • Virtual environment (recommended)

Step-by-Step Instructions πŸ”’

  1. πŸ“₯ Clone the repository
   git clone https://github.com/PromtEngineer/Verbi.git
   cd Verbi
  2. 🐍 Set up a virtual environment

Using venv:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Using conda:

    conda create --name verbi python=3.10
    conda activate verbi
  3. πŸ“¦ Install the required packages
   pip install -r requirements.txt
  4. πŸ› οΈ Set up the environment variables

Create a .env file in the root directory and add your API keys:

    OPENAI_API_KEY=your_openai_api_key
    GROQ_API_KEY=your_groq_api_key
    DEEPGRAM_API_KEY=your_deepgram_api_key
    LOCAL_MODEL_PATH=path/to/local/model
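These keys are read from the environment at startup (the project presumably uses a library such as python-dotenv for this). As a purely illustrative sketch of what a .env loader does:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal .env loader: copies KEY=value lines into os.environ.

    Illustrative only -- a real loader like python-dotenv also handles
    quoting, comments at end of line, and variable interpolation.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            # setdefault: real environment variables win over .env values
            os.environ.setdefault(key.strip(), value.strip())
```

After loading, `os.getenv("OPENAI_API_KEY")` returns the value from the file (or None if the key is missing in both the file and the environment).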
  5. 🧩 Configure the models

Edit config.py to select the models you want to use:

    class Config:
        # Model selection
        TRANSCRIPTION_MODEL = 'groq'  # Options: 'openai', 'groq', 'deepgram', 'fastwhisperapi', 'local'
        RESPONSE_MODEL = 'groq'       # Options: 'openai', 'groq', 'ollama', 'local'
        TTS_MODEL = 'deepgram'        # Options: 'openai', 'deepgram', 'elevenlabs', 'local', 'melotts'

        # API keys and paths
        OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
        GROQ_API_KEY = os.getenv("GROQ_API_KEY")
        DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
        LOCAL_MODEL_PATH = os.getenv("LOCAL_MODEL_PATH")

If you are running an LLM locally via Ollama, make sure the Ollama server is running before starting Verbi.
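A quick way to check that the server is up before launching Verbi: Ollama listens on port 11434 by default, and GET /api/tags (which lists installed models) makes a cheap liveness probe. A small sketch:

```python
import urllib.request
import urllib.error

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def ollama_endpoint(base: str, path: str) -> str:
    """Join the Ollama base URL with an API path, avoiding duplicate slashes."""
    return base.rstrip("/") + "/" + path.lstrip("/")

def ollama_is_running(base: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on the given base URL."""
    try:
        with urllib.request.urlopen(ollama_endpoint(base, "/api/tags"), timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```

Calling `ollama_is_running()` returns False when no server is listening, so Verbi could fail fast with a helpful message instead of a mid-conversation error.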

  6. πŸ”Š Configure ElevenLabs Jarvis' Voice
  • Voice samples here.
  • Follow this link to add the Jarvis voice to your ElevenLabs account.
  • Name the voice 'Paul J.' or, if you prefer a different name, ensure it matches the ELEVENLABS_VOICE_ID variable in the text_to_speech.py file.
  1. πŸƒ Run the voice assistant
   python run_voice_assistant.py
  8. 🎀 Install FastWhisperAPI

    Optional step if you need a local transcription model

    Clone the repository

       cd ..
       git clone https://github.com/3choff/FastWhisperAPI.git
       cd FastWhisperAPI

    Install the required packages:

       pip install -r requirements.txt

    Run the API

       fastapi run main.py

    Alternative Setup and Run Methods

    The API can also run directly on a Docker container or in Google Colab.

    Docker:

    Build a Docker container:

       docker build -t fastwhisperapi .

    Run the container

       docker run -p 8000:8000 fastwhisperapi

    Refer to the repository documentation for the Google Colab method: https://github.com/3choff/FastWhisperAPI/blob/main/README.md

  9. 🎀 Install Local TTS - MeloTTS

    Optional step if you need a local Text to Speech model

    Install MeloTTS from Github

    Use the following link to install MeloTTS for your operating system.

    Once the package is installed in your local virtual environment, you can start the API server with the following command.

       python voice_assistant/local_tts_api.py

    The local_tts_api.py file implements a FastAPI server that listens for incoming text and generates audio using the MeloTTS model. To use the local TTS model, update the config.py file by setting:

       TTS_MODEL = 'melotts'        # Options: 'openai', 'deepgram', 'elevenlabs', 'local', 'melotts'

    You can then run the main file to start using Verbi with local models.
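Once the server is up, you can also exercise it directly. The sketch below assumes the server listens on port 8888 and exposes a /generate-audio/ endpoint accepting JSON; the field names `text` and `file_name` are hypothetical, so check local_tts_api.py for the actual schema:

```python
import json
import urllib.request

# Port and path are assumptions; verify against local_tts_api.py
TTS_URL = "http://localhost:8888/generate-audio/"

def build_tts_payload(text: str, filename: str = "output.mp3") -> dict:
    """JSON body for the local TTS endpoint (field names are hypothetical)."""
    return {"text": text, "file_name": filename}

def synthesize(text: str) -> bytes:
    """POST text to the local MeloTTS server and return the raw response."""
    req = urllib.request.Request(
        TTS_URL,
        data=json.dumps(build_tts_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

A 404 from this call (see the issues section below) usually means the server at that port is not the Verbi-provided local_tts_api.py, or the endpoint path differs.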

Model Options βš™οΈ

Transcription Models 🎀

  • OpenAI: Uses OpenAI's Whisper model.
  • Groq: Uses Groq's Whisper-large-v3 model.
  • Deepgram: Uses Deepgram's transcription model.
  • FastWhisperAPI: Uses FastWhisperAPI, a local transcription API powered by Faster Whisper.
  • Local: Placeholder for a local speech-to-text (STT) model.

Response Generation Models πŸ’¬

  • OpenAI: Uses OpenAI's GPT-4 model.
  • Groq: Uses Groq's LLaMA model.
  • Ollama: Uses any model served via Ollama.
  • Local: Placeholder for a local language model.
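The Ollama option talks to the local server's /api/chat endpoint. A minimal sketch of a non-streaming chat turn (the model name "llama3" is just an example):

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

def build_ollama_chat_request(model: str, user_text: str) -> dict:
    """JSON body for Ollama's /api/chat endpoint (non-streaming)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
    }

def chat(model: str, user_text: str) -> str:
    """Send one chat turn to a locally running Ollama server."""
    body = json.dumps(build_ollama_chat_request(model, user_text)).encode()
    req = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

For multi-turn conversations, the `messages` list would accumulate prior user and assistant turns; this sketch sends a single turn only.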

Text-to-Speech (TTS) Models πŸ”Š

  • OpenAI: Uses OpenAI's TTS model with the 'fable' voice.
  • Deepgram: Uses Deepgram's TTS model with the 'aura-angus-en' voice.
  • ElevenLabs: Uses ElevenLabs' TTS model with the 'Paul J.' voice.
  • Local: Placeholder for a local TTS model.
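The modular design boils down to a dispatch on the configured model name. A simplified sketch (handler names and return values are hypothetical; the real routing lives in text_to_speech.py):

```python
def _openai_tts(text: str) -> str:
    return f"[openai tts] {text}"    # stand-in for the real API call

def _deepgram_tts(text: str) -> str:
    return f"[deepgram tts] {text}"  # stand-in for the real API call

# One entry per supported backend; adding a model means adding one entry
TTS_BACKENDS = {
    "openai": _openai_tts,
    "deepgram": _deepgram_tts,
}

def text_to_speech(model: str, text: str) -> str:
    """Route a TTS request to the backend selected by Config.TTS_MODEL."""
    try:
        backend = TTS_BACKENDS[model]
    except KeyError:
        raise ValueError(f"Unknown TTS_MODEL: {model!r}")
    return backend(text)
```

The same table-driven pattern applies to transcription and response generation, which is what makes swapping models a one-line config change.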

Detailed Module Descriptions πŸ“˜

  • run_voice_assistant.py: Main script to run the voice assistant.
  • voice_assistant/config.py: Manages configuration settings and API keys.
  • voice_assistant/api_key_manager.py: Handles retrieval of API keys based on configured models.
  • voice_assistant/audio.py: Functions for recording and playing audio.
  • voice_assistant/transcription.py: Manages audio transcription using various APIs.
  • voice_assistant/response_generation.py: Handles generating responses using various language models.
  • voice_assistant/text_to_speech.py: Manages converting text responses into speech.
  • voice_assistant/utils.py: Contains utility functions like deleting files.
  • voice_assistant/local_tts_api.py: Contains the FastAPI server implementation that runs the MeloTTS model.
  • voice_assistant/local_tts_generation.py: Contains the code that calls the MeloTTS API to generate audio.
  • voice_assistant/__init__.py: Initializes the voice_assistant package.

Roadmap πŸ›€οΈπŸ›€οΈπŸ›€οΈ

Here's what's next for the Voice Assistant project:

  1. Add Support for Streaming: Enable real-time streaming of audio input and output.
  2. Add Support for ElevenLabs and Enhanced Deepgram for TTS: Integrate additional TTS options for higher quality and variety.
  3. Add Filler Audios: Include background or filler audios while waiting for model responses to enhance user experience.
  4. Add Support for Local Models Across the Board: Expand support for local models in transcription, response generation, and TTS.

Contributing 🀝

We welcome contributions from the community! If you'd like to help improve this project, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes and commit them (git commit -m 'Add new feature').
  4. Push to the branch (git push origin feature-branch).
  5. Open a pull request detailing your changes.


verbi's Issues

MeloTTS /generate-audio/ 404 Not Found

MeloTTS running locally from docker, WebUI generates audio files, but the /generate-audio endpoint called by Verbi to MeloTTS results in a 404.

I've looked through the MeloTTS project and can't see any references to /generate-audio

$ python3 run_voice_assistant.py
pygame 2.5.2 (SDL 2.28.3, Python 3.12.3)
Hello from the pygame community. https://www.pygame.org/contribute.html
2024-06-06 15:00:08,387 - INFO - Calibrating for ambient noise...
2024-06-06 15:00:09,369 - INFO - Recording started
2024-06-06 15:00:12,251 - INFO - Recording complete
2024-06-06 15:00:12,917 - INFO - No transcription was returned. Starting recording again.
2024-06-06 15:00:13,094 - INFO - Calibrating for ambient noise...
2024-06-06 15:00:14,079 - INFO - Recording started
2024-06-06 15:00:19,776 - INFO - Recording complete
2024-06-06 15:00:21,105 - INFO - You said: Hello Prompt Engineer, this is a recording, but for some reason, there's no response.
2024-06-06 15:00:23,435 - INFO - HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"
2024-06-06 15:00:23,435 - INFO - Response: Greetings! I'm here to assist you. Is there anything I can help you with today?
2024-06-06 15:00:23,439 - ERROR - Failed to convert text to speech: 404 Client Error: Not Found for url: http://localhost:8888/generate-audio/
2024-06-06 15:00:23,491 - ERROR - Failed to play audio: No file 'output.mp3' found in working directory '/Users/dev/Verbi'.
2024-06-06 15:00:23,667 - INFO - Calibrating for ambient noise...
2024-06-06 15:00:24,646 - INFO - Recording started

Groq API does not work.

No idea where it gets that URL-path, but it's not correct. I can't find it in the code.

INFO - HTTP Request: POST https://api.groq.com/openai/v1/openai/v1/chat/completions "HTTP/1.1 404 Not Found"
ERROR - Failed to generate response: Error code: 404 - {'error': {'message': 'Unknown request URL: POST /openai/v1/openai/v1/chat/completions. Please check the URL for typos, or see the docs at https://console.groq.com/docs/', 'type': 'invalid_request_error', 'code': 'unknown_url'}}
INFO - Response: Error in generating response
  1. The Groq module is being used, so it does not need to use the alternative URL for OpenAI compatibility.
  2. And even if it did, the URL should be https://api.groq.com/openai/v1.

Documentation:
https://console.groq.com/docs/openai

Something is fishy here.
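The duplicated segment suggests a base URL that already ends in /openai/v1 being joined with a request path that repeats the same prefix. An illustration of how such a 404 URL could be produced (the actual cause in Verbi's code is not confirmed):

```python
def join_api_url(base_url: str, path: str) -> str:
    """Naive join of an API base URL and a request path."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")

# If the path already repeats the /openai/v1 prefix, the join duplicates it:
buggy = join_api_url("https://api.groq.com/openai/v1", "openai/v1/chat/completions")
# buggy == "https://api.groq.com/openai/v1/openai/v1/chat/completions"  -> 404

# With a path relative to the base, the URL comes out right:
fixed = join_api_url("https://api.groq.com/openai/v1", "chat/completions")
# fixed == "https://api.groq.com/openai/v1/chat/completions"
```

In practice this means the Groq client should either be given no base_url override at all, or a path that does not repeat the prefix.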

get error

Failed to record audio: [WinError 2] The system cannot find the file specified

(install error) ERROR: Could not build wheels for PyAudio, which is required to install pyproject.toml-based projects

OS: macOS 14.5 Sonoma
Hardware: Intel
Python: 3.12.3
shell: bash

This error appears after the pip3 install -r requirements.txt step.

The solution was to start over, but first install portaudio via Homebrew: brew install portaudio

...
Downloading shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)
Downloading iniconfig-2.0.0-py3-none-any.whl (5.9 kB)
Downloading pure_eval-0.2.2-py3-none-any.whl (11 kB)
Downloading wcwidth-0.2.13-py2.py3-none-any.whl (34 kB)
Downloading markdown_it_py-3.0.0-py3-none-any.whl (87 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87.5/87.5 kB 2.7 MB/s eta 0:00:00
Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Building wheels for collected packages: PyAudio
  Building wheel for PyAudio (pyproject.toml) ... error
  error: subprocess-exited-with-error

  Γ— Building wheel for PyAudio (pyproject.toml) did not run successfully.
  β”‚ exit code: 1
  ╰─> [18 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-13.0-x86_64-cpython-312
      creating build/lib.macosx-13.0-x86_64-cpython-312/pyaudio
      copying src/pyaudio/__init__.py -> build/lib.macosx-13.0-x86_64-cpython-312/pyaudio
      running build_ext
      building 'pyaudio._portaudio' extension
      creating build/temp.macosx-13.0-x86_64-cpython-312
      creating build/temp.macosx-13.0-x86_64-cpython-312/src
      creating build/temp.macosx-13.0-x86_64-cpython-312/src/pyaudio
      clang -fno-strict-overflow -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/phil/dev/Verbi/venv/include -I/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.12/include/python3.12 -c src/pyaudio/device_api.c -o build/temp.macosx-13.0-x86_64-cpython-312/src/pyaudio/device_api.o
      src/pyaudio/device_api.c:9:10: fatal error: 'portaudio.h' file not found
      #include "portaudio.h"
               ^~~~~~~~~~~~~
      1 error generated.
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for PyAudio
Failed to build PyAudio
ERROR: Could not build wheels for PyAudio, which is required to install pyproject.toml-based projects
(venv)

Errors on "TRANSCRIPTION_MODEL"

With TRANSCRIPTION_MODEL = 'deepgram' I am getting:

pygame 2.5.2 (SDL 2.28.3, Python 3.11.6)
Hello from the pygame community. https://www.pygame.org/contribute.html
2024-05-20 14:38:03,651 - INFO - Recording started
2024-05-20 14:38:09,461 - INFO - Recording complete
2024-05-20 14:38:09,592 - ERROR - An error occurred: can only concatenate str (not "NoneType") to str
2024-05-20 14:38:09,592 - INFO - Deleted file: test.wav
2024-05-20 14:38:10,733 - INFO - Recording started
2024-05-20 14:38:16,905 - INFO - Recording complete
2024-05-20 14:38:17,038 - ERROR - An error occurred: can only concatenate str (not "NoneType") to str
2024-05-20 14:38:17,039 - INFO - Deleted file: test.wav
2024-05-20 14:38:18,176 - INFO - Recording started
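The "can only concatenate str (not "NoneType") to str" error is consistent with a missing DEEPGRAM_API_KEY: concatenating a None key into the auth header raises exactly this TypeError. A sketch of a guard (Deepgram uses an Authorization: Token <key> header):

```python
import os

def deepgram_auth_header() -> dict:
    """Build Deepgram's auth header, failing loudly if the key is missing."""
    key = os.getenv("DEEPGRAM_API_KEY")
    if key is None:
        # Without this guard, "Token " + None raises:
        # TypeError: can only concatenate str (not "NoneType") to str
        raise RuntimeError("DEEPGRAM_API_KEY is not set; check your .env file")
    return {"Authorization": "Token " + key}
```

So the first thing to verify for this error is that the .env file exists in the working directory and actually defines DEEPGRAM_API_KEY.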

And with TRANSCRIPTION_MODEL = 'groq':

pygame 2.5.2 (SDL 2.28.3, Python 3.11.6)
Hello from the pygame community. https://www.pygame.org/contribute.html
2024-05-20 14:37:08,207 - INFO - Recording started
2024-05-20 14:37:13,265 - INFO - Recording complete
2024-05-20 14:37:13,432 - ERROR - Failed to transcribe audio: 'Groq' object has no attribute 'audio'
2024-05-20 14:37:13,432 - INFO - You said: Error in transcribing audio
2024-05-20 14:37:13,997 - INFO - HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
2024-05-20 14:37:14,006 - INFO - Response: Sorry to hear that! If you're experiencing issues transcribing audio, can you please provide more context or details about the error, such as:
