lugia19 / echo-xi

Speech to text to speech using Elevenlabs



Echo-XI


Info

I published a tour of all the available features on YouTube; click here to view it.

The main goal of the project is to offer speech to text to speech.

It now has a GUI, and it stores all the settings you input. Sensitive details such as API Keys are stored in the system keyring.
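
As a rough illustration of the keyring approach, the sketch below saves and loads an API key through the set_password and get_password functions of the third-party keyring package. The service name, the two helper functions, and the optional store parameter are assumptions made for this example; they are not taken from Echo-XI's code.

```python
from typing import Optional

# Hypothetical service name; the README does not say which name Echo-XI uses.
SERVICE_NAME = "echo-xi-example"


def _default_store():
    """Return the third-party keyring module (pip install keyring)."""
    import keyring  # imported lazily so the helpers can be exercised without it
    return keyring


def save_api_key(provider: str, api_key: str, store=None) -> None:
    """Store an API key in the system keyring under the provider's name."""
    store = _default_store() if store is None else store
    store.set_password(SERVICE_NAME, provider, api_key)


def load_api_key(provider: str, store=None) -> Optional[str]:
    """Return the stored API key, or None if nothing has been saved yet."""
    store = _default_store() if store is None else store
    return store.get_password(SERVICE_NAME, provider)
```

Passing a stand-in object as store makes it possible to try the helpers without touching the real system keyring.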

To use the CLI instead, call the script from the command line with the --cli argument.
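
A minimal sketch of how a --cli switch like this can be handled with argparse. Only the flag name comes from the text above; the function names and help text are assumptions, and the real script may parse its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; --cli selects the text interface over the GUI."""
    parser = argparse.ArgumentParser(description="Speech to text to speech")
    parser.add_argument(
        "--cli",
        action="store_true",
        help="run in the terminal instead of opening the GUI",
    )
    return parser.parse_args(argv)


def choose_interface(argv=None) -> str:
    """Return 'cli' when --cli is passed, otherwise 'gui'."""
    return "cli" if parse_args(argv).cli else "gui"
```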

It offers three separate speech recognition services:

  • Vosk, with recasepunc to add punctuation
  • Azure speech recognition
  • Whisper, either running locally (now using faster-whisper for faster recognition and lower VRAM usage) or through OpenAI's API

In addition, it automatically translates the output into a language of the user's choosing (from those supported by ElevenLabs' multilingual model), if the user is speaking a different language.

Each speech recognition provider has different language support, so be sure to read the details.

Translation is provided via either DeepL for supported languages, or Google Translate.
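
The routing described above (no translation when the spoken and output languages match, DeepL when it supports the languages, Google Translate otherwise) could be sketched as follows. The language set is a placeholder for illustration, and the function name and return values are assumptions rather than Echo-XI's actual code.

```python
from typing import Optional

# Placeholder set for illustration only; consult DeepL's documentation for the real list.
DEEPL_LANGUAGES = {"en", "de", "fr", "it", "es"}


def pick_translator(source_lang: str, target_lang: str) -> Optional[str]:
    """Return which translator to use, or None if no translation is needed."""
    source, target = source_lang.lower(), target_lang.lower()
    if source == target:
        return None  # the speaker already uses the output language
    if source in DEEPL_LANGUAGES and target in DEEPL_LANGUAGES:
        return "deepl"
    return "google"  # fallback for anything outside the DeepL set
```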

The recognized and translated text is then sent to a TTS provider, of which two are supported:

  • ElevenLabs, through the elevenlabslib module: a high-quality but paid online TTS service that supports multiple languages.
  • pyttsx3, a low-quality TTS engine that runs locally.

The project also allows you to synchronize the detected text with an OBS text source using obsws-python.

Installation and usage

Warning: Python 3.11 is still not fully supported by PyTorch (but it should work on the nightly build). I'd recommend using Python 3.10.6.

Before anything else: you'll need to have ffmpeg in your $PATH. You can follow this tutorial if you're on Windows.

Additionally, if you're on Linux, you'll need to make sure PortAudio is installed.
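
If you want to verify these prerequisites before installing, a small standalone check along these lines may help. It is not part of Echo-XI; the function names and warning messages are made up for this sketch, and the PortAudio lookup relies on ctypes.util.find_library, which may miss libraries installed in unusual locations.

```python
import ctypes.util
import shutil
import sys


def check_python(version=None):
    """Return a warning string if the interpreter is not a 3.10.x release."""
    major, minor = (version or sys.version_info)[:2]
    if (major, minor) != (3, 10):
        return f"Python {major}.{minor} detected; 3.10 is the recommended version."
    return None


def check_tool(name):
    """Return a warning string if an executable is not on the PATH."""
    if shutil.which(name) is None:
        return f"{name} was not found on the PATH."
    return None


def check_portaudio(platform=None):
    """On Linux, return a warning string if the PortAudio library cannot be located."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux") and ctypes.util.find_library("portaudio") is None:
        return "PortAudio does not appear to be installed."
    return None


if __name__ == "__main__":
    for warning in (check_python(), check_tool("ffmpeg"), check_portaudio()):
        if warning:
            print("Warning:", warning)
```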

On Windows:

  1. Clone the repo: git clone https://github.com/lugia19/Echo-XI.git

  2. Run run.bat - it will handle all the following steps for you.

Everywhere else:

  1. Clone the repo: git clone https://github.com/lugia19/Echo-XI.git

  2. Create a venv: python -m venv venv

  3. Activate the venv: source venv/bin/activate (in a Windows command prompt, use venv\Scripts\activate instead)

  4. If you did it correctly, there should be (venv) at the start of the command line.

  5. Install the requirements: pip install -r requirements.txt

  6. Run the script. Add the --cli argument if you want the command-line interface instead of the GUI.
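
The activation script lives in a different place depending on the platform, which is easy to get wrong. The sketch below shows the difference using only the standard library; activation_command and create_venv are illustrative helper names, not part of the project.

```python
import sys
import venv
from pathlib import PurePosixPath, PureWindowsPath


def activation_command(venv_dir="venv", platform=None):
    """Return the shell command that activates a venv on the given platform."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        # Windows venvs keep their scripts under Scripts\
        return str(PureWindowsPath(venv_dir) / "Scripts" / "activate")
    # Linux and macOS venvs keep them under bin/
    return "source " + str(PurePosixPath(venv_dir) / "bin" / "activate")


def create_venv(venv_dir="venv"):
    """Create the virtual environment, equivalent to: python -m venv venv"""
    venv.create(venv_dir, with_pip=True)
```

Calling activation_command() with no arguments returns the command for the platform the interpreter is currently running on.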

If you would like to use the voice in something like Discord, use VB-Cable. In the script, select your normal microphone as the input and VB-Cable Input as the output; then, in Discord, select VB-Cable Output as the input device. Yes, it's a little confusing.
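
To make the routing less confusing, here is a small sketch that picks the two devices by name. The device list is made up for illustration, and its dictionary shape (name, max_input_channels, max_output_channels) is an assumption modeled on what the sounddevice package reports; the script's own device selection may work differently.

```python
def find_device(devices, name_fragment, kind):
    """Return the index of the first matching device, or None.

    devices: list of dicts with 'name', 'max_input_channels', 'max_output_channels'
             (an assumed shape, similar to what the sounddevice package reports).
    kind:    'input' or 'output'.
    """
    channel_key = f"max_{kind}_channels"
    for index, device in enumerate(devices):
        if name_fragment.lower() in device["name"].lower() and device[channel_key] > 0:
            return index
    return None


# Made-up device list for illustration; real names vary by system and driver.
example_devices = [
    {"name": "Microphone (USB Audio)", "max_input_channels": 2, "max_output_channels": 0},
    {"name": "Speakers (Realtek)", "max_input_channels": 0, "max_output_channels": 2},
    {"name": "CABLE Input (VB-Audio Virtual Cable)", "max_input_channels": 0, "max_output_channels": 2},
    {"name": "CABLE Output (VB-Audio Virtual Cable)", "max_input_channels": 2, "max_output_channels": 0},
]

# The script listens on the real microphone and plays into "CABLE Input";
# Discord then records from "CABLE Output".
mic_index = find_device(example_devices, "Microphone", "input")
cable_index = find_device(example_devices, "CABLE Input", "output")
```

Note that "CABLE Input" is a playback device from the script's point of view, which is why it is selected as the output.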

Notes on vosk/recasepunc

If you want to use vosk/recasepunc and need models other than the included (downloadable) ones, read on.

Vosk models can be found here. The same page also offers some recasepunc models. For additional ones, you can look in the recasepunc repo.

For English I use vosk-model-en-us-0.22 and vosk-recasepunc-en-0.22. Recasepunc is technically optional when using vosk, but it is highly recommended because it improves the output.

The script looks for models under the models/vosk and models/recasepunc folders.

A typical folder structure would look something like this. Recasepunc models can either be in their own folder or be a standalone file, depending on which source you download them from; both are supported.

-misc
-models
    -vosk
        -vosk-model-en-us-0.22
        -vosk-model-it-0.22
    -recasepunc
        -vosk-recasepunc-en-0.22
        it.22000
-speechRecognition
-ttsProviders
helper.py
speechToSpeech.py
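
To check that your models are where the script expects them, a quick listing like the one below can help. It is only a sketch that mirrors the layout shown above; list_models is a hypothetical helper, and the script's own lookup logic may differ.

```python
from pathlib import Path


def list_models(models_root="models"):
    """Return the vosk and recasepunc models found under models_root."""
    root = Path(models_root)
    found = {"vosk": [], "recasepunc": []}

    vosk_dir = root / "vosk"
    if vosk_dir.is_dir():
        # Vosk models are folders such as vosk-model-en-us-0.22.
        found["vosk"] = sorted(p.name for p in vosk_dir.iterdir() if p.is_dir())

    recase_dir = root / "recasepunc"
    if recase_dir.is_dir():
        # Recasepunc models may be a folder or a single checkpoint file; accept both.
        found["recasepunc"] = sorted(p.name for p in recase_dir.iterdir())

    return found
```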

For everything else, simply run the script and follow the instructions.



echo-xi's Issues

Crash when using ElevenLabs & Recasepunc/Vosk

Traceback:

  File "C:\Users\gcpins\Documents\Speech2Speech\speechToSpeech.py", line 117, in <module>
    main()
  File "C:\Users\gcpins\Documents\Speech2Speech\speechToSpeech.py", line 37, in main
    srProvider.recognize_loop()
  File "C:\Users\gcpins\Documents\Speech2Speech\speechRecognition\VoskProvider.py", line 93, in recognize_loop
    process_text(recognizedText, self.chosenLanguage)
  File "C:\Users\gcpins\Documents\Speech2Speech\speechToSpeech.py", line 49, in process_text
    helper.ttsProvider.synthesizeAndPlayAudio(translatedText, helper.chosenOutput)
  File "C:\Users\gcpins\Documents\Speech2Speech\ttsProviders\ElevenlabsProvider.py", line 56, in synthesizeAndPlayAudio
    self.ttsVoice.generate_and_stream_audio(prompt, outputDeviceIndex,
AttributeError: 'ElevenLabsVoice' object has no attribute 'generate_and_stream_audio'

Windows 11
Using Python 3.10.0

Crashes after transcribing speech (presumably when sending transcribed text to 11Labs API).

Syntax error for Windows 10

It might just be my setup, but helper.py errored out for me (Windows 10). I had to change these lines:

#27 defaultConfig: dict[str, str | int| list|float] = {
#167 def get_provider_config(provider: SpeechRecProvider | TTSProvider) -> dict[str, str|float|bool|int|list]:
#185 def update_provider_config(provider: SpeechRecProvider | TTSProvider, providerConfig:dict):

To
defaultConfig: Dict[str, Union[str, int, list, float]] = {
def get_provider_config(provider: Union[SpeechRecProvider, TTSProvider]) -> Dict[str, Union[str, float, bool, int, list]]:
def update_provider_config(provider: Union[SpeechRecProvider, TTSProvider], providerConfig: dict):

respectively, and also add:
from typing import Dict, Union, Mapping

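The cause seems to be the Python version: the X | Y syntax for union types in annotations was only added in Python 3.10, and subscripting the built-in dict in annotations requires Python 3.9. On older versions, | between types is treated as the bitwise OR operator and fails.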

Now it works like a charm! Great Job!
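
For reference, a minimal sketch of type hints written in the older typing style, which also imports on Python versions before 3.10. The ConfigValue alias, the config keys, and the supports_pipe_unions helper are hypothetical and only illustrate the idea; they are not taken from helper.py.

```python
import sys
from typing import Dict, Union

# One alias keeps the long union readable wherever it is reused.
ConfigValue = Union[str, int, float, bool, list]

# Hypothetical keys and values, for illustration only.
default_config: Dict[str, ConfigValue] = {
    "example_setting": "value",
    "example_number": 1,
}


def supports_pipe_unions(version=None) -> bool:
    """True if the interpreter accepts 'str | int' in evaluated annotations (3.10+)."""
    return tuple((version or sys.version_info)[:2]) >= (3, 10)
```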
