echo-xi's Introduction

Echo-XI

Info

I published a video tour of all the available features on YouTube.

The main goal of the project is to offer speech to text to speech.

It now has a GUI, and it stores all the settings you input. Sensitive details such as API Keys are stored in the system keyring.
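
As a rough illustration of keeping API keys in the system keyring, here is a minimal sketch using the `keyring` package. The service name, the function names and the optional `backend` parameter are assumptions made for this example, not the project's actual code; the only real calls relied on are `keyring.set_password` and `keyring.get_password`.

```python
from typing import Optional, Protocol


class SecretBackend(Protocol):
    """The two keyring calls this sketch relies on."""

    def set_password(self, service: str, username: str, password: str) -> None: ...

    def get_password(self, service: str, username: str) -> Optional[str]: ...


# Hypothetical service name; the project may use a different one.
SERVICE_NAME = "echo-xi-example"


def _resolve(backend: Optional[SecretBackend]) -> SecretBackend:
    """Use the given backend, or fall back to the real system keyring."""
    if backend is not None:
        return backend
    # Imported lazily so this module loads even if keyring is not installed.
    import keyring  # third-party: pip install keyring
    return keyring


def save_api_key(provider: str, api_key: str,
                 backend: Optional[SecretBackend] = None) -> None:
    """Store an API key in the keyring under the provider's name."""
    _resolve(backend).set_password(SERVICE_NAME, provider, api_key)


def load_api_key(provider: str,
                 backend: Optional[SecretBackend] = None) -> Optional[str]:
    """Return the stored key, or None if nothing was saved for this provider."""
    return _resolve(backend).get_password(SERVICE_NAME, provider)
```

Passing a stand-in `backend` object makes it possible to try the functions without touching the real keyring; leaving it out uses the operating system's credential store.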

If you want to use the CLI instead, simply call the script from the command line with the argument --cli.
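
A switch like --cli is typically handled with `argparse`. The sketch below shows one way that could look; `parse_args` and `choose_interface` are hypothetical names, and the real script may parse its arguments differently.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Speech to text to speech")
    parser.add_argument(
        "--cli",
        action="store_true",
        help="use the command-line interface instead of the GUI",
    )
    return parser.parse_args(argv)


def choose_interface(argv: Optional[List[str]] = None) -> str:
    """Return 'cli' when --cli was passed, otherwise 'gui'."""
    return "cli" if parse_args(argv).cli else "gui"
```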

It offers three separate speech recognition services:

Vosk, with recasepunc to add punctuation
Azure speech recognition
Whisper, either running locally (now using faster-whisper for faster recognition and lower VRAM usage) or through OpenAI's API

In addition, it automatically translates the output into a language of the user's choosing (from those supported by ElevenLabs' multilingual model), if the user is speaking a different language.

Each speech recognition provider has different language support, so be sure to read the details.

Translation is provided via either DeepL for supported languages, or Google Translate.
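
The choice between the two translation services can be pictured as a small decision function. This is only a sketch: `choose_translator` is a hypothetical name, the set of DeepL languages is passed in by the caller rather than hard-coded, and requiring both languages to be DeepL-supported is an assumption about how "supported" is decided.

```python
from typing import Collection, Optional


def choose_translator(spoken: str, target: str,
                      deepl_languages: Collection[str]) -> Optional[str]:
    """Pick a translation backend for one utterance.

    Returns None when no translation is needed, "deepl" when both
    languages are in the caller-supplied DeepL set, else "google".
    """
    spoken, target = spoken.lower(), target.lower()
    if spoken == target:
        # The user is already speaking the output language.
        return None
    supported = {code.lower() for code in deepl_languages}
    if spoken in supported and target in supported:
        return "deepl"
    return "google"
```

For example, with a caller-supplied set of `{"en", "it"}`, Italian to English would go to DeepL, while a pair involving any other language would fall back to Google Translate.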

The recognized and translated text is then sent to a TTS provider, of which two are supported:

ElevenLabs, through the elevenlabslib module: a high-quality but paid online TTS service that supports multiple languages.
pyttsx3, a low quality TTS that runs locally.

The project also allows you to synchronize the detected text with an OBS text source using obsws-python.
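
Put together, the flow described above (recognized text, optional translation, TTS, optional OBS text update) can be sketched as follows. All names here are hypothetical and the stages are plain callables, so this does not reflect the project's real classes. Whether the OBS source receives the recognized or the translated text is also an assumption; the sketch sends the translated text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Glue between the stages; each stage is just a callable."""

    translate: Callable[[str], str]      # may return the text unchanged
    speak: Callable[[str], None]         # hands the text to a TTS provider
    show_subtitle: Optional[Callable[[str], None]] = None  # e.g. OBS text source

    def handle_utterance(self, recognized_text: str) -> str:
        """Run one recognized sentence through the remaining stages."""
        translated = self.translate(recognized_text)
        if self.show_subtitle is not None:
            # Assumption: the subtitle shows the translated text.
            self.show_subtitle(translated)
        self.speak(translated)
        return translated
```

Keeping the stages as callables means any speech recognition provider can feed `handle_utterance`, and the TTS or subtitle step can be swapped without touching the others.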

Installation and usage

Warning: Python 3.11 is still not fully supported by PyTorch (though it should work with the nightly build). I recommend using Python 3.10.6.

Before anything else: you'll need to have ffmpeg in your $PATH. You can follow this tutorial if you're on Windows.

Additionally, if you're on Linux, you'll need to make sure portaudio is installed.
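
To confirm the ffmpeg prerequisite before going further, a quick check with the standard library is enough. `missing_tools` is a hypothetical helper written for this example. It only looks for executables on the PATH, so it cannot detect portaudio, which is a library rather than a command.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("ffmpeg",)) -> List[str]:
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` returns an empty list when ffmpeg is reachable, and `["ffmpeg"]` when it is not.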

On Windows:

Clone the repo: git clone https://github.com/lugia19/Echo-XI.git
Run run.bat - it will handle all the following steps for you.

Everywhere else:

Clone the repo: git clone https://github.com/lugia19/Echo-XI.git
Create a venv: python -m venv venv
Activate the venv: source venv/bin/activate (if you're doing these steps manually on Windows, use venv\Scripts\activate instead)
If you did it correctly, there should be (venv) at the start of the command line.
Install the requirements: pip install -r requirements.txt
Run the script (add --cli if you want the command-line interface instead of the GUI).

If you would like to use the voice in something like Discord, use VB-Cable. In the script, select your normal microphone as the input and VB-Cable input as the output, then in Discord select VB-Cable output as the input. Yes, it's a little confusing.

Notes on vosk/recasepunc

If you want to use vosk/recasepunc and need something besides the included (downloadable) models, read on.

Vosk models can be found here. The same page also offers some recasepunc models. For additional ones, you can look in the recasepunc repo.

For English I use vosk-model-en-us-0.22 and vosk-recasepunc-en-0.22. Recasepunc is technically optional when using vosk, but it is highly recommended because it improves the output.

The script looks for models under the models/vosk and models/recasepunc folders.

A typical folder structure would look something like this. Recasepunc models can either sit in their own folder or be a single file, depending on which source you download them from; both are supported.

-misc
-models
    -vosk
        -vosk-model-en-us-0.22
        -vosk-model-it-0.22
    -recasepunc
        -vosk-recasepunc-en-0.22
        it.22000
-speechRecognition
-ttsProviders
helper.py
speechToSpeech.py

For everything else, simply run the script and follow the instructions.
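
To check that downloaded models ended up where the script expects them, the models/vosk and models/recasepunc folders can be listed with a few lines of `pathlib`. `list_models` is a hypothetical helper, not part of the project. It assumes vosk models are always folders, while recasepunc models may be either folders or single files, as described above.

```python
from pathlib import Path
from typing import Dict, List


def list_models(models_root: Path) -> Dict[str, List[str]]:
    """List model names found under <models_root>/vosk and /recasepunc."""
    found: Dict[str, List[str]] = {"vosk": [], "recasepunc": []}

    vosk_dir = models_root / "vosk"
    if vosk_dir.is_dir():
        # Assumption: each vosk model is a folder.
        found["vosk"] = sorted(p.name for p in vosk_dir.iterdir() if p.is_dir())

    recasepunc_dir = models_root / "recasepunc"
    if recasepunc_dir.is_dir():
        # Recasepunc models may be a folder or a single checkpoint file.
        found["recasepunc"] = sorted(p.name for p in recasepunc_dir.iterdir())

    return found
```

Running `list_models(Path("models"))` from the repository root should show every model name you placed there; an empty list for either key means that folder is missing or empty.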

echo-xi's Issues

Crash when using ElevenLabs & Recasepunc/Vosk

Traceback:

  File "C:\Users\gcpins\Documents\Speech2Speech\speechToSpeech.py", line 117, in <module>
    main()
  File "C:\Users\gcpins\Documents\Speech2Speech\speechToSpeech.py", line 37, in main
    srProvider.recognize_loop()
  File "C:\Users\gcpins\Documents\Speech2Speech\speechRecognition\VoskProvider.py", line 93, in recognize_loop
    process_text(recognizedText, self.chosenLanguage)
  File "C:\Users\gcpins\Documents\Speech2Speech\speechToSpeech.py", line 49, in process_text
    helper.ttsProvider.synthesizeAndPlayAudio(translatedText, helper.chosenOutput)
  File "C:\Users\gcpins\Documents\Speech2Speech\ttsProviders\ElevenlabsProvider.py", line 56, in synthesizeAndPlayAudio
    self.ttsVoice.generate_and_stream_audio(prompt, outputDeviceIndex,
AttributeError: 'ElevenLabsVoice' object has no attribute 'generate_and_stream_audio'

Windows 11
Using Python 3.10.0

Crashes after transcribing speech (presumably when sending transcribed text to 11Labs API).

Syntax error for Windows 10

It might just be my setup, but the helper.py file errored out for me (Windows 10). I had to change these lines:

#27 defaultConfig: dict[str, str | int| list|float] = {
#167 def get_provider_config(provider: SpeechRecProvider | TTSProvider) -> dict[str, str|float|bool|int|list]:
#185 def update_provider_config(provider: SpeechRecProvider | TTSProvider, providerConfig:dict):

To
defaultConfig: Dict[str, Union[str, int, list, float]] = {
def get_provider_config(provider: Union[SpeechRecProvider, TTSProvider]) -> Dict[str, Union[str, float, bool, int, list]]:
def update_provider_config(provider: Union[SpeechRecProvider, TTSProvider], providerConfig: dict):

respectively and also
from typing import Dict, Union, Mapping

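As far as I can tell, the cause is the Python version rather than Windows: writing a union type as X | Y in an annotation only works on Python 3.10 or later (PEP 604). On older versions, | is only the bitwise OR operator, which these types do not support, so the annotations fail as soon as helper.py is loaded.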

Now it works like a charm! Great Job!

[Feature request] Support tortoise-tts-fast for voice output

what is the best combination for low-latency?