
an-lee / echogarden


This project forked from echogarden-project/echogarden


Integrated speech toolset designed to be accessible to end-users. Fully open-source.

License: GNU General Public License v3.0

JavaScript 0.08% TypeScript 99.92%


Echogarden

Echogarden is an easy-to-use speech toolset that includes a variety of speech processing tools.

  • Easy to install, run, and update
  • Runs on Windows (x64), macOS (x64, ARM64) and Linux (x64, ARM64)
  • Written in TypeScript, for the Node.js runtime
  • Doesn't require Python, Docker, or other system-level dependencies
  • Doesn't rely on essential platform-specific binaries. Engines are either ported via WebAssembly, imported using the ONNX runtime, or written in pure JavaScript

Features

  • Text-to-speech using the VITS neural architecture and 15 other offline and online engines, including cloud services by Google, Microsoft, Amazon, OpenAI and ElevenLabs
  • Speech-to-text using OpenAI Whisper, and several other engines, including cloud services by Google, Microsoft, Amazon and OpenAI
  • Speech-to-transcript alignment using several variants of dynamic time warping (DTW, DTW-RA), including support for multi-pass (hierarchical) processing, or via guided decoding using Whisper recognition models. Supports 100+ languages
  • Speech-to-text translation: translates speech in any of the 98 languages supported by Whisper into English, with near word-level timing for the translated transcript
  • Speech-to-translated-transcript alignment: attempts to synchronize spoken audio in one language with a provided English-translated transcript, using the Whisper engine
  • Language detection identifies the language of a given audio or text. Provides Whisper or Silero engines for audio, and TinyLD or FastText for text
  • Voice activity detection attempts to identify segments of audio where voice is active or inactive. Includes WebRTC VAD, Silero VAD, RNNoise-based VAD and a custom Adaptive Gate
  • Speech denoising attenuates background noise from spoken audio. Includes the RNNoise engine
  • Source separation isolates voice from any music or background ambience. Supports the MDX-NET deep learning architecture
  • Word-level timestamps for all recognition, synthesis, alignment and translation outputs
  • Advanced subtitle generation, accounting for sentence and phrase boundaries
  • For the VITS and eSpeak-NG synthesis engines, includes enhancements to improve TTS pronunciation accuracy: adds text normalization (e.g. idiomatic date and currency pronunciation), heteronym disambiguation (based on a rule-based model) and user-customizable pronunciation lexicons
  • Internal package system that auto-downloads and installs voices, models and other resources, as needed
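Several of the alignment modes listed above are based on dynamic time warping. As a rough illustration of the underlying idea, here is a minimal sketch of plain textbook DTW between two sequences of feature frames (for example, per-frame acoustic feature vectors). It is a generic example, not Echogarden's DTW or DTW-RA implementation, and the names `dtw` and `Frame` are hypothetical.

```typescript
// Generic textbook DTW sketch (hypothetical names, not Echogarden's code).
// Aligns two sequences of feature vectors and returns the minimal
// accumulated cost together with the warping path as [i, j] index pairs.

type Frame = number[];

// Euclidean distance between two frames of equal dimension
function euclidean(a: Frame, b: Frame): number {
  let sum = 0;
  for (let k = 0; k < a.length; k++) {
    const d = a[k] - b[k];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

function dtw(x: Frame[], y: Frame[]): { cost: number; path: [number, number][] } {
  const n = x.length;
  const m = y.length;
  if (n === 0 || m === 0) throw new Error("Both sequences must be non-empty");

  // acc[i][j] = minimal accumulated cost of aligning x[0..i] with y[0..j]
  const acc: number[][] = Array.from({ length: n }, () => new Array<number>(m).fill(Infinity));

  for (let i = 0; i < n; i++) {
    for (let j = 0; j < m; j++) {
      const d = euclidean(x[i], y[j]);
      if (i === 0 && j === 0) {
        acc[i][j] = d;
      } else {
        const up = i > 0 ? acc[i - 1][j] : Infinity;
        const left = j > 0 ? acc[i][j - 1] : Infinity;
        const diag = i > 0 && j > 0 ? acc[i - 1][j - 1] : Infinity;
        acc[i][j] = d + Math.min(up, left, diag);
      }
    }
  }

  // Backtrack from the last cell to (0, 0), always taking the cheapest predecessor
  const path: [number, number][] = [];
  let i = n - 1;
  let j = m - 1;
  path.push([i, j]);
  while (i > 0 || j > 0) {
    if (i === 0) {
      j--;
    } else if (j === 0) {
      i--;
    } else {
      const diag = acc[i - 1][j - 1];
      const up = acc[i - 1][j];
      const left = acc[i][j - 1];
      if (diag <= up && diag <= left) {
        i--;
        j--;
      } else if (up <= left) {
        i--;
      } else {
        j--;
      }
    }
    path.push([i, j]);
  }
  path.reverse();

  return { cost: acc[n - 1][m - 1], path };
}
```

For example, aligning `[[0], [1], [2]]` with `[[0], [1], [1], [2]]` gives a cost of 0 and the path (0,0), (1,1), (1,2), (2,3): the middle frame of the first sequence absorbs the repeated frame of the second. This naive version takes O(n·m) time and memory; practical aligners commonly restrict the search to a band around the diagonal to keep long recordings tractable.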

Installation

Ensure you have Node.js v18.16.0 or later installed.
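Running `node --version` shows the active version. If you want to check the requirement from a script instead, a minimal sketch using only `process.versions.node` might look like this (the helper names `parseVersion` and `isAtLeast` are hypothetical):

```typescript
// Minimal sketch: compare the running Node.js version against the
// v18.16.0 minimum stated above. Helper names are hypothetical.

type Version = [number, number, number];

const MIN_VERSION: Version = [18, 16, 0];

// Parses strings such as "18.16.0" or "v20.1.0" into [major, minor, patch]
function parseVersion(v: string): Version {
  const [major, minor, patch] = v.replace(/^v/, "").split(".").map((p) => parseInt(p, 10));
  return [major, minor ?? 0, patch ?? 0];
}

// True if `actual` is greater than or equal to `required`, component by component
function isAtLeast(actual: Version, required: Version): boolean {
  for (let i = 0; i < 3; i++) {
    if (actual[i] > required[i]) return true;
    if (actual[i] < required[i]) return false;
  }
  return true;
}

const current = parseVersion(process.versions.node);
if (isAtLeast(current, MIN_VERSION)) {
  console.log(`Node.js ${process.versions.node} meets the v18.16.0 requirement`);
} else {
  console.error(`Node.js ${process.versions.node} is older than v18.16.0`);
}
```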

Then:

npm install echogarden -g

Additional required tools:

  • ffmpeg: used for codec conversions
  • sox: used for the CLI's audio playback

Both tools are auto-downloaded as internal packages on Windows and Linux.

On macOS, only ffmpeg is currently auto-downloaded. It is recommended to install sox via a system package manager like Homebrew (brew install sox) to ensure it is available on the system path.
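To see whether `sox` (and `ffmpeg`) can be launched from the system path, a small Node.js sketch can try to start each tool and check whether spawning failed. Note that this only inspects the system path: an `ffmpeg` that Echogarden auto-downloaded as an internal package will not necessarily be found this way. The helper name `isOnPath` is hypothetical.

```typescript
import { spawnSync } from "node:child_process";

// Returns true if `command` can be launched from the system PATH.
// spawnSync sets `result.error` (e.g. ENOENT) when the executable is not found.
function isOnPath(command: string, versionFlag: string): boolean {
  const result = spawnSync(command, [versionFlag], { stdio: "ignore" });
  return result.error === undefined;
}

// ffmpeg uses "-version", sox uses "--version"
for (const [tool, flag] of [["ffmpeg", "-version"], ["sox", "--version"]] as const) {
  console.log(`${tool}: ${isOnPath(tool, flag) ? "found" : "not found on PATH"}`);
}
```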

Updating to latest version

npm update echogarden -g

Using the toolset

Tools are accessible via a command-line interface, which enables powerful customization and is especially useful for long-running bulk operations.

Development of more graphical and interactive tooling is planned. A text-to-speech browser extension is currently under development (but not released yet).

If you are a developer, you can also import the package as a module or interface with it via a local WebSocket service (currently experimental).

Documentation

Credits

This project consolidates and builds upon the efforts of many different individuals and companies, and also contributes a number of original works.

Developed by Rotem Dan (IPA: /ˈʁɒːtem ˈdän/).

License

GNU General Public License v3

Licenses for components, models and other dependencies are detailed on this page.

echogarden's People

Contributors

rotemdan
