pandruszkow / whisper-inference-server

License: Creative Commons Zero v1.0 Universal
whisper-inference-server

An inference server for Whisper so you don't have to keep waiting for the audio model to reload for the x-hundredth time.

Installation

Clone the repository, then run the following to install dependencies and start the server in a venv.

virtualenv .
source bin/activate
pip install -r requirements.txt 
python3 whisper_inference_server.py 

The server should work provided that Whisper itself runs fine on your machine. If you want GPU acceleration, make sure CUDA is detected before starting the server.
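If you're unsure whether CUDA is being picked up, a quick check through PyTorch (which Whisper depends on) looks like the sketch below. This is a convenience snippet, not part of the server itself:

```python
def cuda_status():
    """Report whether Whisper's PyTorch backend can see a GPU.

    Returns "pytorch-missing" if PyTorch isn't installed yet,
    otherwise "cuda" or "cpu".
    """
    try:
        import torch  # Whisper pulls this in via requirements.txt
    except ImportError:
        return "pytorch-missing"
    return "cuda" if torch.cuda.is_available() else "cpu"

if __name__ == "__main__":
    print(cuda_status())
```

If this prints "cpu" on a machine with an NVIDIA GPU, check your CUDA driver and the PyTorch build you installed before blaming the server.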

How to use (endpoints)

By default, the server listens on http://127.0.0.1:5000. The following endpoints are available there:

  • /: shows a plain HTML+JS page that can be used as a demo. Use the file picker to provide an audio file, click submit, and then wait until you see the transcription.

    • You will need to wait for the file to process fully before seeing anything, so be patient if it's taking a while.
  • /recognise: this is what you want if you're going to be programming against this server. This endpoint accepts an audio file, transcribes it, and returns the result as a JSON response. You get exactly the kind of object you would get from Whisper's Python API.

    • The transcription result JSON is large and detailed, so you probably want just the text field unless you're doing something advanced. I'm including all of that data to make migrating from the Whisper Python API to this server's API as seamless as possible. In the future, this level of detail may become optional.
    • This endpoint is synchronous for simplicity: it blocks for as long as the transcription takes, so if you need to do other work while it churns away in the background, call it from a separate thread.
  • /reload: change which speech recognition model is used by subsequent transcription requests.

    • If you do not know what this does, then you don't need to use this.
    • Request format: JSON with a model_name field specifying the name of the model to load. The names are the same as those accepted by the command-line Whisper utility.

Security

This server is designed to be deployed on a trusted internal network. It has no built-in security whatsoever, so add extra layers of protection to shield it from outside access.
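One way to add such a layer, assuming you put an nginx reverse proxy in front of the server, is basic auth plus an IP allowlist. The hostname, network range, and paths below are illustrative placeholders, not shipped configuration:

```nginx
# Illustrative nginx front-end; adjust names, networks, and paths to taste.
server {
    listen 443 ssl;
    server_name whisper.internal.example;   # hypothetical hostname

    allow 10.0.0.0/8;                       # trusted internal range only
    deny  all;

    auth_basic           "whisper";
    auth_basic_user_file /etc/nginx/htpasswd;

    location / {
        proxy_pass http://127.0.0.1:5000;   # the inference server's default address
        client_max_body_size 200m;          # allow large audio uploads
    }
}
```

Firewalling the port so only the proxy can reach it completes the picture; the Flask server itself should never be exposed directly.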

Available model sizes

By default, the server starts with the medium Whisper model loaded, but it can use every size that the original CLI utility can handle. Quantised models may be added at a later date.

TODO

  • Quantised models, to allow loading the large model on <8 GB of VRAM
  • Security and a more "productionised" way of running: an auth token in the request header should prevent abuse, and some sort of CGI/containerisation support may help
  • Docker support + GPU pass-through
  • Make detailed transcription JSON an option rather than the default
  • Streaming WAV transcription endpoint for real-time speech recognition
  • Async endpoint with the concept of "transcription jobs" - might be more suitable for transcribing larger files. See if progress can be included here.

Credits

Shout-out to https://github.com/davabase/whisper_real_time, whose code inspired this project.
