
mharrvic / fast-audio-video-transcribe-with-whisper-and-modal


Fast audio/video transcription using OpenAI's Whisper and Modal: an hour-long audio or video file can be transcribed in about 1 minute.

Python 100.00%
fastapi modal openai python transcribe whisper


Fast Audio/Video Transcription Using OpenAI's Whisper and Modal

With Modal.com providing on-demand parallel processing, an hour-long audio file can be transcribed in about 1 minute.

"Modal’s dead-simple parallelism primitives are the key to doing the transcription so quickly. Even with a GPU, transcribing a full episode serially was taking around 10 minutes. But by pulling in ffmpeg with a simple .pip_install("ffmpeg-python") addition to our Modal Image, we could exploit the natural silences of the podcast medium to partition episodes into hundreds of short segments. Each segment is transcribed by Whisper in its own container task with 2 physical CPU cores, and when all are done we stitch the segments back together with only a minimal loss in transcription quality. This approach actually accords quite well with Whisper’s model architecture:"

“The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder.” - Introducing Whisper
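The partition-and-stitch idea in the quote above can be sketched in plain Python. This is a minimal sketch, not the repo's code: it assumes silence intervals have already been detected (for example by parsing the output of ffmpeg's silencedetect filter), and the names split_at_silences, stitch, and max_segment_s are hypothetical. Cuts are placed at the midpoint of each silence, and each segment's local Whisper timestamps are shifted by the segment's start offset when the results are stitched back together.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def split_at_silences(
    duration: float, silences: List[Interval], max_segment_s: float = 30.0
) -> List[Interval]:
    """Greedily cut [0, duration] at silence midpoints, keeping segments near max_segment_s.

    Hypothetical helper: a segment can still exceed max_segment_s when no
    silence falls inside it.
    """
    # Candidate cut points: the middle of each detected silence, strictly inside the file.
    cuts = sorted(m for m in ((s + e) / 2 for s, e in silences) if 0 < m < duration)
    segments: List[Interval] = []
    seg_start = 0.0
    prev_cut: Optional[float] = None
    for cut in cuts + [duration]:
        # Cutting here would make the segment too long, so close it at the previous cut.
        if cut - seg_start > max_segment_s and prev_cut is not None and prev_cut > seg_start:
            segments.append((seg_start, prev_cut))
            seg_start = prev_cut
        prev_cut = cut
    segments.append((seg_start, duration))
    return segments


def stitch(segments: List[Interval], results: List[List[Dict]]) -> List[Dict]:
    """Shift each segment's local timestamps by its start offset and concatenate in order."""
    stitched: List[Dict] = []
    for (offset, _), pieces in zip(segments, results):
        for piece in pieces:
            stitched.append(
                {"text": piece["text"], "start": piece["start"] + offset, "end": piece["end"] + offset}
            )
    return stitched
```

The step between these two functions, running Whisper on every segment, is the part Modal fans out across containers (for example through a Modal function's .map()); it is deliberately left out so the sketch runs without a GPU or a Modal account.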

Transcription Flow

Demo

Audio Transcription

audio-transcribe.mp4

Video Transcription

video-transcribe.mp4

How to use

  1. Create a Modal account and generate an API token.

    • Run these commands to install the Modal client and generate a token.

      pip install modal-client
      modal token new
      • The first command will install the Modal client library on your computer, along with its dependencies.

      • The second command creates an API token by authenticating through your web browser. It will open a new tab, but you can close it when you are done.

  2. Deploy your Modal project with the following command.

    modal deploy api.main
  3. Transcribe your audio file using the following curl command. Replace your-modal-endpoint, your-audio-src-url, your-amazing-title-slug, and the is_video value (set it to true to transcribe a video) with your own.

    curl --location --request POST 'your-modal-endpoint/api/transcribe?src_url=your-audio-src-url&title_slug=your-amazing-title-slug&is_video=false'
    

    Sample response:

    {
      "call_id": "your-call-id"
    }
  4. Check the status of your transcription using the following curl command. Replace your-call-id with the call_id returned by the previous command.

     curl --location 'your-modal-endpoint/api/status/your-call-id'
    

    Sample initial response:

    {
      "finished": false,
      "total_segments": 49,
      "tasks": 49,
      "done_segments": 0
    }

    Sample final response (poll this endpoint until finished is true):

    {
      "finished": true,
      "total_segments": 49,
      "tasks": 49,
      "done_segments": 49
    }
  5. Download the transcription using the following curl command. Replace your-modal-endpoint with your endpoint and your-title-slug with the title_slug you passed in step 3.

     curl --location 'your-modal-endpoint/api/audio/your-title-slug'
    

    Sample response:

    {
      "segments": [
        {
          "text": " Productivity also means that you're able to maximize the hours that you have and also rest deliberately in between.  That's real productivity because if you're just constantly working without breaks and without really knowing what your goals are and what you're achieving,",
          "start": 0.0,
          "end": 19.0
        },
        {
          "text": " that's not productivity, that's just busyness.  So that's the difference between productivity and busyness and it really starts from the very beginning of your day.",
          "start": 19.0,
          "end": 45.0
        }
      ]
    }
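The numbered steps above can also be scripted end to end. This is a minimal standard-library sketch, not part of the repo: it assumes the endpoints behave exactly as in the sample responses (a call_id from /api/transcribe, a finished flag from /api/status/..., and a segments list from /api/audio/...). The helper names transcribe_url, fetch_json, wait_until_finished, and to_srt are hypothetical, and the exponential backoff and SubRip (.srt) output are illustrative choices. Percent-encoding the query matters because src_url usually contains ":", "/", "?" or "&", which would otherwise break the query string in the curl command.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, List


def transcribe_url(endpoint: str, src_url: str, title_slug: str, is_video: bool = False) -> str:
    """Build the POST URL for /api/transcribe with a percent-encoded query string."""
    query = urllib.parse.urlencode(
        {"src_url": src_url, "title_slug": title_slug, "is_video": str(is_video).lower()}
    )
    return f"{endpoint.rstrip('/')}/api/transcribe?{query}"


def fetch_json(url: str, method: str = "GET", timeout_s: float = 30.0) -> Dict:
    """Send a request and decode the JSON body (no retries, no auth)."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


def wait_until_finished(
    status: Callable[[], Dict],
    interval_s: float = 5.0,
    max_interval_s: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll status() until it reports finished, doubling the wait up to max_interval_s."""
    for _ in range(max_polls):
        result = status()
        if result.get("finished"):
            return result
        sleep(interval_s)
        interval_s = min(interval_s * 2, max_interval_s)
    raise TimeoutError(f"transcription not finished after {max_polls} polls")


def to_srt(segments: List[Dict]) -> str:
    """Render [{"text", "start", "end"}, ...] as SubRip subtitles."""

    def stamp(seconds: float) -> str:
        ms = round(seconds * 1000)
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = [
        f"{i}\n{stamp(seg['start'])} --> {stamp(seg['end'])}\n{seg['text'].strip()}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)
```

A typical run: call fetch_json(transcribe_url(endpoint, src, slug), method="POST") to get the call_id, pass lambda: fetch_json(f"{endpoint}/api/status/{call_id}") to wait_until_finished, then give fetch_json(f"{endpoint}/api/audio/{slug}")["segments"] to to_srt. Backing off (5 s, 10 s, 20 s, and so on, capped at 60 s) keeps short jobs responsive without hammering the status endpoint during long ones.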


Contributors: mharrvic


Issues

Can Modal send a status update once the transcription is done?

Thank you for this repo, especially because I am very new to Modal. Super easy and helpful for me to transcribe audio directly from links.

I send my URL and get a call_id. I have written a simple while True loop with time.sleep(120) that checks whether the call_id returns {"finished": True}, and only then do I proceed with further processing of the transcriptions received.

Is there a mechanism for Modal to report the status of the call_id when it finishes, instead of my polling for it? If yes, where and how do we write it?
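As documented in this README, the repo exposes only the status endpoint, so a polling loop like the one described above is the supported approach. A common alternative, not implemented in this repo, is a completion callback: the client passes a callback URL when submitting, and the transcription job POSTs to it after the segments are stitched. The sketch below assumes a hypothetical callback_url parameter, hypothetical helpers build_callback_request and notify_done, and a made-up payload shape.

```python
import json
import urllib.request
from typing import Optional


def build_callback_request(callback_url: str, call_id: str, title_slug: str) -> urllib.request.Request:
    """Build the JSON POST the job would send when transcription finishes (hypothetical payload)."""
    payload = {"call_id": call_id, "title_slug": title_slug, "finished": True}
    return urllib.request.Request(
        callback_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_done(callback_url: Optional[str], call_id: str, title_slug: str, timeout_s: float = 10.0) -> None:
    """Fire the callback if one was supplied; network errors are swallowed so the job still succeeds."""
    if not callback_url:
        return
    try:
        request = build_callback_request(callback_url, call_id, title_slug)
        urllib.request.urlopen(request, timeout=timeout_s).close()
    except OSError:
        # URLError, HTTPError and timeouts all subclass OSError; the client can
        # still fall back to polling /api/status/{call_id}.
        pass
```

Swallowing callback failures is a design choice: the transcript is already stored by then, so a flaky receiver should not mark the whole job as failed.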
