platisd / phonix

Generate captions for videos using the power of OpenAI's Whisper API

License: MIT License

Python 100.00%
openai openai-api openai-whisper video-srt video-to-caption video-to-text whisper

Phonix

What?

Phonix is a Python program that uses OpenAI's API to generate captions for videos.

It uses the Whisper model, an automatic speech recognition system that can transcribe audio into text and can also translate it into English. Compared to other solutions, its transcription can be "enhanced" by giving it a prompt that describes the "domain" of the video. This means that if your video contains technical terms, acronyms or jargon, listing them in the prompt may improve the results.
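The Whisper API accepts an optional text prompt alongside the audio. A minimal sketch of such a request, assuming the official `openai` Python package (v1.x) and the `whisper-1` model name; `build_transcription_kwargs` and `transcribe` are hypothetical helpers, not phonix's own code.

```python
def build_transcription_kwargs(prompt=None, response_format="srt"):
    """Collect the keyword arguments for a Whisper transcription request.

    Hypothetical helper for illustration; phonix may build its request differently.
    """
    kwargs = {"model": "whisper-1", "response_format": response_format}
    if prompt:
        # Domain terms, acronyms and jargon go in the prompt so the model is
        # more likely to spell them the way the video uses them.
        kwargs["prompt"] = prompt
    return kwargs


def transcribe(audio_path, prompt=None, response_format="srt"):
    """Send an audio file to OpenAI's transcription endpoint.

    Assumes the openai package is installed and OPENAI_API_KEY is set.
    """
    from openai import OpenAI  # imported lazily so the helper above works without it

    client = OpenAI()
    with open(audio_path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            file=audio_file, **build_transcription_kwargs(prompt, response_format)
        )
```

For example, `transcribe("talk.mp3", prompt="Kubernetes, kubectl, etcd")` would ask for SRT captions while hinting at the spelling of those terms.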

Captivating captions

Phonix now supports "captivating" captions: captions that highlight the word currently being spoken and contain at most a chosen number of words. This lets you produce "influencer-style" captions with only a few words on screen at a time and the current word highlighted. 💫
This is enabled through stable-ts, so you will need to install it (see below).

Overall the following options are available when it comes to styling the captions:

  • Highlight the current word
  • Choose the maximum number of words per caption
  • Choose the caption font size
  • Choose the caption font color
  • Choose the caption font family
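Phonix gets this behaviour from stable-ts; the idea itself can be illustrated with a standalone sketch that is not stable-ts's API. Assuming word-level timings as `(text, start, end)` tuples, the hypothetical `captivating_captions` function splits them into chunks of at most `max_words` words and emits one caption per spoken word, with that word wrapped in an SRT `<font color>` tag (support for this tag varies between players).

```python
def captivating_captions(words, max_words=3, highlight_color="#FFFF00"):
    """Turn word-level timings into short captions that highlight the spoken word.

    words: list of (text, start_seconds, end_seconds) tuples, in spoken order.
    Returns a list of (start, end, caption_text) tuples.
    Hypothetical illustration; phonix delegates this to stable-ts.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    captions = []
    for first in range(0, len(words), max_words):
        chunk = words[first:first + max_words]
        for i, (_, start, end) in enumerate(chunk):
            # Keep the caption on screen until the next word in the chunk starts,
            # so there are no flickering gaps between words.
            if i + 1 < len(chunk):
                end = chunk[i + 1][1]
            parts = [
                f'<font color="{highlight_color}">{text}</font>' if j == i else text
                for j, (text, _, _) in enumerate(chunk)
            ]
            captions.append((start, end, " ".join(parts)))
    return captions
```

A smaller `max_words` gives the punchier "influencer-style" look, at the cost of more caption changes per second.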

Why?

Captions are not just for the hearing impaired. They make your content more engaging by boosting your audience's focus, attention and comprehension while allowing them to watch your video without sound.

I was not particularly satisfied with the accuracy of YouTube's and LinkedIn's automatic captions, so I gave Whisper a try and was impressed by the results. Phonix makes it easy to use Whisper to generate captions for your videos.

How?

Phonix first extracts the audio from the video, then downsamples it if it is larger than 25 MB (the upload limit of OpenAI's API) and finally sends it to OpenAI's Whisper API. The API returns the captions in the requested format and Phonix saves them to a file, which you can then import into your video editor of choice.
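A minimal sketch of the extraction and size check, assuming ffmpeg is on the PATH. `extraction_command`, `max_bitrate_kbps` and `extract_audio` are hypothetical names, re-encoding at a lower bitrate is only one way to shrink the file (phonix's own downsampling may differ), and "25 MB" is treated conservatively as 25,000,000 bytes. The caller is assumed to know the video duration, for example from ffprobe.

```python
import os
import subprocess

# Assumption: treat the "25 MB" upload limit conservatively as 25,000,000 bytes.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def extraction_command(video_path, audio_path, bitrate_kbps=None):
    """Build an ffmpeg command that drops the video stream and keeps mono audio.

    ffmpeg picks the output codec from audio_path's extension (e.g. .mp3).
    """
    cmd = ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1"]
    if bitrate_kbps is not None:
        cmd += ["-b:a", f"{bitrate_kbps}k"]  # lower bitrate -> smaller file
    return cmd + [audio_path]


def max_bitrate_kbps(duration_seconds, limit_bytes=MAX_UPLOAD_BYTES, headroom=0.9):
    """Highest constant bitrate (kbit/s) that keeps the audio under limit_bytes.

    headroom leaves a margin for container overhead and encoder rounding.
    """
    return int(limit_bytes * 8 * headroom / duration_seconds / 1000)


def extract_audio(video_path, audio_path, duration_seconds):
    """Extract the audio track, re-encoding it smaller if it exceeds the limit."""
    subprocess.run(extraction_command(video_path, audio_path), check=True)
    if os.path.getsize(audio_path) > MAX_UPLOAD_BYTES:
        kbps = max_bitrate_kbps(duration_seconds)
        subprocess.run(extraction_command(video_path, audio_path, kbps), check=True)
    return audio_path
```

For a one-hour video, `max_bitrate_kbps(3600)` comes out at about 50 kbit/s, which is still adequate for speech recognition.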

Phonix was originally a command line application, but I thought it'd be cool to create a simple GUI for it. Use whichever you feel more comfortable with.

Installation

  • Get an OpenAI API key
    • This is a paid service: transcribing a 25-minute South Park episode cost me around $0.30
  • Clone or download this repository
  • Install a recent version of Python with Tkinter
  • Install ffmpeg for your platform
  • Install Python dependencies: pip install -r requirements-basic.txt
    • If you want to transcribe locally, without paying for OpenAI API usage, run pip install -r requirements-advanced.txt and choose to run Whisper locally.
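For local transcription, here is a minimal sketch assuming the open-source `openai-whisper` package (`pip install openai-whisper`). The README does not list what requirements-advanced.txt contains, so this may not match phonix's setup. The local model returns timed segments rather than an SRT file, so the sketch formats them itself; `srt_timestamp`, `segments_to_srt` and `transcribe_locally` are hypothetical helpers.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def segments_to_srt(segments):
    """Render Whisper-style segments (dicts with start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


def transcribe_locally(audio_path, model_name="base", prompt=None):
    """Transcribe on this machine with openai-whisper and return SRT text."""
    import whisper  # lazy import: only needed when actually transcribing

    model = whisper.load_model(model_name)
    # initial_prompt plays the same role as the API's domain prompt.
    result = model.transcribe(audio_path, initial_prompt=prompt)
    return segments_to_srt(result["segments"])
```

Larger model names trade speed for accuracy; whether local transcription is practical depends on your hardware.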

Command line usage

phonix.py is the command line interface that also includes the main logic of the program.
It has a few options that you can see by running python phonix.py --help.

GUI usage

Assuming you have installed the dependencies, you can run the GUI with python phonix_gui.py. A demo of the tool can be found in this video.

phonix's Issues

Support .ass format

Support the .ass format, which properly supports subtitle formatting, unlike .srt, where support is hit or miss.

The easiest approach is to generate .srt and then convert it to .ass to apply the formatting.
Consider exporting .ass only when the user wants to change the font family or the font size.
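One possible shape for the .srt-to-.ass conversion, as a sketch only: `srt_to_ass`, `ass_timestamp` and `ass_colour` are hypothetical names. It writes a single "Default" style carrying the font family, size and colour, and copies each SRT cue into a Dialogue line. ASS colours are written as &HAABBGGRR (blue, green, red order) and ASS times use centiseconds. Inline SRT tags such as `<font>` are passed through untranslated. For a plain conversion without custom styling, `ffmpeg -i captions.srt captions.ass` also works.

```python
import re

SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")

STYLE_FORMAT = (
    "Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, "
    "BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, "
    "BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding"
)


def ass_timestamp(srt_time):
    """Convert an SRT time (HH:MM:SS,mmm) to an ASS time (H:MM:SS.cc)."""
    h, m, s, ms = map(int, SRT_TIME.fullmatch(srt_time.strip()).groups())
    return f"{h}:{m:02}:{s:02}.{ms // 10:02}"


def ass_colour(hex_rgb):
    """Convert '#RRGGBB' to '&H00BBGGRR' (00 alpha = fully opaque)."""
    r, g, b = hex_rgb[1:3], hex_rgb[3:5], hex_rgb[5:7]
    return f"&H00{b}{g}{r}".upper()


def srt_to_ass(srt_text, font="Arial", size=48, colour="#FFFFFF"):
    """Convert SRT text to an ASS script with one bottom-centred Default style."""
    lines = [
        "[Script Info]",
        "ScriptType: v4.00+",
        "",
        "[V4+ Styles]",
        STYLE_FORMAT,
        f"Style: Default,{font},{size},{ass_colour(colour)},&H000000FF,&H00000000,"
        "&H00000000,0,0,0,0,100,100,0,0,1,2,0,2,10,10,20,1",
        "",
        "[Events]",
        "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text",
    ]
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        block_lines = block.split("\n")
        if len(block_lines) < 3 or "-->" not in block_lines[1]:
            continue  # skip malformed cues rather than guessing
        start, end = (part.strip() for part in block_lines[1].split("-->"))
        text = r"\N".join(block_lines[2:])  # \N is a hard line break in ASS
        lines.append(
            f"Dialogue: 0,{ass_timestamp(start)},{ass_timestamp(end)},Default,,0,0,0,,{text}"
        )
    return "\n".join(lines) + "\n"
```

Keeping all styling in the single Style line means the font family or size can be changed in one place, which fits the idea of exporting .ass only when formatting is requested.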
