GithubHelp home page GithubHelp logo

burakozyurt / stable-ts Goto Github PK

View Code? Open in Web Editor NEW

This project forked from jianfch/stable-ts

0.0 0.0 0.0 143 KB

Stabilizing timestamps of OpenAI's Whisper outputs down to word-level

License: MIT License

Python 100.00%

stable-ts's Introduction

Stabilizing Timestamps for Whisper

Description

This script modifies and adds more robust decoding logic on top of OpenAI's Whisper to produce more accurate segment-level timestamps and obtain to word-level timestamps without extra inference.

image

TODO

  • Add function to stabilize with multiple inferences
  • Add word timestamping (previously only token based)

Setup

pip install stable-ts

To install the lastest version:

pip install git+https://github.com/jianfch/stable-ts.git

Command-line usage

Transcribe audio then save result as JSON file.

stable-ts audio.mp3 -o audio.json

Processing JSON file of the results into ASS.

stable-ts audio.json -o audio.ass

Transcribe multiple audio files then process the results directly into SRT files.

stable-ts audio1.mp3 audio2.mp3 audio3.mp3 -o audio1.srt audio2.srt audio3.srt

Show all available arguments and help.

stable-ts -h

Python usage

import stable_whisper

model = stable_whisper.load_model('base')
# modified model should run just like the regular model but accepts additional parameters
results = model.transcribe('audio.mp3')
jfk_segment.mp4
# the above uses default settings on version 1.1 with large model
# sentence/phrase-level
stable_whisper.results_to_sentence_srt(results, 'audio.srt')
jfk_word_segments.mp4
# the above uses default settings on version 1.1 with large model
# sentence/phrase-level & word-level
stable_whisper.results_to_sentence_word_ass(results, 'audio.ass')

Additional Info

  • Although timestamps are chronological, they can still very inaccurate depending on the model, audio, and parameters.
  • To produce production ready word-level results, the model needs to be fine-tuned with high quality dataset of audio with word-level timestamp.

License

This project is licensed under the MIT License - see the LICENSE file for details

Acknowledgments

Includes slight modification of the original work: Whisper

stable-ts's People

Contributors

jianfch avatar navalnica avatar emiliobarradas avatar emiliskiskis avatar eschmidbauer avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.