This project forked from tekakutli/anime_translation

License: MIT License


Anime Translation Initiative

AI-assisted transcription and translation
Everything works offline

LOAD THE FUNCTIONS

source snippets/enviromentvariables.sh #YOU MUST EDIT THIS ONE
source snippets/functions.sh
source snippets/opus.sh
source snippets/timeformat.sh
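
snippets/enviromentvariables.sh is the one file you must adapt to your machine. As a hedged sketch (the real variable names may differ, so check the actual file), it might export paths such as PATH_TO_SUBS, which the Opus setup below refers to:

```shell
# Hypothetical contents of snippets/enviromentvariables.sh —
# PATH_TO_SUBS is referenced later by the Opus-MT instructions.
export PATH_TO_SUBS="$HOME/files/code/anime_translation"
```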

Workflow

  • I assume you are using Linux; also check the Dependencies section
  • More information about each component is best found on its own website
  • The main workflow is as follows:
    • Set up the model, Whisper here, and use it to translate directly from Japanese audio to English text.
    • The timestamps often aren't completely aligned with the sound, so we can use an autosync tool: ffsubsync.
    • Next a human fixes the translation, splits long captions, aligns them further, etc.
      • I propose using Subed, which is an Emacs package
        • Subed allows us to:
          • Watch where we are captioning in MPV
          • Efficiently move, merge, and split timestamps, with precision
        • Instructions Here
    • Then, to fix grammar or spelling mistakes, we can use LanguageTool
    • Finally, we can load the .vtt file with mpv and enjoy, Instructions Here
  • Some extra tools at your disposal:
    • The Opus model is a text-to-text translation model, like Google Translate
    • There are two extra tools to help align the captions: a Visual Scene Detector (Scene-timestamps) and a Human Voice Detector (Speech-timestamps)
    • You can use Whisper to translate a snapshot of what you are hearing from your speakers, using the Speakers-Stream tool
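
Put together, one pass over an episode looks roughly like the following pseudocode sketch. It only strings together the helper functions this README defines below; the exact order and arguments may differ:

```shell
useWhisper      # translate the Japanese audio into rough English captions
autosync        # align the generated timestamps with the audio (ffsubsync)
# ...hand-edit the captions in Emacs with Subed...
languagetool    # start the grammar/spell checker used from Emacs
mpvLoadSubs     # preview the finished .vtt in mpv
```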

Setup

Model Setup

used model: Whisper
the setup below will download the model ggml-large.bin from: here

make setup

Model Usage

extract the audio from the video file for whisper to use

useWhisper

the -tr flag activates translation into English; without it, Whisper transcribes into Japanese
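
Under the hood, useWhisper wraps something resembling a plain whisper.cpp invocation. A sketch (file names here are placeholders, and the helper's real arguments may differ):

```shell
# -l ja  : the audio is Japanese
# -tr    : translate to English instead of transcribing
# -ovtt  : also write the result as a .vtt file
# -ot MS : start MS milliseconds in (useful to resume after an interrupt)
./main -m models/ggml-large.bin -f audio.wav -l ja -tr -ovtt -ot 123000
```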

Warning
  • whisper often breaks on music segments
  • if you see it start outputting the same thing over and over, interrupt it
    • then use the -ot milliseconds flag to resume at that point
  • After interrupting, copy the output from the terminal, then format it appropriately with:
formatToVtt
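
whisper.cpp prints cues to the terminal as `[start --> end]  text` lines. A minimal sketch of such a formatter (the real formatToVtt may handle more cases):

```shell
# Turn whisper's terminal lines into WEBVTT cues.
# Input : [00:00:00.000 --> 00:00:02.500]  Hello.
to_vtt() {
  awk 'BEGIN { print "WEBVTT"; print "" }
  match($0, /^\[[^]]+\]/) {
    stamp = substr($0, 2, RLENGTH - 2)   # timestamps without the brackets
    text  = substr($0, RLENGTH + 1)
    sub(/^[[:space:]]+/, "", text)       # trim the padding whisper adds
    print stamp; print text; print ""
  }'
}

printf '[00:00:00.000 --> 00:00:02.500]  Hello.\n' | to_vtt
```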

MPV

get mpv to load some subs

mpvLoadSubs

what subs?
git clone https://github.com/tekakutli/anime_translation/

Efficient VTT Creation and Editing

I use subed
git clone https://github.com/sachac/subed
add Subed from configAdd.el to Emacs config.el
alternatively, add this extra:
git clone https://gist.github.com/mooseyboots/d9a183795e5704d3f517878703407184
add Subed Extra Section from configAdd.el to Emacs config.el

AutoSync the Subs

This ffsubsync script first autosyncs the Japanese captions with the Japanese audio, and then uses those timestamps to sync the English captions to the Japanese captions.
The Japanese captions only need to be phonetically close, which means we can use a smaller, faster model to get them: ggml-small.bin, here.
This is the reason some functions are named whisper_small vs. whisper_large: the suffix names the model used.

make installffsubsync
autosync
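
ffsubsync's CLI takes a reference (a video or an already-synced subtitle file) plus -i/-o. The two-step scheme above could be sketched like this, with .srt files (file names are placeholders, and the autosync helper's internals may differ; the vttToSrt converter below can produce the .srt inputs):

```shell
# 1. Sync the phonetic Japanese captions (small model) to the audio track.
ffsubsync episode.mkv -i whisper_small.srt -o ja_synced.srt
# 2. Use those synced timestamps as the reference for the English captions.
ffsubsync ja_synced.srt -i whisper_large.srt -o en_synced.srt
```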

Other Utils

To .srt Conversion

vttToSrt subs.vtt
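
SRT differs from VTT mainly in carrying a numeric id per cue and using comma decimal separators. A minimal conversion sketch (the real vttToSrt likely handles more edge cases):

```shell
# Convert simple VTT cues to SRT: number each cue and swap the
# millisecond separator from "." to ",".
vtt_to_srt() {
  awk '
  /^WEBVTT/ { next }          # drop the VTT header line
  /-->/ {
    print ++n                 # SRT cues need a numeric id
    gsub(/\./, ",")           # SRT uses comma decimal separators
  }
  n { print }                 # pass lines through once cues have started
  '
}

printf 'WEBVTT\n\n00:00:00.000 --> 00:00:02.500\nHello.\n' | vtt_to_srt
```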

Export final .mp4 with subtitles

exportSubs

To format a given time into milliseconds or timestamps, for example:

#timeformat.sh has these two convenience functions:
milliformat "2.3" #2 minutes 3 seconds
stampformat "3.2.1" #3 hours 2 minutes 1 second
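
Assuming these helpers turn human-readable times into milliseconds (handy for the -ot flag above; the assumption is about their output format), an equivalent sketch in plain bash arithmetic:

```shell
# "M.S"   -> milliseconds ("2.3" = 2 min 3 s)
milli_sketch() {
  local m=${1%%.*} s=${1#*.}
  echo $(( (m * 60 + s) * 1000 ))
}

# "H.M.S" -> milliseconds ("3.2.1" = 3 h 2 min 1 s)
stamp_sketch() {
  local h=${1%%.*} rest=${1#*.}
  local m=${rest%%.*} s=${rest#*.}
  echo $(( (h * 3600 + m * 60 + s) * 1000 ))
}

milli_sketch "2.3"     # -> 123000
stamp_sketch "3.2.1"   # -> 10921000
```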

Grammar and Spelling Checking: LanguageTool

Install the full version of LanguageTool

make installlanguagetool

Activate it

languagetool

add LanguageTool section from configAdd.el to Emacs config.el
to use it from Emacs:

(langtool-check)

Local Text Translation

your FROM-TO model is either here or here
for example, to get the models I use:

make opusInstallExample

edit PATH_TO_SUBS/Opus-MT/services.json appropriately, then:

make installopus

To activate:

#opus.sh has convenience functions
Opus-MT

To use:

t "text to translate"

Get Event Timestamps

Scene-timestamps

Visual Scene timestamps:

make installSceneTimestamps

sceneTimestamps
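
The project's script aside, one common way to get visual scene-change timestamps is ffmpeg's scene filter; a sketch (the 0.4 threshold is an arbitrary example value, and the sceneTimestamps helper may work differently):

```shell
# showinfo logs one line per selected frame; grep out its pts_time values,
# i.e. the timestamps where the scene-change score exceeds 0.4.
ffmpeg -i episode.mkv -vf "select='gt(scene,0.4)',showinfo" -f null - 2>&1 |
  grep -o 'pts_time:[0-9.]*'
```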

VAD, Speech timestamps

What is VAD? VAD means Voice Activity Detection.
It gives you the speech timestamps, i.e. when human voice is detected.
First install torch, then:

speechTimestamps

Translate the Speakers-Stream

you'll need to press Ctrl-C to stop recording, after which it will translate the temporary recording

streamtranslate

if you use Sway, you can put this in your Sway config and get an easy keybinding to translate what you are hearing

bindsym $mod+Shift+return exec alacritty -e bash /home/$USER/files/code/anime_translation/snippets/streamtranslate.sh

Dependencies

  • Linux, Bash, Mpv, Ffmpeg, Emacs, Subed
  • Whatever model you wish to use
  • Python if you use the "Get Event Timestamps" tools
    • vad.py downloads silero-vad by itself
  • Docker for LibreGrammar (LanguageTool) or the Opus tools
  • If you want to translate your speakers, you need PipeWire
    • As a convenience, you will also need wl-copy and wl-paste if running on Wayland
      • If you don't want them, remove them from streamtranslate.sh

Why X

  • Why Git over Google-Docs or similar?
    • A version control system (git) is an ergonomic tool for picking or disregarding individual contributions; it enables truly parallel work distribution
  • Why .vtt over others?
    • whisper can output vtt or srt
    • subed can work with vtt or srt
    • why vtt over srt? personal choice, but:
      • vtt has no need for numeric ids
      • it seems shorter and more efficient
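
For comparison, the same cue in both formats; a .vtt cue needs no numeric id:

```
WEBVTT

00:00:01.000 --> 00:00:03.500
First line of dialogue.
```

whereas the equivalent .srt cue carries an id and comma decimal separators:

```
1
00:00:01,000 --> 00:00:03,500
First line of dialogue.
```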
