srtlate's People

Contributors

cxdy

srtlate's Issues

Translations are not contextual

I forgot how languages work when building this.

Is there a TL;DR for "too long, didn't type"? If not, this will be its birth.

Anyways, TL;DT
As of the current version, each individual subtitle is translated out of context, so the translations don't necessarily make sense. I only (and barely) speak English, so I have no idea exactly how bad they are, but we need to figure out a way to determine if subtitles are related to each other, and then translate them so that they make some kind of sense.

IDEAS:

  • Build the SRT file through the program and run it through multiple languages at once
    This is probably the most difficult approach, but also probably our best bet (and it's cool as fuck). Only thing is, I don't know how to do that yet. If we can figure it out, maybe there's a way to tell if a different person is speaking?

  • Determine if the times in the previous & next subtitle are close
    00:00:11,636 --> 00:00:13,221
    I'm assuming the format is HR:MIN:SEC,MS? I'm just not 100% sure what the number after the comma is - seems like a safe assumption though. I know we talked about it before but I forgot.
    Anyway, I guess for this approach, we could just create an empty list and add the subtitles that start within a second or two of the previous subtitle, run it and see if it makes sense? Only thing is, I don't know how we'd determine if it makes any sense with a computer. I can't speak other languages so I'm not confident that I can teach a computer how to speak other languages.
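
A rough sketch of the timing idea, in Python. SRT timestamps are hours:minutes:seconds,milliseconds, so the number after the comma is milliseconds. Everything else here is an assumption for illustration: the names `parse_timing` and `group_by_gap`, the 1.5 second threshold, and the choice to measure the gap from the previous subtitle's end to the next one's start. It only groups subtitles; it does not tell us whether the grouped translation makes sense.

```python
import re
from datetime import timedelta

# Matches an SRT timing line such as "00:00:11,636 --> 00:00:13,221".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def parse_timing(line):
    """Return (start, end) as timedeltas for one SRT timing line."""
    match = TIMING.match(line.strip())
    if match is None:
        raise ValueError(f"not an SRT timing line: {line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
    end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
    return start, end

def group_by_gap(cues, max_gap_seconds=1.5):
    """Group (timing_line, text) pairs whose gap to the previous cue is small."""
    groups = []
    previous_end = None
    for timing_line, text in cues:
        start, end = parse_timing(timing_line)
        close = (
            previous_end is not None
            and (start - previous_end).total_seconds() <= max_gap_seconds
        )
        if close:
            groups[-1].append(text)   # continue the current group
        else:
            groups.append([text])     # gap too large (or first cue): new group
        previous_end = end
    return groups

# Made-up cues, only to show the grouping.
cues = [
    ("00:00:11,636 --> 00:00:13,221", "I was going to tell you"),
    ("00:00:13,400 --> 00:00:15,000", "but then I forgot."),
    ("00:00:21,000 --> 00:00:22,500", "Anyway."),
]
print(group_by_gap(cues))
```

Each inner list could then be joined and sent to the translator as one piece of text, which at least gives it the neighbouring subtitles as context.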

Line breaks will be the death of me

As of the current version (April 28 2019), we start with an English (or whatever language, I guess) SRT file. The file is converted to JSON so we can interact with the file programmatically.

Line breaks from the source SRT file are converted to \n in the JSON file because that's how JSON escapes them, and when the program goes to translate the captions, we pull the line breaks because obviously \n won't translate. Gotta figure out a way to add them back though.
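
To make the \n business concrete, here is a small Python sketch. The caption record and its field names are hypothetical, and `flatten_for_translation` is a made-up helper, not something the program is known to contain. It shows that the \n only exists in the encoded JSON text, and that we can remember how many lines a caption had before flattening it for translation.

```python
import json

def flatten_for_translation(text):
    """Return the caption as one line plus the number of lines it had."""
    lines = text.splitlines()
    return " ".join(lines), len(lines)

# Hypothetical caption record; the field names are made up for this sketch.
caption = {"index": 1, "text": "It was a dark night\nand nobody was home."}

encoded = json.dumps(caption)   # the newline is stored as the two characters \ and n
decoded = json.loads(encoded)   # json.loads turns it back into a real newline

flat, line_count = flatten_for_translation(decoded["text"])
print(flat)
print(line_count)
```

Keeping `line_count` next to the translated text means we at least know how many lines to break the translation back into.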

@kpmgeek mentioned that in America, the standard for each line in an SRT file is 32 characters, but other countries are more lenient with their standards. I think we can just count up to 32 characters, and if the character there is not a space (i.e., a letter, number, etc.), we move back to the nearest space and insert a line break there.
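
The "count to 32 and back up to the nearest space" idea is roughly what Python's standard `textwrap` module already does, so a minimal sketch could lean on it. The function name `wrap_caption` is made up, and 32 is just the figure quoted above.

```python
import textwrap

def wrap_caption(text, width=32):
    """Break text at spaces so that no line is longer than `width` characters.

    break_long_words=False keeps a single over-long word intact on its own
    line instead of cutting it in the middle.
    """
    lines = textwrap.wrap(
        text, width=width, break_long_words=False, break_on_hyphens=False
    )
    return "\n".join(lines)

print(wrap_caption("Lorem ipsum dolor sit amet, consectetur adipiscing elit."))
# Lorem ipsum dolor sit amet,
# consectetur adipiscing elit.
```

This fills the first line as far as it can, so it can still leave a single short word stranded on the last line.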

However, this isn't necessarily the best approach because sometimes on the first line you have things like [Narrator] and then the second line is what the narrator is saying, so we need to find a way around that & also find a way around leaving one word/character/whatever on the 2nd line with everything else up top. Maybe count the total characters for the caption (between 1-2 lines), then count all the spaces in that caption, subtract the number of spaces from the number of total characters and divide that number by 2?

Example Caption:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam et maximus dolor.
Total Length - 80 characters
Number of spaces - 11 spaces
(80 - 11) / 2 = 34.5

I don't know where I'm going with that, but you get what I mean.
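
One possible way to land that thought, sketched in Python. Note this swaps in a simpler technique than the formula above: instead of subtracting the spaces, it breaks at whichever space is nearest the middle of the caption, which keeps the two lines about the same length. A leading bracketed label such as [Narrator] is kept on its own line. The names `balance_two_lines` and `break_caption` are made up for this sketch, and it ignores the 32-character limit entirely.

```python
import re

# A leading label in square brackets, e.g. "[Narrator] ...".
LABEL = re.compile(r"^(\[[^\]]+\])\s+(.*)$", re.DOTALL)

def balance_two_lines(text):
    """Split text into two lines at the space nearest its midpoint."""
    spaces = [i for i, ch in enumerate(text) if ch == " "]
    if not spaces:
        return text  # a single word cannot be split
    middle = len(text) / 2
    cut = min(spaces, key=lambda i: abs(i - middle))
    return text[:cut] + "\n" + text[cut + 1:]

def break_caption(text):
    """Keep a leading [Label] on line one, otherwise balance the two lines."""
    match = LABEL.match(text)
    if match:
        label, speech = match.groups()
        return label + "\n" + speech
    return balance_two_lines(text)

caption = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    "Etiam et maximus dolor."
)
print(break_caption(caption))
# Lorem ipsum dolor sit amet, consectetur
# adipiscing elit. Etiam et maximus dolor.
print(break_caption("[Narrator] It was a dark night."))
# [Narrator]
# It was a dark night.
```

For the 80-character example this gives lines of 39 and 40 characters, so a caption that long would still need more than two lines to fit a 32-character limit.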
