cxdy / srtlate
Translate SRT files in JavaScript
License: MIT License
I forgot how languages work when building this.
Is there a TL;DR for "too long, didn't type"? If not, this will be its birth.
Anyways, TL;DT
As of the current version, each individual subtitle is translated out of context, so the translations don't necessarily make sense. I only (and barely) speak English, so I have no idea exactly how bad they are, but we need to figure out a way to determine whether subtitles are related to each other, and then translate related subtitles together so the result makes some kind of sense.
IDEAS:
Build the SRT file through the program and run it through multiple languages at once
This is probably the most difficult approach, but also probably our best bet (and it's cool as fuck). Only thing is, I don't know how to do that yet. If we can figure it out, maybe there's a way to tell if a different person is speaking?
Determine if the times in the previous & next subtitles are close
00:00:11,636 --> 00:00:13,221
I'm assuming the format is HR:MIN:SEC,MS? I'm just not 100% sure what the number after the comma is, but it seems like a safe assumption. I know we talked about it before but I forgot.
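For what it's worth, SRT timestamps are HH:MM:SS,mmm, so the number after the comma is milliseconds. Here's a minimal sketch of turning a timing line into plain millisecond numbers so gaps between subtitles are easy to compare. The function names are hypothetical and not part of srtlate.

```javascript
// Convert an SRT timestamp ("HH:MM:SS,mmm") into milliseconds.
function timestampToMs(ts) {
  const match = /^(\d+):(\d{2}):(\d{2}),(\d{3})$/.exec(ts.trim());
  if (!match) {
    throw new Error(`Not an SRT timestamp: ${ts}`);
  }
  // match[0] is the whole string; the four capture groups are h, m, s, ms.
  const [, h, m, s, ms] = match.map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + ms;
}

// Split a timing line like "00:00:11,636 --> 00:00:13,221" into start/end ms.
function parseTimingLine(line) {
  const [start, end] = line.split("-->").map((part) => timestampToMs(part));
  return { startMs: start, endMs: end };
}

console.log(parseTimingLine("00:00:11,636 --> 00:00:13,221"));
// → { startMs: 11636, endMs: 13221 }
```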
Anyway, I guess for this approach, we could just create an empty list, add the subtitles that start within a second or two of the previous subtitle, translate that group together and see if it makes sense. Only thing is, I don't know how we'd use a computer to determine whether it makes any sense. I can't speak other languages, so I'm not confident that I can teach a computer how to speak other languages.
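A minimal sketch of the "empty list" grouping idea. It assumes each subtitle has already been parsed into a hypothetical `{ startMs, endMs, text }` shape, and it reads "close" as "starts within `maxGapMs` of the previous subtitle's end". The 1500 ms default is an arbitrary pick from "a second or two", and the captions in the usage example are made up.

```javascript
// Group consecutive subtitles whose start time is within maxGapMs
// of the previous subtitle's end time.
function groupByGap(subtitles, maxGapMs = 1500) {
  const groups = [];
  let current = [];

  for (const sub of subtitles) {
    const prev = current[current.length - 1];
    if (prev && sub.startMs - prev.endMs > maxGapMs) {
      // Gap is too large: close the current group and start a new one.
      groups.push(current);
      current = [];
    }
    current.push(sub);
  }
  if (current.length > 0) {
    groups.push(current);
  }
  return groups;
}

// Made-up captions, just to show the grouping.
const subs = [
  { startMs: 11636, endMs: 13221, text: "Are you coming?" },
  { startMs: 13500, endMs: 15000, text: "Yeah, one sec." },
  { startMs: 30000, endMs: 32000, text: "[Door closes]" },
];
const groups = groupByGap(subs);
console.log(groups.map((g) => g.map((s) => s.text).join(" ")));
// → [ 'Are you coming? Yeah, one sec.', '[Door closes]' ]
```

Each joined group would then be sent to the translator as one piece of text instead of one subtitle at a time.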
As of the current version (April 28, 2019), we start with an English (or whatever language, I guess) SRT file. The file is converted to JSON so we can work with it programmatically.
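A minimal sketch of what an SRT-to-JSON step can look like, assuming well-formed cues separated by blank lines. `parseSrt` and its output shape are hypothetical, not srtlate's actual converter, and the sample cues are made up.

```javascript
// Minimal SRT parser: turns SRT text into an array of cue objects that can be
// serialized with JSON.stringify. Assumes cues are separated by blank lines.
function parseSrt(srtText) {
  const blocks = srtText.replace(/\r\n/g, "\n").trim().split(/\n\s*\n/);
  return blocks.map((block) => {
    const lines = block.split("\n");
    const [start, end] = lines[1].split("-->").map((t) => t.trim());
    return {
      index: Number(lines[0]),
      start,
      end,
      // Multi-line captions are kept as one string joined with "\n".
      text: lines.slice(2).join("\n"),
    };
  });
}

const srt = `1
00:00:11,636 --> 00:00:13,221
[Narrator]
It was a dark and stormy night.

2
00:00:13,500 --> 00:00:15,000
Nobody was ready.`;

console.log(JSON.stringify(parseSrt(srt), null, 2));
```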
Line breaks from the source SRT file are converted to \n in the JSON file (that's how JSON strings store newlines), and when the program goes to translate the captions, we strip the line breaks because obviously \n won't translate. Gotta figure out a way to add them back, though.
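One possible way to add the breaks back, sketched under an assumption: a break that sat at roughly the same relative position in the source caption should land near the same relative position in the translation. Before translating, record where each \n was as a fraction of the caption length; afterwards, turn the space closest to that spot back into a line break. Both function names are hypothetical, and this is a heuristic, not a guarantee the split is grammatically right in the target language.

```javascript
// Flatten a caption for translation and remember where each line break was,
// as a fraction of the caption's length.
function flattenCaption(text) {
  const lines = text.split("\n");
  const breakRatios = [];
  let pos = 0;
  for (const line of lines.slice(0, -1)) {
    pos += line.length; // index of the "\n" that ends this line
    breakRatios.push(pos / text.length);
    pos += 1; // step past the "\n"
  }
  return { flat: lines.join(" "), breakRatios };
}

// After translation, turn the space closest to each remembered relative
// position back into a line break.
function restoreBreaks(translated, breakRatios) {
  const chars = Array.from(translated);
  for (const ratio of breakRatios) {
    const target = Math.round(ratio * chars.length);
    let best = -1;
    chars.forEach((ch, i) => {
      const closer = best === -1 || Math.abs(i - target) < Math.abs(best - target);
      if (ch === " " && closer) best = i;
    });
    if (best !== -1) chars[best] = "\n";
  }
  return chars.join("");
}

const { flat, breakRatios } = flattenCaption("[Narrator]\nIt was a dark and stormy night.");
console.log(flat); // prints: [Narrator] It was a dark and stormy night.
// In the real program `flat` would go to the translator; here it is fed back
// unchanged just to show the break lands where it started.
console.log(restoreBreaks(flat, breakRatios));
```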
@kpmgeek mentioned that in America, the standard for each line in an SRT file is 32 characters, but other countries are more lenient with their standards. I think we can just count up to 32 characters, and if the character at that position is not a space (e.g. a letter, number, etc.), we move back to the nearest space and insert a line break there.
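The "count to 32, back up to the nearest space" rule is ordinary greedy word wrapping. Here's a minimal sketch; `wrapCaption` is a hypothetical name, and a single word longer than the limit is left on its own overlong line rather than split.

```javascript
// Greedy wrap: fill each line with as many whole words as fit within maxChars,
// so every break lands on the last space before the limit.
function wrapCaption(text, maxChars = 32) {
  const words = text.split(/\s+/).filter(Boolean);
  const lines = [];
  let line = "";
  for (const word of words) {
    if (line === "") {
      line = word;
    } else if (line.length + 1 + word.length <= maxChars) {
      line += " " + word;
    } else {
      lines.push(line);
      line = word;
    }
  }
  if (line !== "") lines.push(line);
  return lines.join("\n");
}

const caption = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam et maximus dolor.";
console.log(wrapCaption(caption));
// → Lorem ipsum dolor sit amet,
//   consectetur adipiscing elit.
//   Etiam et maximus dolor.
```

On a long caption this produces three lines, and it has no notion of a [Narrator] tag, which is exactly the weakness raised in the next paragraph.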
However, this isn't necessarily the best approach, because sometimes the first line has something like [Narrator] and the second line is what the narrator is saying. So we need a way around that, and also a way to avoid leaving one word/character/whatever on the 2nd line with everything else up top. Maybe count the total characters in the caption (1-2 lines), count all the spaces in that caption, subtract the number of spaces from the total and divide that number by 2?
Example Caption:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam et maximus dolor.
Total Length - 80 characters
Number of spaces - 11 spaces
(80 - 11) / 2 = 34.5
I don't know where I'm going with that, but you get what I mean.
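One way to take the divide-by-2 idea somewhere: use (total characters - spaces) / 2 as the target, then break at the space where the first line's non-space character count is closest to that target. A leading tag like [Narrator] is kept on its own line instead. This is a sketch assuming single-line input (line breaks already stripped, as the program does now); `splitBalanced` is a hypothetical name.

```javascript
// Split a caption into at most two lines of roughly equal length.
function splitBalanced(text) {
  // A leading speaker tag such as "[Narrator]" gets its own line.
  const labelMatch = /^(\[[^\]]+\])\s+(.*)$/s.exec(text);
  if (labelMatch) {
    return `${labelMatch[1]}\n${labelMatch[2]}`;
  }

  // Target: half of the non-space characters, per the formula above.
  const spaces = (text.match(/ /g) || []).length;
  const target = (text.length - spaces) / 2;

  let bestIndex = -1;
  let bestDiff = Infinity;
  let nonSpaceSoFar = 0;
  for (let i = 0; i < text.length; i++) {
    if (text[i] === " ") {
      // Non-space characters that would end up on the first line.
      const diff = Math.abs(nonSpaceSoFar - target);
      if (diff < bestDiff) {
        bestDiff = diff;
        bestIndex = i;
      }
    } else {
      nonSpaceSoFar++;
    }
  }
  if (bestIndex === -1) return text; // single word: nothing to split
  return text.slice(0, bestIndex) + "\n" + text.slice(bestIndex + 1);
}

const caption = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam et maximus dolor.";
console.log(splitBalanced(caption));
// → Lorem ipsum dolor sit amet, consectetur
//   adipiscing elit. Etiam et maximus dolor.
console.log(splitBalanced("[Narrator] It was a dark and stormy night."));
// → [Narrator]
//   It was a dark and stormy night.
```

Both lines of the example come out longer than 32 characters. An 80-character caption simply can't fit on two 32-character lines, so a caption that long would need to be split across two subtitles instead.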