Scala subtitle (.srt) reader/writer

Home Page: https://mdauthentic.github.io/sous-title/

Sous-title

Read subtitle (.srt) files and convert them to CSV or to a List.

build.sbt

libraryDependencies += "io.github.mdauthentic" % "sous-title_2.13" % "0.3.0"

Getting started

Import

import io.github.mdauthentic.core._

Reading example

Calling the open or readInLine method returns a list of SRT values, each containing id, startTime, endTime and sub (the subtitle text itself).

scala> val reader = SRTReader.open("file.srt")
reader: List[SRT] = List(SRT(1, 00:00:33.599, 00:00:35.270, List(Soy Amelia Folch.)))

Inline reader example

The inline reader takes .srt content as a string and returns a list of SRT values.

scala> val srt =
      """1
        |00:00:33,599 --> 00:00:35,270
        |(NARRA) Soy Amelia Folch.
        |
        |2
        |00:00:36,199 --> 00:00:39,870
        |Tengo 23 años y sin embargo
        |he salvado la vida del Empecinado.""".stripMargin
scala> val inlineReader = SRTReader.readInLine(srt)
inlineReader: List[SRT] = List(SRT(1,00:00:33.599,00:00:35.270,List((NARRA) Soy Amelia Folch.)), SRT(2,00:00:36.199,00:00:39.870,List(Tengo 23 años y sin embargo, he salvado la vida del Empecinado.)))
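
To make the mapping from .srt text to records concrete, here is a minimal, self-contained sketch of how such a string could be parsed. It is not the library's implementation: `Cue`, `SrtSketch` and `parse` are hypothetical names, and using `java.time.LocalTime` for the timestamps is an assumption based only on the dotted format in the output above.

```scala
import java.time.LocalTime
import scala.util.Try

// Hypothetical stand-in for the library's SRT type; field names follow this README.
final case class Cue(id: Int, startTime: LocalTime, endTime: LocalTime, sub: List[String])

object SrtSketch {
  // SRT timestamps put a comma before the milliseconds; LocalTime.parse expects a dot.
  private def parseTime(s: String): LocalTime =
    LocalTime.parse(s.trim.replace(',', '.'))

  // Split the input on blank lines, then read the id, the time range and the
  // remaining text lines from each block. Malformed blocks are skipped.
  def parse(text: String): List[Cue] =
    text.trim
      .split("""\r?\n\s*\r?\n""")
      .toList
      .flatMap { block =>
        block.linesIterator.map(_.trim).filter(_.nonEmpty).toList match {
          case id :: times :: subLines if times.contains("-->") =>
            val Array(start, end) = times.split("-->", 2)
            Try(Cue(id.toInt, parseTime(start), parseTime(end), subLines)).toOption
          case _ => None
        }
      }
}
```

Calling `SrtSketch.parse(srt)` on the string defined above yields two `Cue` values, the second holding both of its text lines in `sub`.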

Extracting items

If you only need part of the result returned by the reader, for instance the subtitle text without the id, start time and end time, you can extract just the subtitles like this:

scala> inlineReader.sub
List(List((NARRA) Soy Amelia Folch.), List(Tengo 23 años y sin embargo, he salvado la vida del Empecinado.))
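
If you are working with a plain Scala List of records, the same kind of extraction can be written with the standard collection methods. The sketch below uses a hypothetical `Entry` case class as a stand-in for the library's SRT type (only the `id` and `sub` field names come from this README), so it runs without the library.

```scala
// Hypothetical stand-in for the library's SRT type, reduced to two README fields.
final case class Entry(id: Int, sub: List[String])

object ExtractSketch {
  // One inner list of text lines per entry, the same nesting as the result shown above.
  def subsPerEntry(entries: List[Entry]): List[List[String]] =
    entries.map(_.sub)

  // Every text line in a single flat list, convenient for word counts.
  def allLines(entries: List[Entry]): List[String] =
    entries.flatMap(_.sub)

  // One string per entry, with multi-line subtitles joined by a space.
  def joined(entries: List[Entry]): List[String] =
    entries.map(_.sub.mkString(" "))
}
```

Which shape is most useful depends on the analysis: the nested form keeps the original line breaks, while the flat and joined forms are easier to search.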

Writing example

There are two ways to write to a file:

  • writing without header
scala> val reader = SRTReader.open("file.srt")
reader: List[SRT] = List(SRT(1, 00:00:33.599, 00:00:35.270, List(Soy Amelia Folch.)))
scala> SRTWriter.write(reader, "output.csv")

or, passing the input and output file paths directly:

scala> SRTWriter.write("inputFileName.srt", "outputFileName.csv")
  • with user-defined header
scala> val header = List("id", "start_time", "end_time", "subtitle")
header: List[String] = List(id, start_time, end_time, subtitle)
scala> SRTWriter.write("input.srt", "output.csv", header)
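
For readers who want to see what a CSV export with an optional header involves, here is a minimal sketch that does not depend on the library. `CsvRow`, `CsvSketch`, `escape`, `toLines` and `write` are hypothetical names, and the choices to join multi-line subtitles with a space and to quote fields containing commas are assumptions; the library's actual output format may differ.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._

// Hypothetical stand-in for one subtitle record; times are kept as plain strings here.
final case class CsvRow(id: Int, startTime: String, endTime: String, sub: List[String])

object CsvSketch {
  // Quote a field when it contains a comma, quote or line break; double embedded quotes.
  def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  // Build the CSV lines, with the header row first when one is supplied.
  def toLines(rows: List[CsvRow], header: Option[List[String]] = None): List[String] = {
    val body = rows.map { r =>
      List(r.id.toString, r.startTime, r.endTime, r.sub.mkString(" "))
        .map(escape)
        .mkString(",")
    }
    header.map(_.map(escape).mkString(",")).toList ++ body
  }

  // Write the lines to the given path as UTF-8, replacing any existing file.
  def write(rows: List[CsvRow], path: String, header: Option[List[String]] = None): Unit = {
    Files.write(Paths.get(path), toLines(rows, header).asJava, StandardCharsets.UTF_8)
    ()
  }
}
```

Passing `Some(List("id", "start_time", "end_time", "subtitle"))` as the header adds one extra first line; passing nothing writes the rows only.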

Motivation

In Scandal (a TV series), wine is mentioned many times, and I was curious to know how often the word was used across the entire series (seasons 1 to 7). I used this library to convert all the subtitle files for the series to CSV format for further analysis.

This library will come in handy in data analysis projects for parsing and extracting the contents of subtitle files.

License

Apache 2.0
