jsonzilla / vtt_to_srt3

Python script to convert all SIMPLE vtt files in a directory and all of its subdirectories to srt subtitle format

Home Page: https://pypi.org/project/vtt-to-srt3/

License: Apache License 2.0

Python 100.00%

vtt vtt-files srt srt-subtitle-format

vtt_to_srt3's Introduction

vtt_to_srt3

Convert vtt files to srt subtitle format

For Python 3.x. A version for Python 2.7 is available separately.

Docs

https://jsonzilla.github.io/vtt_to_srt3/

Installation

pip install vtt_to_srt3

python -m pip install vtt_to_srt3

Usage from terminal

usage: vtt_to_srt [-h] [-r] [-e ENCODING] pathname

Convert vtt files to srt files

positional arguments:
  pathname              a file or directory with files to be converted

options:
  -h, --help            show this help message and exit
  -r, --recursive       walk path recursively
  -e ENCODING, --encoding ENCODING
                        encoding format for input and output files

Usage as a lib

Convert vtt file

from vtt_to_srt.vtt_to_srt import ConvertFile

convert_file = ConvertFile("input_utf8.vtt", "utf-8")
convert_file.convert()

Convert all vtt files in a directory (optionally recursively)

from vtt_to_srt.vtt_to_srt import ConvertDirectories

recursive = False  # set to True to also walk subdirectories
convert_directories = ConvertDirectories(".", recursive, "utf-8")
convert_directories.convert()

Manual build

Generate wheel

python -m pip install --upgrade setuptools wheel build
python -m build

Generate documentation

python -m pip install pdoc3
pdoc --html vtt_to_srt/vtt_to_srt.py -o docs
mv docs/vtt_to_srt.html docs/index.html
rm -r docs/vtt_to_srt

vtt_to_srt3's Issues

[BUG] Converting timestamp

Note: for support questions, please use stackoverflow.
This repository's issues are reserved for feature requests and bug reports.
Your issue may already be reported! Please search on the issue tab before creating one.

Expected Behavior

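Every cue is converted: each timestamp is written as hh:mm:ss,mmm and each cue gets a sequence number.

Current Behavior

One cue is converted incorrectly. When the start timestamp omits the hour field (mm:ss.mmm) but the end timestamp includes it (hh:mm:ss.mmm), the cue is written without the 00: hour prefix and without a sequence number.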

Possible Solution

Context (Environment)

Version: vtt_to_srt3-0.2.0.1-py3-none-any.whl
Platform: Windows 64-bit, Python 3.9-64bit
Subsystem:
Files: vtt_to_srt.py

Detailed Description

Converting this subtitle fails on one specific timestamp.

VTT

59:53.280 --> 59:57.480 line:90% position:50% align:middle
‫text


59:57.720 --> 01:00:00.240 line:90% position:50% align:middle
‫text


01:00:00.360 --> 01:00:02.040 line:90% position:50% align:middle
text

01:00:02.160 --> 01:00:06.080 line:90% position:50% align:middle
text

01:00:06.880 --> 01:00:09.080 line:90% position:50% align:middle
text

01:00:09.200 --> 01:00:10.920 line:90% position:50% align:middle
text

SRT

664
00:59:53,280 --> 00:59:57,480
text

59:57,720 --> 01:00:00,240
text

665
01:00:00,360 --> 01:00:02,040
text

666
01:00:02,160 --> 01:00:06,080
text

667
01:00:06,880 --> 01:00:09,080
text

668
01:00:09,200 --> 01:00:10,920
text

Should be

664
00:59:53,280 --> 00:59:57,480
text

665
00:59:57,720 --> 01:00:00,240
text

666
01:00:00,360 --> 01:00:02,040
text

667
01:00:02,160 --> 01:00:06,080
text

668
01:00:06,880 --> 01:00:09,080
text

669
01:00:09,200 --> 01:00:10,920
text

As you can see, the cue 59:57.720 --> 01:00:00.240 is converted to 59:57,720 --> 01:00:00,240,
without the leading 00: hour field and without a sequence number.
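
One way to handle this case is to normalize each timestamp on its own, so that a timing line mixing the short form (mm:ss.mmm) with the long form (hh:mm:ss.mmm) still comes out right. The following is only a minimal sketch under that assumption, not the library's actual code; the names VTT_TIMESTAMP, to_srt_timestamp and normalize_timing_line are hypothetical.

```python
import re

# A WebVTT timestamp with an optional hour field,
# e.g. 59:57.720 or 01:00:00.240 (hypothetical helper pattern)
VTT_TIMESTAMP = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def to_srt_timestamp(match):
    """Format one matched timestamp as hh:mm:ss,mmm, adding 00 if the hour is missing."""
    hours, minutes, seconds, millis = match.groups()
    return f"{hours or '00'}:{minutes}:{seconds},{millis}"


def normalize_timing_line(line):
    """Convert a WebVTT timing line to SRT form and drop trailing cue settings.

    Lines that are not timing lines are returned unchanged.
    """
    stamps = [to_srt_timestamp(m) for m in VTT_TIMESTAMP.finditer(line)]
    if "-->" not in line or len(stamps) != 2:
        return line
    return f"{stamps[0]} --> {stamps[1]}"


if __name__ == "__main__":
    print(normalize_timing_line(
        "59:57.720 --> 01:00:00.240 line:90% position:50% align:middle"
    ))
```

Because each timestamp is padded independently, the result no longer depends on both ends of the cue using the same form.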

Possible Implementation

Cannot work with directory of vtt files

Traceback (most recent call last):
  File "path\Documents\audicut\vtts_to_srtttt.py", line 1, in <module>
    from vtt_to_srt import vtts_to_srt
ImportError: cannot import name 'vtts_to_srt' from 'vtt_to_srt' (C:\path\Programs\Python\Python37\lib\site-packages\vtt_to_srt\__init__.py)
[Finished in 0.4s]

'gbk' codec can't decode byte 0x82 in position 107: illegal multibyte sequence

Traceback (most recent call last):
  File "\music_core\ytb_dlr.py", line 203, in file_path
    v2s.vtt_to_srt(file_path + "/" + filename)
  File "\vtt_to_srt\vtt_to_srt.py", line 68, in vtt_to_srt
    file_contents: str = read_text_file(str_name_file)
  File "\vtt_to_srt\vtt_to_srt.py", line 59, in read_text_file
    return f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x82 in position 107: illegal multibyte sequence

The English version is converted normally, but zh-TW does not seem to be converted correctly

n-BXNXvTvV4.zh-TW.zip
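
The traceback shows the file being decoded with the platform default codec (gbk on this system) rather than the encoding the file was saved in. A minimal sketch of the usual remedy, passing the encoding explicitly when opening the file, is shown below. The read_text_file here is a hypothetical stand-in, not the library's function, and it assumes the subtitle file is UTF-8.

```python
import os
import tempfile


def read_text_file(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the platform default."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def demo():
    """Write a small zh-TW sample as UTF-8 and read it back with an explicit encoding."""
    sample = "WEBVTT\n\n00:01.000 --> 00:02.000\n繁體中文字幕\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.zh-TW.vtt")
        with open(path, "w", encoding="utf-8", newline="\n") as f:
            f.write(sample)
        return read_text_file(path, encoding="utf-8") == sample


if __name__ == "__main__":
    print("round trip ok:", demo())
```

In the versions documented in the README above, the encoding is passed as the second argument, as in ConvertFile("input_utf8.vtt", "utf-8").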

No module named 'vtt_to_srt'

I ran

pip install vtt_to_srt3

and it reported: Successfully installed vtt-to-srt3-0.1.8.4.

But I can't import the module. Why?
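
Note that the package is installed as vtt_to_srt3 but, per the README, imported as vtt_to_srt. One common cause of this error is that pip installed the package into a different interpreter than the one running the script. The sketch below is only a diagnostic aid under that assumption; is_importable is a hypothetical helper name.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Compare this path with the interpreter that pip installed into
    print("interpreter:", sys.executable)
    print("vtt_to_srt importable:", is_importable("vtt_to_srt"))
```

If the module is not found, installing with python -m pip install vtt_to_srt3 from the same interpreter keeps pip and the script on the same environment.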

[BUG] Conversion does not remove optional cue identifiers in a vtt cue block (results in doubled sequence numbering)

Expected Behavior

In the WebVTT format each cue block can include an optional cue identifier.

The vtt files I want to convert with this library are already very close to the srt format and use a sequence number as the cue identifier.

Looking at the conversion code, I would expect this cue identifier to be handled (removed) so that the result is a valid srt file.

Current Behavior

When I convert those numbered cues, I end up with two numbers on consecutive lines before each timestamp.
This is because the current version of the library does not look for cue identifiers and remove them.
Sequence numbers are added in every case, so the numbering is doubled.

Possible Solution

The conversion in this library is mostly done by direct replacements on the file contents rather than by parsing the full vtt content first, so it is not easy to detect and drop cue identifier lines.
I therefore modified the function add_sequence_numbers to drop any non-empty line directly before a line with a timestamp.
It is not a very elegant solution, but it works and does not need a complex redesign of the input handling in convert_content.

Steps to Reproduce

Download example file 'E1x1_en.vtt.txt' from attachments and rename it to .vtt
Convert the single file with the following code snippet

import vtt_to_srt.vtt_to_srt as vtt_to_srt
vtt_file = vtt_to_srt.ConvertFile('E1x1_en.vtt', 'utf-8')
vtt_file.convert()

Check created srt-file for double numbering in front of any cue block

Context (Environment)

Version: vtt_to_srt3-0.2.0.0-py3-none-any.whl
Platform: Windows 64-bit, Python 3.7-32bit
Subsystem: -
Files: vtt_to_srt.py

Detailed Description

see above

Possible Implementation

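    def add_sequence_numbers(self, contents):
        """Adds sequence numbers to subtitle contents and returns new subtitle contents

        Any non-empty line directly before a timestamp line is treated as a
        cue identifier and dropped.

        :contents -- contents of vtt file
        """
        output = ''
        lines = contents.split('\n')
        i = 1  # next sequence number
        n = 0  # index of the current line
        while n < len(lines)-1:
            line = lines[n]
            next_line = lines[n+1]
            if self.has_timestamp(next_line):
                # Keep a blank separator line; a non-empty line here is a
                # cue identifier and is replaced by the sequence number
                if line == '':
                    output += '\n'
                output += str(i) + '\n'
                output += next_line + '\n'
                i += 1
                n += 2
            else:
                output += line + '\n'
                n += 1
        # Emit the last line unless it was already consumed as a timestamp line
        if n < len(lines):
            output += lines[-1] + '\n'
        return output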

E1x1_en.vtt.txt
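
To try the approach above outside the library, the following standalone sketch applies the same look-ahead logic to a small sample. It is an illustration only: has_timestamp and TIMING_LINE are assumed stand-ins and may differ from the library's own timestamp check.

```python
import re

# Assumed timing-line pattern; the library's own has_timestamp may differ
TIMING_LINE = re.compile(r"\d{2}:\d{2}[.,]\d{3} --> ")


def has_timestamp(line):
    return TIMING_LINE.search(line) is not None


def add_sequence_numbers(contents):
    """Number each cue, dropping any non-empty line directly before a timing line."""
    out = []
    lines = contents.split("\n")
    number = 1
    n = 0
    while n < len(lines) - 1:
        line, next_line = lines[n], lines[n + 1]
        if has_timestamp(next_line):
            if line == "":
                out.append("")       # keep the blank separator between cues
            out.append(str(number))  # replaces the cue identifier, if any
            out.append(next_line)
            number += 1
            n += 2
        else:
            out.append(line)
            n += 1
    if n < len(lines):               # last line not yet consumed
        out.append(lines[-1])
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "10\n00:00:01,000 --> 00:00:02,000\nHello\n\n"
        "11\n00:00:03,000 --> 00:00:04,000\nWorld"
    )
    print(add_sequence_numbers(sample))
```

The existing identifiers 10 and 11 are dropped and the cues are renumbered from 1, so each cue carries a single number.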
