tkarabela / pysubs2

A Python library for editing subtitle files

Home Page: http://pysubs2.readthedocs.io

License: MIT License

Python 99.68% Makefile 0.17% Shell 0.14%

python subtitles-parsing subtitles srt substation-alpha microdvd mpl2 webvtt closed-captions

pysubs2's Introduction

pysubs2

pysubs2 is a Python library for editing subtitle files. It’s based on SubStation Alpha, the native format of Aegisub; it also supports SubRip (SRT), MicroDVD, MPL2, TMP and WebVTT formats and OpenAI Whisper captions.

There is a small CLI tool for batch conversion and retiming.

pip install pysubs2
pysubs2 --shift 0.3s *.srt
pysubs2 --to srt *.ass

import pysubs2
subs = pysubs2.load("my_subtitles.ass", encoding="utf-8")
subs.shift(s=2.5)
for line in subs:
    line.text = "{\\be1}" + line.text
subs.save("my_subtitles_edited.ass")

To learn more, please see the documentation. If you'd like to contribute, see CONTRIBUTING.md.

pysubs2 is licensed under the MIT license (see LICENSE.txt).

pysubs2's Issues

Add typehints

Write typehints as per PEP 484 for better programming experience.

&HBBGGRR format color in SSA files

I've seen some SSA files using &HBBGGRR format for PrimaryColour instead of an integer, which is just like &HAABBGGRR in ASS.
Since SSA is outdated, I couldn't tell if the format is really supported in original SSA.
Both Aegisub and VLC can recognize it as far as I know, while pysubs2 will just produce an error.

Example File:
https://secure.assrt.net/download/257308/-/1/[YYDM-11FANS][Mobile%20Suits%20Gundam%200079][43][BDRIP][960x720][X264-10bit_AAC][AD847C98].boboqiu-tc.ssa

The file was generated by SrtEdit as its header says.

Style:

[V4 Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: Default,SimHei,30,&HFFFFFF,&H00FFFF,&H000000,&H000000,-1,0,1,2,3,2,20,20,20,0,1

Error:

Traceback (most recent call last):
  File "subtest.py", line 7, in <module>
    subs = pysubs2.SSAFile.load("example.ssa", encoding="utf-16")
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/ssafile.py", line 100, in load
    return cls.from_file(fp, format_, fps=fps, **kwargs)
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/ssafile.py", line 160, in from_file
    impl.from_file(subs, fp, format_, fps=fps, **kwargs)
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 203, in from_file
    field_dict = {f: string_to_field(f, v) for f, v in zip(STYLE_FIELDS[format_], raw_fields)}
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 203, in <dictcomp>
    field_dict = {f: string_to_field(f, v) for f, v in zip(STYLE_FIELDS[format_], raw_fields)}
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 158, in string_to_field
    return ssa_rgb_to_color(v)
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 72, in ssa_rgb_to_color
    x = int(s)
ValueError: invalid literal for int() with base 10: '&HFFFFFF'
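
A minimal sketch of a more tolerant colour parser, assuming SSA colours are a BGR integer written either in decimal or with an `&H` hex prefix. `parse_ssa_color` is a hypothetical helper for illustration, not pysubs2's own function.

```python
def parse_ssa_color(s):
    """Parse an SSA colour field into an (r, g, b) tuple.

    Hypothetical helper: accepts a decimal integer ("16777215") or an
    "&H"-prefixed hex value ("&HFFFFFF"); both are read as 0xBBGGRR.
    """
    s = s.strip()
    if s.upper().startswith("&H"):
        x = int(s[2:], 16)   # hex form, e.g. &HBBGGRR
    else:
        x = int(s)           # plain decimal form
    r = x & 0xFF
    g = (x >> 8) & 0xFF
    b = (x >> 16) & 0xFF
    return r, g, b


print(parse_ssa_color("&HFFFFFF"))
print(parse_ssa_color("65535"))
```

Both spellings go through the same bit arithmetic, so a file that mixes them in one Style line would still load.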

[QA] DeprecationWarning in tests: 'alignment' attribute should be an Alignment instance

Pytest prints a warning:

tests/test_substation.py::test_alignment_given_as_integer
  /var/tmp/portage/dev-python/pysubs2-1.6.0/work/pysubs2-1.6.0/pysubs2/substation.py:333: DeprecationWarning: The 'alignment' attribute of SSAStyle should be an Alignment instance, using plain int is deprecated
    warnings.warn("The 'alignment' attribute of SSAStyle should be an Alignment instance, using plain int is deprecated", DeprecationWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

Python version: 3.10.8
Pytest version: 7.1.3

SRT to ASS conversion cap at 10h timecode

Hi,

Thank you for the great tool. I am trying to convert an SRT file generated via Whisper from a stream about 24 hours long. The conversion seems fine, except that the timing caps at 9:59:59.99,9:59:59.99 and everything after that has the same timecode. Is there any way to fix this? I attached the SRT file; just change the .log extension to .srt.

test.log
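
The value 9:59:59.99 is the largest time that fits a single-digit hour field in an `H:MM:SS.cc` timestamp. A sketch of a formatter that lets the hour field grow instead of clamping; `ms_to_ass_timestamp` is a hypothetical name, and whether a given player accepts two-digit hours is an assumption that would need checking.

```python
def ms_to_ass_timestamp(ms):
    """Format milliseconds as H:MM:SS.cc without capping the hour field.

    Hypothetical helper; negative input is clamped to zero here.
    """
    ms = max(0, ms)
    total_cs = ms // 10                      # centiseconds, truncated
    h, rem = divmod(total_cs, 360_000)       # 360000 cs per hour
    m, rem = divmod(rem, 6_000)              # 6000 cs per minute
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


print(ms_to_ass_timestamp(36_000_000))
```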

Can't add new styles in python2

Hi,
When adding a new style to the subtitles I get the following error:

Traceback (most recent call last):
  File "colorset.py", line 20, in <module>
    subs1.save(foutput,format_='ass')
  File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/ssafile.py", line 190, in save
    self.to_file(fp, format_, fps=fps, **kwargs)
  File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/ssafile.py", line 222, in to_file
    impl.to_file(self, fp, format_, fps=fps, **kwargs)
  File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/substation.py", line 253, in to_file
    fields = [field_to_string(f, getattr(ev, f)) for f in EVENT_FIELDS[format_]]
  File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/substation.py", line 242, in field_to_string
    raise TypeError("Unexpected type when writing a SubStation field")
TypeError: Unexpected type when writing a SubStation field

If I run the same code in Python 3 it runs fine, but under Python 2 I get the error.
I'm trying to use pysubs2 in Kodi, which currently only supports Python 2.
The code I used:

import pysubs2
import chardet

finput = 'en.srt'
foutput = 'en.ass'

# Detect the input encoding (the result is not used further in this report).
with open(finput, 'rb') as fi:
    rawdata = fi.read()
    encoding = chardet.detect(rawdata)['encoding']

subs1 = pysubs2.load(finput)

# Create a top-aligned style and apply it to every line.
top_style = pysubs2.SSAStyle()
top_style.alignment = 8
subs1.styles['top-style'] = top_style
for line in subs1:
    line.style = 'top-style'
subs1.save(foutput, format_='ass')

Drop Python 2, Support Python 3

Title says it all. I'm planning to provide a PR for this, with Python 3.4 as the minimum supported version.

Mixing SSA and ASS give weird results

Hi, I hope this time it's a real issue and not my own mistake.

If we insert SSA data into an ASS file (or the other way around), the data is not converted correctly. In the end the subtitles are not shown at all, are only partly shown, or appear in the wrong colors. Here is all the code:

ED.ssa

[Script Info]
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: OK,Britannic Bold,30,16777215,65535,65535,&H0029464b,0,0,1,2,1,2,10,10,10,0,0

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,OK,NTP,0000,0000,0000,!Effect,Hi

KED.ass

[Script Info]
; Script generated by Aegisub 3.2.2
; http://www.aegisub.org/
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: KED,Arial,20,&H00FFFFFF,&H000088EF,&H00000000,&H00666666,-1,0,0,0,100,100,0,0,1,3,0,8,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,KED,,0,0,0,fx,Booo

code:

import pysubs2

def merge(ss):
    s1 = ss[0]
    for i_ in range(1, len(ss)):
        i = ss[i_]
        for j in i:
            s1.insert(0, j)
        s1.import_styles(i)
    return s1

a=pysubs2.load("ED.ssa")
b=pysubs2.load("KED.ass")

a=merge([a, b])
a.save("t1.ssa")
a.save("t2.ass")

This is exactly what I ran. If you play either of the two files, you will find the subtitles are not right.

In the t1.ssa case the colors are wrong; in the t2.ass case only the events from KED.ass are displayed.

Bye.

ASS/SSA color swapping bug

Loading and saving an SSAFile swaps green and blue color channels. (Reported by Eric Williams)

pysubs2 detecting "," in text as new param

Hi, I found this bug. Basically this:

[Script Info]
Title: Bugs in the window
ScriptType: v4.00

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:01.00,Default,,0000,0000,0000,,Hi, i'm new here

This works in the player but fails in pysubs2: the "," in the Text field is interpreted as a field separator, and parsing fails with:

AttributeError: 'NoneType' object has no attribute 'groups'

Bye.
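
A sketch of how a Dialogue line can be split so that commas inside Text survive: limit the number of splits to the number of fields minus one. `split_dialogue` and `EVENT_FIELDS` are illustrative names here, not pysubs2 internals.

```python
EVENT_FIELDS = ["Layer", "Start", "End", "Style", "Name",
                "MarginL", "MarginR", "MarginV", "Effect", "Text"]


def split_dialogue(line):
    """Split a 'Dialogue: ...' line into a field dict.

    Text is the last field, so at most len(EVENT_FIELDS) - 1 splits are
    made and any further commas stay inside Text.
    """
    _, _, rest = line.partition(":")          # drop the 'Dialogue' prefix
    values = rest.strip().split(",", len(EVENT_FIELDS) - 1)
    return dict(zip(EVENT_FIELDS, values))


event = split_dialogue(
    "Dialogue: 0,0:00:00.00,0:00:01.00,Default,,0000,0000,0000,,Hi, i'm new here"
)
print(event["Text"])
```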

html tags --clean

Would it be possible for the --clean option (or a new option) to delete HTML tags from the subtitles? For example,

<i>They're dying in vain!</i>

to 

They're dying in vain!

Thanks in advance
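
A rough sketch of the requested behaviour using a regular expression. `strip_html_tags` is a hypothetical helper; the pattern only removes things shaped like `<tag ...>` or `</tag>` and makes no attempt at full HTML parsing.

```python
import re

# Matches an opening or closing tag whose name starts with a letter.
_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def strip_html_tags(text):
    """Remove simple HTML-style tags such as <i>, </i> or <font color=...>."""
    return _TAG.sub("", text)


print(strip_html_tags("<i>They're dying in vain!</i>"))
```

Because the tag name must start with a letter, text such as `a < b` is left alone.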

/usr/bin/pysubs2.py doesn't work

Hi, I'm using pysubs2 on Linux and I have a problem when trying to use the pysubs2.py script.
python -m pysubs2 works fine and the .py script works fine too, if I rename it.
https://gist.github.com/YamashitaRen/5846848c372369f28431

ssaevent hashable?

Hi
So in a program I'm writing I need to do something like this:
import pysubs2

subs = pysubs2.load("subtitle.ass", encoding = "utf-8")
...
if subtitle in list:
    print("In.")

However, this doesn't work:
TypeError: unhashable type: 'SSAEvent'
So could it become hashable?
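
A membership test on a plain list only needs `__eq__`, so the TypeError suggests a set or dict lookup somewhere. One workaround that needs no library change is to key on an immutable tuple. `Event` below is a hypothetical stand-in for SSAEvent, and the choice of key fields is an assumption.

```python
class Event:
    """Hypothetical stand-in: defines __eq__ but not __hash__, so it is unhashable."""

    def __init__(self, start, end, text):
        self.start, self.end, self.text = start, end, text

    def __eq__(self, other):
        return isinstance(other, Event) and event_key(self) == event_key(other)


def event_key(ev):
    """Immutable key usable in sets and as a dict key."""
    return (ev.start, ev.end, ev.text)


events = [Event(0, 1000, "Hi"), Event(1000, 2000, "Bye")]
seen = {event_key(ev) for ev in events}

candidate = Event(0, 1000, "Hi")
print(event_key(candidate) in seen)   # set lookup via the tuple key
print(candidate in events)            # list lookup via __eq__
```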

[Feature Request] Handle WebVtt cue settings

It would be nice if pysubs2 could interpret all the cue settings: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#cue_settings

vertical
line
position
size
align

This would allow some .ass tags to be converted correctly into VTT cue settings and vice versa.
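
As a starting point, cue settings can be read off the timing line as space-separated `name:value` pairs. This is a sketch only; `parse_cue_timing_line` is a hypothetical helper and it does not validate setting names or values.

```python
def parse_cue_timing_line(line):
    """Split a WebVTT timing line into (start, end, settings dict)."""
    start, _, rest = line.partition("-->")
    parts = rest.split()
    end, raw_settings = parts[0], parts[1:]
    # Each setting is 'name:value'; split on the first colon only.
    settings = dict(item.split(":", 1) for item in raw_settings)
    return start.strip(), end, settings


start, end, settings = parse_cue_timing_line(
    "00:00:01.000 --> 00:00:04.000 line:0 position:20% align:start"
)
print(settings)
```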

tutorial.rst has a spelling issue in line 60

frame => frames

[QA] DeprecationWarning while running tests

Pytest prints a warning:

tests/test_substation.py:210: DeprecationWarning: invalid escape sequence '\k'

The mentioned line is:

pysubs2/tests/test_substation.py

Line 210 in 2378e76

ASS_WITH_SHORT_MINUTES_SECONDS_REF = """

That string contains the invalid sequence.

Python version: 3.10.5
Pytest version: 7.1.2
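
The warning appears because `\k` is not a recognised escape in an ordinary string literal. Either doubling the backslash or using a raw string (including a raw triple-quoted string, `r"""..."""`) avoids it. The karaoke text below is made up for illustration.

```python
# Two equivalent spellings that do not trigger the warning.
ESCAPED = "{\\k10}ka{\\k20}ra"   # backslashes doubled
RAW = r"{\k10}ka{\k20}ra"        # raw string literal

print(RAW == ESCAPED)
```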

Optional time format in SSA/ASS

Hi, I found that the player can play files with both the 0:01:19.24 and 0:01:19:24 timestamp formats. I don't know if this is in the specs or if it should be supported here.

Bye.

Cannot read script info without a space after the colon

For example,

[Script Info]
Title:Ttitle
Original Script:Script
Synch Point:0
ScriptType:v4.00+
Collisions:Normal
PlayResX:640
PlayResY:360
Timer:100.0000

pysubs2 ends up with only an empty dict in info.
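
A sketch of a more forgiving parser for `[Script Info]` lines: split on the first colon and strip whitespace, so `Key:Value` and `Key: Value` behave the same. `parse_info_line` is a hypothetical helper.

```python
def parse_info_line(line):
    """Return (key, value) for a Script Info line, or None if it is not one."""
    if line.startswith(";"):
        return None                       # comment line
    key, sep, value = line.partition(":")
    if not sep:
        return None                       # no colon at all
    return key.strip(), value.strip()


lines = ["Title:Ttitle", "PlayResX:640", "Timer: 100.0000", "; a comment"]
info = dict(kv for kv in map(parse_info_line, lines) if kv is not None)
print(info)
```

Splitting on the first colon only keeps values that themselves contain colons intact.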

Error: Framerate must be specified when writing MicroDVD.

How can the framerate be set when this line results in the mentioned error?
subs.save(file_name, format_="microdvd", fps=24)

Support for Graphics section

Could you consider adding support for Graphics section, please? It is lost after conversion.
Aegisub is able to add files to this section. And, if I remember correctly, VSFilterMod can render images from it.

The format of the Graphics section is almost the same as that of Fonts. Just "fontname" becomes "filename".
Specs
sample.zip

Spaces in "Style"

I have an .ass file generated by third-party software (see file.zip).

This file contains spaces between the values in the "Style" line:

Style: Default, Arial, 20, &H00FFFFFF, &H00000000, &H00000000, &H00000000, 0, 0, 0, 0, 100, 100, 0, 0, 1, 2, 0, 2, 15, 15, 15, 0

This is why pysubs2 can't parse the file:

\lib\site-packages\pysubs2\substation.py in <dictcomp>(.0)
    235                 buf = rest.strip().split(",")
    236                 name, raw_fields = buf[0], buf[1:] # splat workaround for Python 2.7
--> 237                 field_dict = {f: string_to_field(f, v) for f, v in zip(STYLE_FIELDS[format_], raw_fields)}
    238                 sty = SSAStyle(**field_dict)
    239                 subs.styles[name] = sty

\lib\site-packages\pysubs2\substation.py in string_to_field(f, v)
    166                     return timestamp_to_ms(TIMESTAMP.match(v).groups())
    167             elif "color" in f:
--> 168                 return rgba_to_color(v)
    169             elif f in {"bold", "underline", "italic", "strikeout"}:
    170                 return v == "-1"

\lib\site-packages\pysubs2\substation.py in rgba_to_color(s)
     72         x = int(s[2:], base=16)
     73     else:
---> 74         x = int(s)
     75     r = x & 0xff
     76     g = (x >> 8) & 0xff

ValueError: invalid literal for int() with base 10: ' &H00FFFFFF'

Also, the line
[v4+ Styles] starts with a lowercase 'v', so I have to specify 'format_' explicitly.

It would probably be a good idea to strip spaces from the values in the "Style" line and to compare case-insensitively when guessing the file format.
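
Both suggestions are small changes. A sketch, with hypothetical helper names: strip each value after splitting, and lowercase the section header before comparing.

```python
def split_style_line(line):
    """Split a 'Style: ...' line into values with surrounding spaces removed."""
    _, _, rest = line.partition(":")
    return [value.strip() for value in rest.split(",")]


def is_ass_styles_header(line):
    """Case-insensitive check for the ASS styles section header."""
    return line.strip().lower() == "[v4+ styles]"


print(split_style_line("Style: Default, Arial, 20, &H00FFFFFF"))
print(is_ass_styles_header("[v4+ Styles]"))
```

One trade-off: stripping would also remove leading or trailing spaces that are genuinely part of a style or font name.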

Problem with default values being overwritten when converting SSA to ASS

Hi, there are too many things in issue #18, so I'll split it point by point. The first is this: some values are overwritten somewhere during conversion, ignoring the default values in https://github.com/tkarabela/pysubs2/blob/master/pysubs2/ssastyle.py

Here is an example:

[Script Info]
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: OK,Britannic Bold,30,16777215,65535,65535,&H0029464b,0,0,1,2,1,2,10,10,10,0,0

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,OK,NTP,0000,0000,0000,!Effect,Hi

Converting to ASS:

[Script Info]
; Script generated by pysubs2
; https://pypi.python.org/pypi/pysubs2
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: OK,Britannic Bold,30.0,&H00777215,&H00000535,&H00000535,&H0029464B,0,0,0,0,1.0,2.0,10.0,10.0,10,0.0,0.0,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,OK,NTP,0,0,0,!Effect,Hi

Here is a comparison of the values that don't match:

             SSA (original)  ASS (converted)
BorderStyle  1               10
Outline      2               0
Shadow       1               0
Encoding     0               1

There are also some odd values, such as ScaleX and ScaleY. The default values in pysubs2 are right, but different values are written to the file. These fields are ASS-only (they don't exist in SSA), so the pysubs2 defaults should be used:

         Default (pysubs2)  Written value (converted)
ScaleX   100                1
ScaleY   100                2
Spacing  0                  10
Angle    0                  10

Bye.
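
The written values look like the SSA values shifted into ASS field positions, which is what happens when values are matched by position against a fixed ASS field list while the file's own Format line lists SSA fields. A sketch of matching values against the Format line that the file declares instead; the helper names are hypothetical.

```python
def parse_format_line(line):
    """Return the field names declared by a 'Format: ...' line."""
    _, _, rest = line.partition(":")
    return [name.strip() for name in rest.split(",")]


def parse_style_line(line, field_names):
    """Pair the values of a 'Style: ...' line with the declared field names."""
    _, _, rest = line.partition(":")
    values = [value.strip() for value in rest.split(",")]
    return dict(zip(field_names, values))


FORMAT = ("Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, "
          "TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, "
          "Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding")
STYLE = ("Style: OK,Britannic Bold,30,16777215,65535,65535,&H0029464b,"
         "0,0,1,2,1,2,10,10,10,0,0")

style = parse_style_line(STYLE, parse_format_line(FORMAT))
print(style["BorderStyle"], style["Outline"], style["Shadow"], style["Encoding"])
```

Fields that the Format line does not mention (ScaleX, ScaleY, Spacing, Angle) are simply absent from the dict, so defaults can be applied for them.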

Embedded Fonts

Hi, I think pysubs2 doesn't copy embedded fonts to the new subtitle file. Could you add this feature, please?
smacke/ffsubsync#126

how to set ass style with a dict

Currently, I set an ASS style like this:
import pysubs2
assfile = pysubs2.SSAFile()
assfile.styles['sytle1'] = pysubs2.SSAStyle(fontname = 'xxx', fontsize = 20)

Can I use a dict instead, like this?
style_dict = {
'fontname':'xxx',
'fontsize':20
}

assfile.styles['sytle1'] = pysubs2.SSAStyle(style_dict)
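
Python's `**` unpacking passes a dict as keyword arguments, so a constructor that takes `fontname=` and `fontsize=` keywords, as SSAStyle does in the first snippet, can be called as `SSAStyle(**style_dict)`. The sketch below uses a stand-in `Style` dataclass (hypothetical, not the pysubs2 class) so that it runs without pysubs2 installed.

```python
from dataclasses import dataclass


@dataclass
class Style:
    """Hypothetical stand-in for a style class that accepts keyword arguments."""
    fontname: str = "Arial"
    fontsize: float = 20.0


style_dict = {
    "fontname": "xxx",
    "fontsize": 20,
}

# ** expands the dict into fontname="xxx", fontsize=20
style = Style(**style_dict)
print(style)
```

Passing the dict positionally, without `**`, would hand the whole dict to the first parameter instead.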

Support timestamps without ms

Hey,

I had issues reading .ass files whose timestamps have no fractional (millisecond) part. I propose adding support for this, since not every timestamp can be expected to include it.

The fix is quite easy, but I can not push a branch to the repo, so I include it here:

file time.py:

TIMESTAMP = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})[.,]?(\d{0,3})")

in function timestamp_to_ms:

    if groups[-1] == '':
        h, m, s = map(int, groups[:-1])
        ms = 0
    else:
        h, m, s, frac = map(int, groups)
        ms = frac * 10**(3 - len(groups[-1]))

The corresponding tests, test_time.py need a small update too:

    assert TIMESTAMP.match("12:45:67").groups() == ("12", "45", "67", '')
    assert TIMESTAMP.match("1:23:45,").groups() == ("1", "23", "45", '')
    assert TIMESTAMP.match("1:23:45.").groups() == ("1", "23", "45", '')

and remove them from the rejected timestamps section.

I hope the changes make sense to you!
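
Put together, the proposal might look like the following self-contained sketch. It reuses the regex from the issue; the way hours, minutes and seconds are combined into milliseconds, and the `parse_timestamp` wrapper, are assumptions made so the example runs on its own.

```python
import re

# Regex from the proposal: the fractional part and its separator are optional.
TIMESTAMP = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})[.,]?(\d{0,3})")


def timestamp_to_ms(groups):
    """Convert regex groups (h, m, s, frac) to milliseconds."""
    h, m, s = map(int, groups[:3])
    frac = groups[3]
    # "24" means 240 ms, "5" means 500 ms, "" means 0 ms.
    ms = int(frac) * 10 ** (3 - len(frac)) if frac else 0
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_timestamp(text):
    """Hypothetical convenience wrapper around the two pieces above."""
    match = TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"not a timestamp: {text!r}")
    return timestamp_to_ms(match.groups())


print(parse_timestamp("0:01:19.24"))
print(parse_timestamp("1:23:45"))
```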

TIMESTAMP fails to match negative timestamps

It would be nice to adjust the TIMESTAMP regex so that it coped with negative timestamps rather than:

  File "/usr/lib/python3.7/site-packages/pysubs2/ssafile.py", line 92, in load
    return cls.from_file(fp, format_, fps=fps, **kwargs)
  File "/usr/lib/python3.7/site-packages/pysubs2/ssafile.py", line 152, in from_file
    impl.from_file(subs, fp, format_, fps=fps, **kwargs)
  File "/usr/lib/python3.7/site-packages/pysubs2/substation.py", line 204, in from_file
    field_dict = {f: string_to_field(f, v) for f, v in zip(EVENT_FIELDS[format_], raw_fields)}
  File "/usr/lib/python3.7/site-packages/pysubs2/substation.py", line 204, in <dictcomp>
    field_dict = {f: string_to_field(f, v) for f, v in zip(EVENT_FIELDS[format_], raw_fields)}
  File "/usr/lib/python3.7/site-packages/pysubs2/substation.py", line 148, in string_to_field
    return timestamp_to_ms(TIMESTAMP.match(v).groups())
AttributeError: 'NoneType' object has no attribute 'groups'

Granted, negative timestamps are nonsensical but Postel's Law and all that.

The particular issue that I've encountered that results in this: tp7/Sushi#34
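
A sketch of a sign-aware variant. It assumes the negative timestamps carry a single leading minus sign (for example `-0:00:01.50`); the exact shape produced by the linked tool is not shown here, so that is an assumption. The names are hypothetical.

```python
import re

SIGNED_TIMESTAMP = re.compile(r"(-?)(\d{1,2}):(\d{2}):(\d{2})[.,](\d{2,3})")


def signed_timestamp_to_ms(text):
    """Parse '[-]H:MM:SS.cc' into milliseconds, keeping the sign."""
    match = SIGNED_TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"not a timestamp: {text!r}")
    sign, h, m, s, frac = match.groups()
    ms = ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000
    ms += int(frac) * 10 ** (3 - len(frac))
    return -ms if sign else ms


print(signed_timestamp_to_ms("-0:00:01.50"))
```

A caller could then decide whether to keep the negative value, clamp it to zero, or warn.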

Is there a way to run Aegisub's resolution resampler in a python script?

Filtering and merging two .ass files of differing PlayRes's makes the subs from the lower resolution file minuscule and off center.

Subtitle File To Text is omitting subscripts

I am using pysubs2 to read subtitle files. Here is a small example:


SIMPLE_FILE = """
1
00:00:00,000 --> 00:01:00,000
cette matrice-là <i>E<sub>t</sub>·…·E<sub>1</sub>A</i> possède une ligne

2
00:01:00,000 --> 00:02:00,000
there was a SubRip file
with two subtitles.
"""
with open("subtitles.srt", "w", encoding="utf-8") as fp:fp.write(SIMPLE_FILE)

import pysubs2
subs = pysubs2.load("subtitles.srt",format_= "srt")
subs[0].text

As a result I get:

cette matrice-là {\i1}Et·…·E1A{\i0} possède une ligne

As you can see, the symbols were successfully recognized; however, the subscripts were omitted.

Is there a way to make sure the subscripts are preserved?

bug, when parse srt

Hi,

I'm using pysubs2 to parse an existing SRT file that contains the following snippet. It treats 394 as the text of 393, but these should be two different subtitles with empty text.

Also, it looks like subtitles are better separated by "a blank line" than by the timecode line, according to https://www.matroska.org/technical/specs/subtitles/srt.html

Regards,
Jarod

`
392
00:29:27,46 --> 00:29:29,83
I'm Liza Minnelli..

393
00:00:00,00 --> 00:00:00,00

394
00:00:00,00 --> 00:00:00,00
`
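
A sketch of the blank-line approach applied to the snippet above. `split_srt_blocks` is a hypothetical helper; it assumes cue text never contains an empty line, which real-world files do not always respect.

```python
import re

SRT_TEXT = """392
00:29:27,46 --> 00:29:29,83
I'm Liza Minnelli..

393
00:00:00,00 --> 00:00:00,00

394
00:00:00,00 --> 00:00:00,00
"""


def split_srt_blocks(text):
    """Split SRT text on blank lines into (index, timing, text) tuples."""
    cues = []
    for block in re.split(r"\n[ \t]*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue                      # not a well-formed cue; skipped here
        cues.append((lines[0].strip(), lines[1].strip(), "\n".join(lines[2:])))
    return cues


for cue in split_srt_blocks(SRT_TEXT):
    print(cue)
```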

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I tried a few things, but they all failed,
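
Byte 0xff at position 0 is what a UTF-16 little-endian byte order mark (FF FE) looks like to a UTF-8 decoder, so one thing worth checking is whether the file is actually UTF-16. A sketch that looks for a byte order mark at the start of the raw bytes; `sniff_bom_encoding` is a hypothetical helper.

```python
import codecs

# UTF-32 is checked first because its LE mark starts with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


raw = b"\xff\xfeh\x00i\x00"        # 'hi' in UTF-16 LE with a BOM
encoding = sniff_bom_encoding(raw)
print(encoding, raw.decode(encoding))
```

The detected name could then be passed as the `encoding` argument when loading the file.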

Options to keep the ASS positioning tag in srt files

Hi @tkarabela, I am using your library to do some preprocessing on SRT files and I noticed that your code removes the ASS positioning tag (e.g. {\an7}).
Is it possible to add a parameter to the save function to keep these tags in the output file?

Converting TMP into SubRip results in overlapped cues

Hi,

When trying to convert a TMP subtitle to SubRip, I found that the end time pysubs2 generates for a cue can fall after the start time of the next one. E.g.,

TMP | SubRip
00:00:12:I ... | 00:00:12,000 --> 00:00:15,113
00:00:14:observing ... | 00:00:14,000 --> 00:00:18,319
00:00:18:and ... | 00:00:18,000 --> 00:00:22,252
00:00:22:You ... | 00:00:22,000 --> 00:00:25,448

Looking closer, I found the end time is calculated by

pysubs2/pysubs2/tmp.py

Lines 50 to 51 in fc53473

 #calculate endtime from starttime + 500 miliseconds + 67 miliseconds per each character (15 chars per second) 

 end = start + 500 + (len(line) * 67)

Was that intentional? It seems to make more sense to make the end time not go beyond the start time of the next cue during the calculation.
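
A sketch of the suggested behaviour: keep the same length-based estimate, but cap it at the start of the following cue. `estimate_end_times` is a hypothetical helper working on plain tuples rather than pysubs2 objects.

```python
def estimate_end_times(cues):
    """cues: list of (start_ms, text) sorted by start.

    Returns (start_ms, end_ms, text) with the end estimated as
    start + 500 ms + 67 ms per character, capped at the next cue's start.
    """
    result = []
    for i, (start, text) in enumerate(cues):
        end = start + 500 + len(text) * 67
        if i + 1 < len(cues):
            end = min(end, cues[i + 1][0])   # do not run into the next cue
        result.append((start, end, text))
    return result


cues = [(12000, "x" * 39), (14000, "y" * 10)]
print(estimate_end_times(cues))
```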

ass to srt issue

Hello, I found that converting .ass to .srt fails to delete or bypass karaoke subs and drawings, e.g. {\an5\pos(655.758,142.500) /blur

I have attached the transcoded file and images on how it looks on video
Archive.zip

Suggestion: Time-based cutting utility

For stuff like cleaning audio transcript datasets, it's necessary to cut out segments of the corresponding subtitles when cutting out bad parts of the training audio. This is partially doable by merging the subtitles into an mkv container with the audio, and then using ffmpeg on it and splitting them apart again, but is far from ideal.

Having an easy way to just operate on the subtitles with an api like subs.cut(start="30:30", end="40:20"), which would remove the offending section and then shift everything after down would be really nice for this usecase.
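
A sketch of what such a cut could do, on plain `(start_ms, end_ms, text)` tuples and with times in milliseconds rather than the string form used above. The function name and the decision to drop events that overlap the cut window are assumptions made for illustration.

```python
def cut(events, start, end):
    """Remove the window [start, end) and shift later events to close the gap.

    events: list of (start_ms, end_ms, text).
    """
    removed = end - start
    kept = []
    for ev_start, ev_end, text in events:
        if ev_end <= start:
            kept.append((ev_start, ev_end, text))                      # before the cut
        elif ev_start >= end:
            kept.append((ev_start - removed, ev_end - removed, text))  # after the cut
        # events overlapping the window are dropped in this sketch
    return kept


events = [(0, 1000, "a"), (1500, 2500, "b"), (3000, 4000, "c")]
print(cut(events, 1200, 2800))
```

Trimming overlapping events to the window edges, instead of dropping them, would be an equally reasonable policy.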

Better handling of files with unknown character encoding

As of 1.2.0, we default to UTF-8 encoding. If this is not correct, the user has to specify the proper encoding manually. We could try some autodetection before bailing out, to improve the experience.

This is already something that users are dealing with, see:

Consider adding https://github.com/chardet/chardet as (optional?) dependency.

(This is another idea from the original pysubs library.)
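
Without a detection library, a simple fallback chain already helps: try UTF-8 first, then a short list of legacy encodings. This is a sketch only; the candidate list is an assumption and `decode_with_fallback` is a hypothetical helper.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


print(decode_with_fallback(b"caf\xe9"))
```

Since latin-1 accepts every byte sequence, putting it last guarantees a result, though not necessarily the right characters.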

Support `X-TIMESTAMP-MAP` for WebVTT

With VTT subtitle fragments, Subtitle Edit applies a correction when opening them and shows the "translated" times. With pysubs2, as with ffmpeg or any other program that does a simple direct conversion from VTT to SRT, this translation of the times does not happen. I tried to find the logic behind it and noticed that there is a delay based on the MPEGTS value found at the beginning of the subtitle file. Does that have anything to do with it?

I am attaching a file where you can see this problem that I mention
file-1_vtt.zip
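
In HLS, the `X-TIMESTAMP-MAP` header ties a local cue time to an MPEG-2 transport stream timestamp counted on a 90 kHz clock. A sketch of turning that header into a millisecond offset; the helper name is hypothetical and the LOCAL value is assumed to use the `hh:mm:ss.ttt` form.

```python
import re


def timestamp_map_offset_ms(header):
    """Return (MPEGTS time - LOCAL time) in ms for an X-TIMESTAMP-MAP line."""
    mpegts = re.search(r"MPEGTS:(\d+)", header)
    local = re.search(r"LOCAL:(\d+):(\d{2}):(\d{2})\.(\d{3})", header)
    if mpegts is None or local is None:
        raise ValueError("not an X-TIMESTAMP-MAP header")
    h, m, s, ms = map(int, local.groups())
    local_ms = ((h * 60 + m) * 60 + s) * 1000 + ms
    mpegts_ms = int(mpegts.group(1)) * 1000 // 90000   # 90 kHz ticks to ms
    return mpegts_ms - local_ms


print(timestamp_map_offset_ms("X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000"))
```

To get times relative to the start of the video, a converter would likely also need the stream's first presentation timestamp, which is not available in the subtitle file itself.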

Missing release tags in git

Only 0.2.0 and 0.2.1 are tagged. I see commit messages referencing 0.2.2 and 0.2.3 but am not sure exactly which commit was used to generate the corresponding packages on the cheeseshop. Could you please push the tags up? Thanks!

shift time less than 0 go to somewhere

Hi, here is another one:

[Script Info]
; Script generated by Aegisub 3.2.2
; http://www.aegisub.org/
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,20,&H00FFFFFF,&H000088EF,&H00000000,&H00666666,-1,0,0,0,100,100,0,0,1,3,0,8,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.20,0:00:03.35,Default,,0,0,0,fx,Hi

Now we load this:

>>> a=pysubs2.load("jj.ass")
>>> a[0].start
200
>>> a.shift(-300)
>>> a[0].start
-1079999800

I think this should either produce negative times or raise a warning or an error, but not that number xD

Bye.

Very helpful tool, Thank you very much!

I am trying to process .ass subtitle file in Python3
this package save me a lot of time
Thanks!

English Language Only subs

Hi,

Last time there was an issue about Japanese/Chinese subtitles being added to the final .srt, and using UTF-8 was recommended, but the output still contains some Japanese characters, as in the attached files.
Sample:

13
00:24:18,670 --> 00:24:22,570
呼びかけた声かき消されて

Archive 2s2.zip

CLI, default to UTF-8

Consider making --input-enc utf8 default, with some fallback. It already is default in the Python API, however as of 1.1.0 we're defaulting to ISO-8859-1 in the CLI, which obviously breaks on some inputs. See #37 (comment).

Possible integration point with openai/whisper output

Hello, I was wondering if it would be possible to add a new method that takes Whisper transcribe output and converts it to the supported output formats.

This is currently how it's done at whisper

https://github.com/openai/whisper/blob/9f70a352f9f8630ab3aa0d06af5cb9532bd8c21d/whisper/utils.py#L63

def write_srt(transcript: Iterator[dict], file: TextIO):
    """
    Write a transcript to a file in SRT format.
    Example usage:
        from pathlib import Path
        from whisper.utils import write_srt
        result = transcribe(model, audio_path, temperature=temperature, **args)
        # save SRT
        audio_basename = Path(audio_path).stem
        with open(Path(output_dir) / (audio_basename + ".srt"), "w", encoding="utf-8") as srt:
            write_srt(result["segments"], file=srt)
    """
    for i, segment in enumerate(transcript, start=1):
        # write srt lines
        print(
            f"{i}\n"
            f"{format_timestamp(segment['start'], always_include_hours=True, decimal_marker=',')} --> "
            f"{format_timestamp(segment['end'], always_include_hours=True, decimal_marker=',')}\n"
            f"{segment['text'].strip().replace('-->', '->')}\n",
            file=file,
            flush=True,
        )
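
The segments used above are dicts with `start` and `end` in seconds and a `text` string, so an integration point mostly needs a conversion to millisecond-based events. A sketch on plain tuples; `segments_to_events` is a hypothetical helper, not an existing pysubs2 or Whisper function.

```python
def segments_to_events(segments):
    """Convert Whisper-style segments to (start_ms, end_ms, text) tuples."""
    events = []
    for seg in segments:
        start_ms = round(seg["start"] * 1000)   # seconds to milliseconds
        end_ms = round(seg["end"] * 1000)
        events.append((start_ms, end_ms, seg["text"].strip()))
    return events


segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 4.0, "text": " General remarks."},
]
print(segments_to_events(segments))
```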

I got this error: AttributeError: 'module' object has no attribute 'load'

import pysubs2
pysubs2.load('subtitle.ass','utf-8')
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'module' object has no attribute 'load'

I use pysubs2 in Django

I think there maybe some problem about import path, but I can't figure it out.

Could you check this?

ASS escaping

See http://ffmpeg.org/doxygen/trunk/ass_8c_source.html#l00178

MPL2 newline parser is broken

| are newlines in MPL2, we should use return "\n".join(out) in prepare_text.

create a new ass subtitle file but not work in player at all

[Script Info]
; Script generated by pysubs2
; https://pypi.python.org/pypi/pysubs2
PlayResX: 1280
PlayResY: 720
ScriptType: v4.00+

[Aegisub Project Garbage]
Last Style Storage: Default
Video File: ?dummy:23.976000:2250:1920:1080:11:135:226:c
Video AR Value: 1.777778
Video Zoom Percent: 0.500000
Active Line: 1
Video Position: 342

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,48.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100.0,100.0,0.0,0.0,1,2.0,0.0,8,25,25,25,1
Style: Romaji,Migu 1P,48.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100.0,100.0,0.0,0.0,1,2.0,0.0,8,25,25,25,1
Style: Translation,Migu 1P,46.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100.0,100.0,0.0,0.0,1,2.0,0.0,2,25,25,25,1
Style: Kanji,Migu 1P,38.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100.0,100.0,0.0,0.0,1,1.8,0.0,4,25,25,25,1
Style: p,Arial,10.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100.0,100.0,0.0,0.0,1,0.0,0.0,7,25,25,25,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:00.00,Default,,0,0,0,,My son threw a snowball at me and I instinctively blocked it with my daughter. The look of 
Dialogue: 0,0:00:02.71,0:00:02.71,Default,,0,0,0,,betrayal on her snow covered face has haunted my dreams for years

        e1 = SSAEvent()
        e1.start = pysubs2.make_time(s=secs_float)
        e1.end = pysubs2.make_time(s=total_length)
        # e1.style=style
        e1.text = scene
        subs.append(e1)
    subs.save(postvideodir+post_id+'-1.ass')

am i missing something?

Correction ms_to_frames/frames_to_ms/ms_to_str

Currently ms_to_frames and frames_to_ms do not work correctly; they can be imprecise.

I opened a pull request on the PyonFX repo and I think it would be a good idea to do something similar in pysubs2: CoffeeStraw/PyonFX#46

I recommend looking at these two files in my PR. They are the only ones that matter for pysubs2.

pyonfx/convert.py
pyonfx/timestamps.py

In brief, here are all the methods I propose to change:

ms_to_str: should add this small step for the .ass and .ssa formats: ass_ms = (ass_ms + 5) - (ass_ms + 5) % 10
ms_to_frames (current version): should be something like this: ms_to_frames
frames_to_ms (current version): should be something like this: frames_to_ms (corrected version)
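
The expression proposed for ms_to_str rounds a millisecond value to the nearest multiple of 10, with halves rounded up, which matches the centisecond precision of .ass timestamps. A small sketch for non-negative input; the function name is hypothetical.

```python
def round_ms_to_centiseconds(ms):
    """Round non-negative milliseconds to the nearest 10 ms (halves go up)."""
    return (ms + 5) - (ms + 5) % 10


for value in (1234, 1235, 1239):
    print(value, round_ms_to_centiseconds(value))
```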

Doesn't work in Python 3.11

> pysubs2.exe

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Python311\Scripts\pysubs2.exe\__main__.py", line 4, in <module>
  File "C:\Python311\Lib\site-packages\pysubs2\__init__.py", line 1, in <module>
    from .ssafile import SSAFile
  File "C:\Python311\Lib\site-packages\pysubs2\ssafile.py", line 13, in <module>
    from .formats import autodetect_format, get_format_class, get_format_identifier
  File "C:\Python311\Lib\site-packages\pysubs2\formats.py", line 4, in <module>
    from .microdvd import MicroDVDFormat
  File "C:\Python311\Lib\site-packages\pysubs2\microdvd.py", line 5, in <module>
    from .ssastyle import SSAStyle
  File "C:\Python311\Lib\site-packages\pysubs2\ssastyle.py", line 7, in <module>
    @dataclasses.dataclass(repr=False)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\dataclasses.py", line 1211, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\dataclasses.py", line 959, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\dataclasses.py", line 816, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'pysubs2.common.Color'> for field primarycolor is not allowed: use default_factory
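
The error message names one remedy itself: `default_factory`. A sketch with stand-in classes (hypothetical; the field lists are not the real pysubs2 definitions) showing a dataclass field whose default is built fresh for each instance.

```python
from dataclasses import dataclass, field


@dataclass
class Color:
    """Stand-in colour class; a regular dataclass is unhashable by default."""
    r: int
    g: int
    b: int
    a: int = 0


@dataclass(repr=False)
class Style:
    # default_factory creates a new Color for every Style instance,
    # instead of sharing one Color object as a class-level default.
    primarycolor: Color = field(default_factory=lambda: Color(255, 255, 255, 0))


first, second = Style(), Style()
print(first.primarycolor == second.primarycolor)
print(first.primarycolor is second.primarycolor)
```

Making the colour type immutable and hashable would be another way to satisfy the check.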

Error when load srt file

Cannot load an SRT file under Python 3.7. I tried both the command line and the Python interpreter and got the same error:

$ pysubs2 --input-enc utf-8 --output-enc utf-8 --to ass sub.srt

Traceback (most recent call last):
  File "/Users/Yuji/miniconda/lib/python3.7/sre_parse.py", line 1021, in parse_template
    this = chr(ESCAPES[this][1])
KeyError: '\\i'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Yuji/miniconda/bin/pysubs2", line 11, in <module>
    sys.exit(__main__())
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/cli.py", line 170, in __main__
    rv = cli(sys.argv[1:])
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/cli.py", line 100, in __call__
    self.main(argv)
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/cli.py", line 124, in main
    subs = SSAFile.from_file(infile, args.input_format, args.fps)
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/ssafile.py", line 152, in from_file
    impl.from_file(subs, fp, format_, fps=fps, **kwargs)
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/subrip.py", line 70, in from_file
    for (start, end), lines in zip(timestamps, following_lines)]
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/subrip.py", line 70, in <listcomp>
    for (start, end), lines in zip(timestamps, following_lines)]
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/subrip.py", line 59, in prepare_text
    s = re.sub(r"< *i *>", r"{\i1}", s)
  File "/Users/Yuji/miniconda/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/Users/Yuji/miniconda/lib/python3.7/re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "/Users/Yuji/miniconda/lib/python3.7/re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "/Users/Yuji/miniconda/lib/python3.7/sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \i at position 1
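
Since Python 3.7, an unknown escape made of a backslash and an ASCII letter in the replacement argument of `re.sub` is an error, which is why `r"{\i1}"` fails. Doubling the backslash in the replacement produces a literal backslash. A sketch; `italics_to_ass` is a hypothetical name for the conversion step.

```python
import re


def italics_to_ass(s):
    """Convert <i>...</i> to ASS italic override tags."""
    # In a replacement template, r"\\" stands for one literal backslash.
    s = re.sub(r"< *i *>", r"{\\i1}", s)
    s = re.sub(r"< */ *i *>", r"{\\i0}", s)
    return s


print(italics_to_ass("<i>They're dying in vain!</i>"))
```

Passing a function as the replacement, for example `lambda m: r"{\i1}"`, avoids template escaping altogether.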

webvtt fragments

How do I join the fragments of a WebVTT subtitle and convert everything to SRT?

Example: I have a folder with 4 WebVTT subtitle files and I need to merge them and convert them to a single SRT file:

subtitle_V1-0.webvtt
subtitle_V1-1.webvtt
subtitle_V1-2.webvtt
subtitle_V1-3.webvtt

convert to

subtitle.srt

text return all the content with color and other tags in pysubs2

If I have a subtitle with this text:

1362
01:58:37,030 --> 01:58:50,030
<font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>
<font color="#0A7AA6">.: text3  :.</font>
<font color="#0A7AA6">text</font>

How can I get exactly that from pysubs2?

<font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>
<font color="#0A7AA6">.: text3  :.</font>
<font color="#0A7AA6">text</font>

If I call .text, it removes all the font color tags, which I don't want.
