
tkarabela / pysubs2


A Python library for editing subtitle files

Home Page: http://pysubs2.readthedocs.io

License: MIT License

Python 99.68% Makefile 0.17% Shell 0.14%
python subtitles-parsing subtitles srt substation-alpha microdvd mpl2 webvtt closed-captions

pysubs2's Introduction

pysubs2


pysubs2 is a Python library for editing subtitle files. It’s based on SubStation Alpha, the native format of Aegisub; it also supports SubRip (SRT), MicroDVD, MPL2, TMP and WebVTT formats and OpenAI Whisper captions.

There is a small CLI tool for batch conversion and retiming.

pip install pysubs2
pysubs2 --shift 0.3s *.srt
pysubs2 --to srt *.ass

import pysubs2
subs = pysubs2.load("my_subtitles.ass", encoding="utf-8")
subs.shift(s=2.5)
for line in subs:
    line.text = "{\\be1}" + line.text
subs.save("my_subtitles_edited.ass")

To learn more, please see the documentation. If you'd like to contribute, see CONTRIBUTING.md.

pysubs2 is licensed under the MIT license (see LICENSE.txt).

pysubs2's People

Contributors

antonofthewoods, bergerspencer, bkiziuk, erayerdin, interru, joshuaavalon, luk1337, mikewang000000, moi15moi, northurland, oczkers, odrling, palmtoptiger, pannal, tkarabela


pysubs2's Issues

Add typehints

Write type hints as per PEP 484 for a better programming experience.

&HBBGGRR format color in SSA files

I've seen some SSA files that use the &HBBGGRR format for PrimaryColour instead of an integer, just like &HAABBGGRR in ASS.
Since SSA is outdated, I couldn't tell whether this format is really supported by the original SSA.
As far as I know, both Aegisub and VLC recognize it, while pysubs2 just raises an error.

Example File:
https://secure.assrt.net/download/257308/-/1/[YYDM-11FANS][Mobile%20Suits%20Gundam%200079][43][BDRIP][960x720][X264-10bit_AAC][AD847C98].boboqiu-tc.ssa

The file was generated by SrtEdit as its header says.

Style:

[V4 Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: Default,SimHei,30,&HFFFFFF,&H00FFFF,&H000000,&H000000,-1,0,1,2,3,2,20,20,20,0,1

Error:

Traceback (most recent call last):
  File "subtest.py", line 7, in <module>
    subs = pysubs2.SSAFile.load("example.ssa", encoding="utf-16")
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/ssafile.py", line 100, in load
    return cls.from_file(fp, format_, fps=fps, **kwargs)
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/ssafile.py", line 160, in from_file
    impl.from_file(subs, fp, format_, fps=fps, **kwargs)
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 203, in from_file
    field_dict = {f: string_to_field(f, v) for f, v in zip(STYLE_FIELDS[format_], raw_fields)}
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 203, in <dictcomp>
    field_dict = {f: string_to_field(f, v) for f, v in zip(STYLE_FIELDS[format_], raw_fields)}
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 158, in string_to_field
    return ssa_rgb_to_color(v)
  File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 72, in ssa_rgb_to_color
    x = int(s)
ValueError: invalid literal for int() with base 10: '&HFFFFFF'
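A minimal sketch of a more tolerant color parser, assuming the hexadecimal form uses the same byte layout as the decimal form (0xBBGGRR, red in the lowest byte), which is what the &HBBGGRR notation implies. The name `parse_ssa_color` is hypothetical and this is not pysubs2's actual code:

```python
from typing import Tuple


def parse_ssa_color(s: str) -> Tuple[int, int, int]:
    """Parse an SSA color given as a decimal integer or as &HBBGGRR hex.

    Hypothetical helper: both forms are assumed to pack the color as
    0xBBGGRR, so red is the lowest byte.
    """
    s = s.strip()
    if s[:2].upper() == "&H":
        # Hex form, e.g. "&HFFFFFF"; tolerate a trailing "&" as in "&HFFFFFF&".
        x = int(s[2:].rstrip("&"), 16)
    else:
        # Classic SSA decimal form, e.g. "16777215".
        x = int(s)
    r = x & 0xFF
    g = (x >> 8) & 0xFF
    b = (x >> 16) & 0xFF
    return r, g, b


# Both notations from the style line above decode to the same colors.
print(parse_ssa_color("&HFFFFFF"))  # (255, 255, 255)
print(parse_ssa_color("&H00FFFF"))  # (255, 255, 0)
print(parse_ssa_color("16777215"))  # (255, 255, 255)
```

Stripping whitespace first also covers values written with a leading space, such as ' &H00FFFFFF'.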

[QA] DeprecationWarning in tests: 'alignment' attribute should be an Alignment instance

Pytest prints a warning:

tests/test_substation.py::test_alignment_given_as_integer
  /var/tmp/portage/dev-python/pysubs2-1.6.0/work/pysubs2-1.6.0/pysubs2/substation.py:333: DeprecationWarning: The 'alignment' attribute of SSAStyle should be an Alignment instance, using plain int is deprecated
    warnings.warn("The 'alignment' attribute of SSAStyle should be an Alignment instance, using plain int is deprecated", DeprecationWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

Python version: 3.10.8
Pytest version: 7.1.3

SRT to ASS conversion cap at 10h timecode

Hi,

Thank you for the great tool. I am trying to convert an SRT file that was generated with Whisper from a stream of about 24 hours. The conversion seems fine, except that the timing caps at 9:59:59.99,9:59:59.99, and everything after that point gets the same timecode. Is there any way to fix this? I attached the SRT file; just change the extension from .log to .srt.

test.log
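The cap is consistent with ASS timestamps being written as H:MM:SS.cc with a single hour digit. A minimal sketch of a formatter that lets the hour field grow instead of clamping; the helper name is hypothetical, and whether a given player accepts two-digit hours is an assumption worth testing:

```python
def ms_to_ass_timestamp(ms: int) -> str:
    """Format milliseconds as H:MM:SS.cc without clamping the hour field."""
    if ms < 0:
        ms = 0  # ASS has no negative times; clamp at zero instead
    cs_total = (ms + 5) // 10  # round to the nearest centisecond
    h, rem = divmod(cs_total, 360_000)  # 360,000 cs per hour
    m, rem = divmod(rem, 6_000)         # 6,000 cs per minute
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


print(ms_to_ass_timestamp(35_999_990))  # 9:59:59.99
print(ms_to_ass_timestamp(36_000_000))  # 10:00:00.00
print(ms_to_ass_timestamp(86_399_990))  # 23:59:59.99
```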

Can't add new styles in python2

Hi,
When adding a new style to the subtitles I get the following error:

Traceback (most recent call last):
  File "colorset.py", line 20, in <module>
    subs1.save(foutput,format_='ass')
  File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/ssafile.py", line 190, in save
    self.to_file(fp, format_, fps=fps, **kwargs)
  File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/ssafile.py", line 222, in to_file
    impl.to_file(self, fp, format_, fps=fps, **kwargs)
  File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/substation.py", line 253, in to_file
    fields = [field_to_string(f, getattr(ev, f)) for f in EVENT_FIELDS[format_]]
  File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/substation.py", line 242, in field_to_string
    raise TypeError("Unexpected type when writing a SubStation field")
TypeError: Unexpected type when writing a SubStation field

If I run the same code in Python 3 it works fine, but under Python 2 I get the error.
I'm trying to use pysubs2 in Kodi, which currently only supports Python 2.
This is the code I used:

import pysubs2
import chardet

finput = 'en.srt'
foutput = 'en.ass'
# Detect the input encoding from the raw bytes.
with open(finput, 'rb') as fi:
    rawdata = fi.read()
encoding = chardet.detect(rawdata)['encoding']

subs1 = pysubs2.load(finput, encoding=encoding)
top_style = pysubs2.SSAStyle()
top_style.alignment = 8
subs1.styles['top-style'] = top_style
for line in subs1:
    line.style = 'top-style'
subs1.save(foutput, format_='ass')

Mixing SSA and ASS give weird results

Hi, I hope this time it's a real issue and not a mistake on my part.

If we insert SSA data into an ASS file (or vice versa), it is not converted correctly, and in the end we can't see the subs, or see only part of them, or they appear in the wrong colors, etc. Here is all the code:

ED.ssa

[Script Info]
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: OK,Britannic Bold,30,16777215,65535,65535,&H0029464b,0,0,1,2,1,2,10,10,10,0,0

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,OK,NTP,0000,0000,0000,!Effect,Hi

KED.ass

[Script Info]
; Script generated by Aegisub 3.2.2
; http://www.aegisub.org/
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: KED,Arial,20,&H00FFFFFF,&H000088EF,&H00000000,&H00666666,-1,0,0,0,100,100,0,0,1,3,0,8,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,KED,,0,0,0,fx,Booo

code:

import pysubs2

def merge(ss):
    s1 = ss[0]
    for i_ in range(1, len(ss)):
        i = ss[i_]
        for j in i:
            s1.insert(0, j)
        s1.import_styles(i)
    return s1

a=pysubs2.load("ED.ssa")
b=pysubs2.load("KED.ass")

a=merge([a, b])
a.save("t1.ssa")
a.save("t2.ass")

This is exactly what I run, and if you play either of the two output files, you will see that the subs are not right.

In the t1 case the colors are wrong; in the second case, only the KED.ass events are displayed.

Bye.

pysubs2 detecting "," in text as new param

Hi, I found this bug. Basically:

[Script Info]
Title: Bugs in the window
ScriptType: v4.00

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:01.00,Default,,0000,0000,0000,,Hi, i'm new here

This works in the player but fails in pysubs2: the "," in the text field is interpreted as the start of a new field, and loading fails with:

AttributeError: 'NoneType' object has no attribute 'groups'

Bye.
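The usual way to keep commas in the Text field is to split the event line at most len(fields) - 1 times, so everything after the last separator stays together. A minimal sketch; `parse_event_line` is a hypothetical name, not pysubs2's API:

```python
from typing import Dict


def parse_event_line(format_line: str, event_line: str) -> Dict[str, str]:
    """Map an event line onto the field names from its Format line."""
    # "Format: Layer, Start, ..." -> ["Layer", "Start", ...]
    fields = [f.strip() for f in format_line.partition(":")[2].split(",")]
    # Split only at the first ":" so timestamps like 0:00:00.00 survive.
    rest = event_line.partition(":")[2].lstrip()
    # Text is the last field, so it keeps any commas it contains.
    values = rest.split(",", len(fields) - 1)
    return dict(zip(fields, values))


fmt = "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text"
ev = "Dialogue: 0,0:00:00.00,0:00:01.00,Default,,0000,0000,0000,,Hi, i'm new here"
print(parse_event_line(fmt, ev)["Text"])  # Hi, i'm new here
```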

html tags --clean

Would it be possible for the --clean option (or a new one) to delete HTML tags from the subtitles? For example:

<i>They're dying in vain!</i>

to 

They're dying in vain!

Thanks in advance
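Until something like this exists in the CLI, a regex pass over each line's text is one option. A minimal sketch: it is naive (it would also remove text that merely looks like a tag), and `strip_html_tags` is a hypothetical name, not an existing pysubs2 flag or function:

```python
import re

# Matches opening/closing tags such as <i>, </i>, <font color="#fff">.
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def strip_html_tags(text: str) -> str:
    """Remove HTML-like tags, keeping the text between them."""
    return HTML_TAG.sub("", text)


print(strip_html_tags("<i>They're dying in vain!</i>"))  # They're dying in vain!
```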

ssaevent hashable?

Hi,
In a program I'm writing, I need to do something like this:

import pysubs2

subs = pysubs2.load("subtitle.ass", encoding="utf-8")
...
if subtitle in list:
    print("In.")

However, this doesn't work:
TypeError: unhashable type: 'SSAEvent'
Could SSAEvent be made hashable?
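Some background that may help: a dataclass declared with the default eq=True and without frozen=True gets `__hash__ = None`, so its instances cannot be put in a `set` or used as `dict` keys, while membership tests on a plain `list` only use `==`. Assuming `SSAEvent` behaves like such a dataclass, one workaround is to hash a tuple of the fields you care about. A minimal stand-alone sketch with a hypothetical `Event` class:

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass
class Event:  # hypothetical stand-in for SSAEvent (eq=True, not frozen)
    start: int
    end: int
    text: str


def event_key(ev: Event) -> Tuple[int, int, str]:
    """Hashable identity for an event: its timing and text."""
    return (ev.start, ev.end, ev.text)


a = Event(0, 1000, "Hi")
b = Event(0, 1000, "Hi")

print(Event.__hash__ is None)  # True: instances are unhashable
print(b in [a])                # True: list membership only needs ==

seen: Set[Tuple[int, int, str]] = {event_key(a)}
print(event_key(b) in seen)    # True: set lookup via the tuple key
```

Keying on a tuple keeps the events themselves mutable, which matters because changing a hashed object's fields would otherwise corrupt any set containing it.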

Optional time format in SSA/ASS

Hi, I found that the player can play files with both the 0:01:19.24 and 0:01:19:24 formats. I don't know whether this is in the specs or whether it should be supported here.

Bye.

Support for Graphics section

Could you please consider adding support for the Graphics section? It is lost after conversion.
Aegisub can add files to this section, and, if I remember correctly, VSFilterMod can render images from it.

The format of the Graphics section is almost the same as that of Fonts; "fontname" just becomes "filename".
Specs
sample.zip

Spaces in "Style"

I have an .ass file generated by third-party software (see file.zip).

This file contains spaces between the values in its "Style" line:

Style: Default, Arial, 20, &H00FFFFFF, &H00000000, &H00000000, &H00000000, 0, 0, 0, 0, 100, 100, 0, 0, 1, 2, 0, 2, 15, 15, 15, 0

This is why pysubs2 can't parse the file:

\lib\site-packages\pysubs2\substation.py in <dictcomp>(.0)
    235                 buf = rest.strip().split(",")
    236                 name, raw_fields = buf[0], buf[1:] # splat workaround for Python 2.7
--> 237                 field_dict = {f: string_to_field(f, v) for f, v in zip(STYLE_FIELDS[format_], raw_fields)}
    238                 sty = SSAStyle(**field_dict)
    239                 subs.styles[name] = sty

\lib\site-packages\pysubs2\substation.py in string_to_field(f, v)
    166                     return timestamp_to_ms(TIMESTAMP.match(v).groups())
    167             elif "color" in f:
--> 168                 return rgba_to_color(v)
    169             elif f in {"bold", "underline", "italic", "strikeout"}:
    170                 return v == "-1"

\lib\site-packages\pysubs2\substation.py in rgba_to_color(s)
     72         x = int(s[2:], base=16)
     73     else:
---> 74         x = int(s)
     75     r = x & 0xff
     76     g = (x >> 8) & 0xff

ValueError: invalid literal for int() with base 10: ' &H00FFFFFF'

Also, the [v4+ Styles] line starts with a lowercase 'v', so I have to specify 'format_' explicitly.

It might be a good idea to strip spaces from the values in "Style" lines and to use case-insensitive comparison when guessing the file format.
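A minimal sketch of both suggestions, with hypothetical function names: strip whitespace around each comma-separated value, and compare the section header case-insensitively. Spaces inside a value (as in a font name like "Britannic Bold") are kept:

```python
from typing import List


def split_style_line(line: str) -> List[str]:
    """Split a 'Style:' line into fields, ignoring spaces around commas."""
    _, _, rest = line.partition(":")
    return [v.strip() for v in rest.split(",")]


def is_v4plus_styles_header(line: str) -> bool:
    """Case-insensitive check for the [V4+ Styles] section header."""
    return line.strip().lower() == "[v4+ styles]"


fields = split_style_line(
    "Style: Default, Arial, 20, &H00FFFFFF, &H00000000, &H00000000, &H00000000, "
    "0, 0, 0, 0, 100, 100, 0, 0, 1, 2, 0, 2, 15, 15, 15, 0"
)
print(fields[:4])  # ['Default', 'Arial', '20', '&H00FFFFFF']
print(is_v4plus_styles_header("[v4+ Styles]"))  # True
```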

Problem in default values, overwrited SSA to ASS

Hi, there are too many things in issue #18, so I'll split it point by point. The first one is this: some values are overwritten somewhere during the conversion, ignoring the default values in https://github.com/tkarabela/pysubs2/blob/master/pysubs2/ssastyle.py

Here an example:

[Script Info]
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: OK,Britannic Bold,30,16777215,65535,65535,&H0029464b,0,0,1,2,1,2,10,10,10,0,0

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,OK,NTP,0000,0000,0000,!Effect,Hi

Converting to ASS:

[Script Info]
; Script generated by pysubs2
; https://pypi.python.org/pypi/pysubs2
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: OK,Britannic Bold,30.0,&H00777215,&H00000535,&H00000535,&H0029464B,0,0,0,0,1.0,2.0,10.0,10.0,10,0.0,0.0,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,OK,NTP,0,0,0,!Effect,Hi

Here are the values that don't match:

Field        SSA (original)   ASS (converted)
BorderStyle  1                10
Outline      2                0
Shadow       1                0
Encoding     0                1

There are also some weird values, such as ScaleX and ScaleY. The pysubs2 defaults are correct, but different values are written to the file. These fields exist only in ASS (not in SSA), so the pysubs2 defaults should be used:

Field    Default (pysubs2)   Written value (converted)
ScaleX   100                 1
ScaleY   100                 2
Spacing  0                   10
Angle    0                   10

Bye.
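The converted values appear consistent with the SSA-style values being assigned positionally to the V4+ field list instead of by the names in the file's Format line: after Italic, the values 1,2,1,2,10,10,10,0,0 would land in Underline (1), StrikeOut (2), ScaleX (1), ScaleY (2), Spacing (10), Angle (10), BorderStyle (10), Outline (0) and Shadow (0), with Underline and StrikeOut written as 0 because only "-1" counts as true, while Alignment, the margins and Encoding receive no value and fall back to defaults (2, 10, 10, 10, 1). That is an inference from the output above, not a confirmed diagnosis. A minimal sketch of pairing values with the names declared in the Format line instead (hypothetical helper):

```python
from typing import Dict


def map_style_fields(format_line: str, style_line: str) -> Dict[str, str]:
    """Pair each Style value with the field name declared in Format."""
    names = [n.strip() for n in format_line.partition(":")[2].split(",")]
    values = [v.strip() for v in style_line.partition(":")[2].split(",")]
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    return dict(zip(names, values))


fmt = ("Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, "
       "TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, "
       "Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding")
style = "Style: OK,Britannic Bold,30,16777215,65535,65535,&H0029464b,0,0,1,2,1,2,10,10,10,0,0"
fields = map_style_fields(fmt, style)
print(fields["BorderStyle"], fields["Outline"], fields["Shadow"], fields["Encoding"])  # 1 2 1 0
```

ASS-only fields such as ScaleX are simply absent from the result, so a converter built this way would fill them from its own defaults.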

how to set ass style with a dict

Currently, I set an ASS style like this:
import pysubs2
assfile = pysubs2.SSAFile()
assfile.styles['style1'] = pysubs2.SSAStyle(fontname='xxx', fontsize=20)

Can I use a dict instead, like this?
style_dict = {
    'fontname': 'xxx',
    'fontsize': 20
}

assfile.styles['style1'] = pysubs2.SSAStyle(style_dict)
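In Python, `**` unpacks a dict into keyword arguments, so `SSAStyle(**style_dict)` (with two asterisks) should behave like `SSAStyle(fontname='xxx', fontsize=20)`, whereas `SSAStyle(style_dict)` passes the whole dict as a single positional argument. The Python 3.11 traceback elsewhere on this page shows that `SSAStyle` is defined as a dataclass, so the stand-alone sketch below uses a hypothetical `Style` dataclass to show the same mechanism without requiring pysubs2:

```python
from dataclasses import dataclass, asdict


@dataclass
class Style:  # hypothetical stand-in for pysubs2.SSAStyle
    fontname: str = "Arial"
    fontsize: float = 20.0
    bold: bool = False


style_dict = {"fontname": "xxx", "fontsize": 20}

# ** unpacks the dict into keyword arguments: Style(fontname="xxx", fontsize=20)
style = Style(**style_dict)
print(style)  # Style(fontname='xxx', fontsize=20, bold=False)

# The reverse direction: a dataclass instance back to a dict.
print(asdict(style))  # {'fontname': 'xxx', 'fontsize': 20, 'bold': False}
```

A key that is not a field name raises a TypeError, which catches typos in the dict early.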

Support timestamps without ms

Hey,

I had issues reading .ass files whose timestamps have no milliseconds. I propose adding support for this, since we can't expect every timestamp to include this information.

The fix is quite easy, but I can't push a branch to the repo, so I'm including it here:

  • file time.py:

    TIMESTAMP = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})[.,]?(\d{0,3})")
    

    in function timestamp_to_ms:

        if groups[-1] == '':
            h, m, s = map(int, groups[:-1])
            ms = 0
        else:
            h, m, s, frac = map(int, groups)
            ms = frac * 10**(3 - len(groups[-1]))
    
  • The corresponding tests in test_time.py need a small update too:

        assert TIMESTAMP.match("12:45:67").groups() == ("12", "45", "67", '')
        assert TIMESTAMP.match("1:23:45,").groups() == ("1", "23", "45", '')
        assert TIMESTAMP.match("1:23:45.").groups() == ("1", "23", "45", '')
    

    and remove them from the rejected timestamps section.

I hope the changes make sense to you!
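Put together, the proposed regex and branch give a self-contained function like the one below. The final line that sums the components is an assumption about how the rest of `timestamp_to_ms` looks, since only the changed part is shown above:

```python
import re
from typing import Sequence

# Fraction digits and their separator are optional, as proposed above.
TIMESTAMP = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})[.,]?(\d{0,3})")


def timestamp_to_ms(groups: Sequence[str]) -> int:
    """Convert (h, m, s, frac) regex groups to milliseconds."""
    if groups[-1] == "":
        h, m, s = map(int, groups[:-1])
        ms = 0
    else:
        h, m, s, frac = map(int, groups)
        # "24" means 240 ms, "5" means 500 ms, "123" means 123 ms
        ms = frac * 10 ** (3 - len(groups[-1]))
    return ms + s * 1000 + m * 60_000 + h * 3_600_000


print(timestamp_to_ms(TIMESTAMP.match("0:01:19.24").groups()))  # 79240
print(timestamp_to_ms(TIMESTAMP.match("1:23:45").groups()))     # 5025000
```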

TIMESTAMP fails to match negative timestamps

It would be nice to adjust the TIMESTAMP regex so that it copes with negative timestamps, rather than failing like this:

  File "/usr/lib/python3.7/site-packages/pysubs2/ssafile.py", line 92, in load
    return cls.from_file(fp, format_, fps=fps, **kwargs)
  File "/usr/lib/python3.7/site-packages/pysubs2/ssafile.py", line 152, in from_file
    impl.from_file(subs, fp, format_, fps=fps, **kwargs)
  File "/usr/lib/python3.7/site-packages/pysubs2/substation.py", line 204, in from_file
    field_dict = {f: string_to_field(f, v) for f, v in zip(EVENT_FIELDS[format_], raw_fields)}
  File "/usr/lib/python3.7/site-packages/pysubs2/substation.py", line 204, in <dictcomp>
    field_dict = {f: string_to_field(f, v) for f, v in zip(EVENT_FIELDS[format_], raw_fields)}
  File "/usr/lib/python3.7/site-packages/pysubs2/substation.py", line 148, in string_to_field
    return timestamp_to_ms(TIMESTAMP.match(v).groups())
AttributeError: 'NoneType' object has no attribute 'groups'

Granted, negative timestamps are nonsensical but Postel's Law and all that.

The particular issue I encountered that leads to this: tp7/Sushi#34
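A minimal sketch of accepting a leading minus sign, with a hypothetical regex and function name. It also raises a descriptive ValueError on a non-match instead of the AttributeError above; whether to keep the negative value or clamp it to zero afterwards is a separate policy choice:

```python
import re

# Same shape as a SubStation timestamp, plus an optional leading minus sign.
SIGNED_TIMESTAMP = re.compile(r"(-?)(\d+):(\d{2}):(\d{2})[.,](\d{1,3})")


def signed_timestamp_to_ms(text: str) -> int:
    """Parse H:MM:SS.cc, accepting a leading '-' instead of failing."""
    match = SIGNED_TIMESTAMP.match(text.strip())
    if match is None:
        raise ValueError(f"not a timestamp: {text!r}")
    sign, h, m, s, frac = match.groups()
    ms = int(frac) * 10 ** (3 - len(frac))
    total = int(h) * 3_600_000 + int(m) * 60_000 + int(s) * 1000 + ms
    return -total if sign else total


print(signed_timestamp_to_ms("-0:00:01.50"))  # -1500
print(signed_timestamp_to_ms("0:00:01.50"))   # 1500
```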

Subtitle File To Text is omitting subscripts

I am using pysubs2 to read subtitle files. Here is a small example:


SIMPLE_FILE = """
1
00:00:00,000 --> 00:01:00,000
cette matrice-là <i>E<sub>t</sub>·…·E<sub>1</sub>A</i> possède une ligne

2
00:01:00,000 --> 00:02:00,000
there was a SubRip file
with two subtitles.
"""
with open("subtitles.srt", "w", encoding="utf-8") as fp:
    fp.write(SIMPLE_FILE)

import pysubs2
subs = pysubs2.load("subtitles.srt",format_= "srt")
subs[0].text

As a result, I get:

cette matrice-là {\i1}Et·…·E1A{\i0} possède une ligne

As you can see, the symbols were recognized successfully, but the subscripts were lost.

Is there a way to make sure the subscripts are kept as well?
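One possible workaround, if plain-text output is acceptable, is to map the contents of &lt;sub&gt;…&lt;/sub&gt; to Unicode subscript characters in the raw file text before loading it. Unicode only has subscript forms for the digits and a handful of lowercase letters, so anything else is left unchanged. A minimal sketch with hypothetical names:

```python
import re

# Unicode has subscript forms for the digits and only a few lowercase letters.
SUBSCRIPTS = str.maketrans(
    "0123456789aeoxhklmnpst",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
    "\u2090\u2091\u2092\u2093\u2095\u2096\u2097\u2098\u2099\u209a\u209b\u209c",
)


def sub_tags_to_unicode(text: str) -> str:
    """Replace <sub>...</sub> with Unicode subscripts where they exist."""
    return re.sub(r"<sub>(.*?)</sub>", lambda m: m.group(1).translate(SUBSCRIPTS), text)


print(sub_tags_to_unicode("<i>E<sub>t</sub>·…·E<sub>1</sub>A</i>"))  # <i>Eₜ·…·E₁A</i>
```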

bug, when parse srt

Hi,

I'm using pysubs2 to parse an existing SRT file that contains the following snippet. It treated 394 as part of the text of 393, but they should be two separate subs with empty text.

Also, according to https://www.matroska.org/technical/specs/subtitles/srt.html, it looks like subs are better separated by "a blank line" than by the timecode line.

Regards,
Jarod

`
392
00:29:27,46 --> 00:29:29,83
I'm Liza Minnelli..

393
00:00:00,00 --> 00:00:00,00

394
00:00:00,00 --> 00:00:00,00
`
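A minimal sketch of the blank-line approach described on the Matroska page, with a hypothetical function name. It keeps timestamps as strings so it doesn't depend on how the two-digit fractions above should be interpreted; its main limitation is that a cue whose text contains a blank line would be split in two:

```python
import re
from typing import List, Tuple


def split_srt_blocks(text: str) -> List[Tuple[str, str, str]]:
    """Split SRT text into (start, end, text) using blank lines as separators."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.rstrip("\r") for line in block.splitlines()]
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]  # drop the cue number
        if not lines or "-->" not in lines[0]:
            continue  # not a cue; skip it
        start, end = (part.strip() for part in lines[0].split("-->", 1))
        cues.append((start, end, "\n".join(lines[1:])))
    return cues


sample = (
    "392\n00:29:27,46 --> 00:29:29,83\nI'm Liza Minnelli..\n\n"
    "393\n00:00:00,00 --> 00:00:00,00\n\n"
    "394\n00:00:00,00 --> 00:00:00,00\n"
)
for cue in split_srt_blocks(sample):
    print(cue)
```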

Converting TMP into SubRip results in overlapped cues

Hi,

When converting a TMP subtitle to SubRip, I found that the end time pysubs2 generates for a cue falls after the start time of the next cue. For example:

TMP                     | SubRip
00:00:12:I ...          | 00:00:12,000 --> 00:00:15,113
00:00:14:observing ...  | 00:00:14,000 --> 00:00:18,319
00:00:18:and ...        | 00:00:18,000 --> 00:00:22,252
00:00:22:You ...        | 00:00:22,000 --> 00:00:25,448

Looking closer, I found that the end time is calculated in

pysubs2/pysubs2/tmp.py

Lines 50 to 51 in fc53473

#calculate endtime from starttime + 500 miliseconds + 67 miliseconds per each character (15 chars per second)
end = start + 500 + (len(line) * 67)

Was that intentional? It would seem to make more sense to keep the calculated end time from going past the start time of the next cue.
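A minimal sketch of the suggested fix, assuming the cues are sorted by start time. It reuses the heuristic quoted above (500 ms plus 67 ms per character) and caps each end time at the next cue's start; the function name is hypothetical:

```python
from typing import List, Tuple


def tmp_end_times(cues: List[Tuple[int, str]]) -> List[Tuple[int, int, str]]:
    """Estimate end times for TMP cues without overlapping the next cue."""
    result = []
    for i, (start, text) in enumerate(cues):
        end = start + 500 + len(text) * 67  # heuristic from tmp.py
        if i + 1 < len(cues):
            end = min(end, cues[i + 1][0])  # never run into the next cue
        result.append((start, end, text))
    return result


cues = [(12_000, "x" * 50), (14_000, "observing")]
for start, end, _ in tmp_end_times(cues):
    print(start, end)
```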

ass to srt issue

Hello, I found that converting .ass to .srt fails to delete or skip karaoke lines and drawings, e.g. {\an5\pos(655.758,142.500) /blur

I have attached the transcoded file and images on how it looks on video
Archive.zip
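For context, ASS drawings are switched on with the \p override tag (\p1 and higher; \p0 turns drawing mode off), and their vector commands mean nothing in SRT. A minimal sketch of one filtering approach, assuming it is acceptable to drop drawing lines entirely and to remove {...} override blocks from the rest; it does not address duplicated karaoke lines, and the function names are hypothetical:

```python
import re
from typing import Optional

OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")
DRAWING_ON = re.compile(r"\\p[1-9]")  # \p1, \p2, ... enable drawing mode


def is_drawing(text: str) -> bool:
    """True if any override block in the line switches on drawing mode."""
    return any(DRAWING_ON.search(block) for block in OVERRIDE_BLOCK.findall(text))


def to_plain_text(text: str) -> Optional[str]:
    """Plain text for SRT, or None if the line is a drawing."""
    if is_drawing(text):
        return None
    # Drop override blocks and turn ASS hard line breaks (\N) into newlines.
    return OVERRIDE_BLOCK.sub("", text).replace("\\N", "\n")


print(to_plain_text(r"{\an5\pos(655.758,142.500)\blur2}Hello"))  # Hello
print(to_plain_text(r"{\an7\p1}m 0 0 l 100 0 100 100"))         # None
```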

Suggestion: Time-based cutting utility

For stuff like cleaning audio transcript datasets, it's necessary to cut out segments of the corresponding subtitles when cutting out bad parts of the training audio. This is partially doable by merging the subtitles into an mkv container with the audio, and then using ffmpeg on it and splitting them apart again, but is far from ideal.

Having an easy way to operate on the subtitles directly, with an API like subs.cut(start="30:30", end="40:20") that removes the offending section and shifts everything after it earlier, would be really nice for this use case.
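A minimal sketch of the proposed cut semantics on plain (start_ms, end_ms, text) tuples; `cut_events` is hypothetical and not pysubs2's API. Events before the cut are unchanged, events after it move earlier by the cut length, and events that straddle it keep only the parts outside the cut, joined together:

```python
from typing import List, Tuple

Event = Tuple[int, int, str]  # (start_ms, end_ms, text)


def cut_events(events: List[Event], cut_start: int, cut_end: int) -> List[Event]:
    """Remove [cut_start, cut_end) from the timeline and close the gap."""
    length = cut_end - cut_start
    result = []
    for start, end, text in events:
        if end <= cut_start:          # entirely before the cut: unchanged
            result.append((start, end, text))
        elif start >= cut_end:        # entirely after the cut: shift earlier
            result.append((start - length, end - length, text))
        else:                         # overlaps the cut: keep what lies outside
            new_start = min(start, cut_start)
            new_end = max(end, cut_end) - length
            if new_end > new_start:   # drop events that were fully inside
                result.append((new_start, new_end, text))
    return result


events = [(0, 1000, "a"), (1500, 2500, "b"), (2200, 2800, "c"),
          (2500, 3500, "d"), (4000, 5000, "e")]
print(cut_events(events, 2000, 3000))
```

For the example in the request, 30:30 and 40:20 correspond to cut_start=1_830_000 and cut_end=2_420_000 in milliseconds.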

Better handling of files with unknown character encoding

As of 1.2.0, we default to UTF-8 encoding. If that is not correct, the user has to specify the proper encoding manually. To improve UX, we could try some autodetection before bailing out.

This is already something that users are dealing with, see:

Consider adding https://github.com/chardet/chardet as (optional?) dependency.

(This is another idea from the original pysubs library.)
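A minimal stdlib-only sketch of a fallback chain, with a hypothetical function name and an assumed candidate list; a detector such as chardet could be consulted before it. Note that latin-1 decodes any byte sequence, so it must come last, and a "successful" decode with the wrong 8-bit codec silently produces garbled text, which is the gap real detection tries to close:

```python
from typing import Iterable, Tuple

# Tried in order. latin-1 never fails, so it is the last resort.
DEFAULT_CANDIDATES = ("utf-8-sig", "cp1252", "latin-1")


def decode_with_fallback(data: bytes,
                         candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


text, enc = decode_with_fallback("héllo".encode("cp1252"))
print(text, enc)  # héllo cp1252
```

"utf-8-sig" is used instead of "utf-8" so that a leading byte-order mark is removed rather than ending up in the first line.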

Support `X-TIMESTAMP-MAP` for WebVTT

When I open VTT subtitle fragments in Subtitle Edit, it applies a correction and shows the "translated" times. With pysubs2, as with ffmpeg or any other program that does a simple direct VTT-to-SRT conversion, this translation of the times does not happen. I tried to find the logic behind it and noticed that there is an offset based on the MPEGTS value found at the beginning of the subtitle file. Could that be related?

I am attaching a file where you can see this problem that I mention
file-1_vtt.zip
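In HLS (RFC 8216), WebVTT segments can carry an X-TIMESTAMP-MAP=MPEGTS:&lt;n&gt;,LOCAL:&lt;time&gt; header that ties a cue time (LOCAL) to an MPEG-2 transport stream timestamp on a 90 kHz clock. A minimal sketch of computing the implied offset, with hypothetical function names. How the offset should be applied (for example, whether the stream's initial timestamp should be subtracted first) is an assumption to check against Subtitle Edit's output, and MPEGTS values wrap around at 2^33, which this sketch ignores:

```python
def parse_vtt_time(value: str) -> int:
    """Parse WebVTT [HH:]MM:SS.mmm into milliseconds."""
    *hm, sec = value.split(":")
    hours = int(hm[0]) if len(hm) == 2 else 0
    minutes = int(hm[-1])
    return round(((hours * 60 + minutes) * 60 + float(sec)) * 1000)


def timestamp_map_offset_ms(header: str) -> int:
    """Offset in ms implied by an X-TIMESTAMP-MAP header line."""
    _, _, params = header.partition("=")
    # "MPEGTS:900000" -> ("MPEGTS", "900000"); LOCAL keeps its inner colons.
    pairs = dict(p.strip().partition(":")[::2] for p in params.split(","))
    mpegts_ms = int(pairs["MPEGTS"]) * 1000 // 90_000  # 90 kHz clock
    return mpegts_ms - parse_vtt_time(pairs["LOCAL"])


print(timestamp_map_offset_ms("X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000"))  # 10000
```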

Missing release tags in git

Only 0.2.0 and 0.2.1 are tagged. I see commit messages referencing 0.2.2 and 0.2.3 but am not sure exactly which commit was used to generate the corresponding packages on the cheeseshop. Could you please push the tags up? Thanks!

shift time less than 0 go to somewhere

Hi, here's another one:

[Script Info]
; Script generated by Aegisub 3.2.2
; http://www.aegisub.org/
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,20,&H00FFFFFF,&H000088EF,&H00000000,&H00666666,-1,0,0,0,100,100,0,0,1,3,0,8,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.20,0:00:03.35,Default,,0,0,0,fx,Hi

Now we load this:

>>> a=pysubs2.load("jj.ass")
>>> a[0].start
200
>>> a.shift(-300)
>>> a[0].start
-1079999800

I think this should either allow negative times or raise a warning or error, but it certainly shouldn't produce that number. xD

Bye.
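The odd number has a plausible explanation, assuming `shift()` takes hours as its first positional parameter (the README example calls it with the keyword `s=`): `a.shift(-300)` would then shift by -300 hours rather than -300 ms, and 200 - 300 × 3,600,000 = -1,079,999,800, exactly the value shown. Under that assumption, `a.shift(ms=-300)` expresses the intended shift. For the underlying question, a minimal sketch of a shift that clamps at zero instead of going negative, with hypothetical names:

```python
def make_ms(h: float = 0, m: float = 0, s: float = 0, ms: float = 0) -> int:
    """Combine time components into integer milliseconds."""
    return int(round(((h * 60 + m) * 60 + s) * 1000 + ms))


def shift_clamped(start: int, **delta: float) -> int:
    """Shift a time by the given components, never going below zero."""
    return max(0, start + make_ms(**delta))


print(200 + make_ms(-300))          # -1079999800: -300 read as hours
print(shift_clamped(200, ms=-300))  # 0: -300 ms, clamped at zero
print(shift_clamped(200, ms=-100))  # 100
```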

English Language Only subs

Hi,

Last time, in an issue about Japanese/Chinese subs ending up in the final .srt, it was recommended to use UTF-8, but the output still contains some Japanese characters, as in the attached files.
Sample:

13
00:24:18,670 --> 00:24:22,570
呼びかけた声かき消されて

Archive 2s2.zip
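If the goal is English-only output, one option is to drop lines containing Japanese or Chinese characters before saving. A minimal sketch, assuming a line should be removed if it contains any character from these Unicode blocks (the name `has_cjk` and the exact block list are illustrative choices):

```python
import re

# Hiragana, Katakana, CJK Unified Ideographs (basic block) and Hangul
# syllables; extend the ranges if other scripts need filtering.
CJK = re.compile(r"[\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff\uac00-\ud7af]")


def has_cjk(text: str) -> bool:
    """True if the text contains at least one character from the blocks above."""
    return CJK.search(text) is not None


lines = ["呼びかけた声かき消されて", "Hello"]
print([line for line in lines if not has_cjk(line)])  # ['Hello']
```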

CLI, default to UTF-8

Consider making --input-enc utf8 the default, with some fallback. It is already the default in the Python API; however, as of 1.1.0 the CLI defaults to ISO-8859-1, which obviously breaks on some inputs. See #37 (comment).

Possible integration point with openai/whisper output

Hello, I was wondering if it would be possible to add a new method that takes Whisper transcription output and converts it to the supported output formats.

This is how it's currently done in Whisper:

https://github.com/openai/whisper/blob/9f70a352f9f8630ab3aa0d06af5cb9532bd8c21d/whisper/utils.py#L63

def write_srt(transcript: Iterator[dict], file: TextIO):
    """
    Write a transcript to a file in SRT format.
    Example usage:
        from pathlib import Path
        from whisper.utils import write_srt
        result = transcribe(model, audio_path, temperature=temperature, **args)
        # save SRT
        audio_basename = Path(audio_path).stem
        with open(Path(output_dir) / (audio_basename + ".srt"), "w", encoding="utf-8") as srt:
            write_srt(result["segments"], file=srt)
    """
    for i, segment in enumerate(transcript, start=1):
        # write srt lines
        print(
            f"{i}\n"
            f"{format_timestamp(segment['start'], always_include_hours=True, decimal_marker=',')} --> "
            f"{format_timestamp(segment['end'], always_include_hours=True, decimal_marker=',')}\n"
            f"{segment['text'].strip().replace('-->', '->')}\n",
            file=file,
            flush=True,
        )

I got this error: AttributeError: 'module' object has no attribute 'load'

import pysubs2
pysubs2.load('subtitle.ass','utf-8')
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'module' object has no attribute 'load'

I use pysubs2 in Django.

I think there may be a problem with the import path, but I can't figure it out.

Could you check this?
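One common cause of this error (not confirmed for this report) is another module named `pysubs2` shadowing the installed package, for example a local pysubs2.py or a stale pysubs2.pyc in the working directory. A minimal diagnostic sketch using the standard `importlib` machinery; the helper name is hypothetical, and under Python 2 `print(pysubs2.__file__)` gives the same information:

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Path of the file Python would import for `name`, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# If this prints a path inside your project instead of site-packages,
# a local file is shadowing the installed package.
print(module_origin("pysubs2"))
```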

Created a new ASS subtitle file, but it doesn't work in the player at all

[Script Info]
; Script generated by pysubs2
; https://pypi.python.org/pypi/pysubs2
PlayResX: 1280
PlayResY: 720
ScriptType: v4.00+

[Aegisub Project Garbage]
Last Style Storage: Default
Video File: ?dummy:23.976000:2250:1920:1080:11:135:226:c
Video AR Value: 1.777778
Video Zoom Percent: 0.500000
Active Line: 1
Video Position: 342

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,48.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100.0,100.0,0.0,0.0,1,2.0,0.0,8,25,25,25,1
Style: Romaji,Migu 1P,48.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100.0,100.0,0.0,0.0,1,2.0,0.0,8,25,25,25,1
Style: Translation,Migu 1P,46.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100.0,100.0,0.0,0.0,1,2.0,0.0,2,25,25,25,1
Style: Kanji,Migu 1P,38.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100.0,100.0,0.0,0.0,1,1.8,0.0,4,25,25,25,1
Style: p,Arial,10.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100.0,100.0,0.0,0.0,1,0.0,0.0,7,25,25,25,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:00.00,Default,,0,0,0,,My son threw a snowball at me and I instinctively blocked it with my daughter. The look of 
Dialogue: 0,0:00:02.71,0:00:02.71,Default,,0,0,0,,betrayal on her snow covered face has haunted my dreams for years 

        e1 = SSAEvent()
        e1.start = pysubs2.make_time(s=secs_float)
        e1.end = pysubs2.make_time(s=total_length)
        # e1.style=style
        e1.text = scene
        subs.append(e1)
    subs.save(postvideodir+post_id+'-1.ass')

Am I missing something?
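One thing stands out in the generated file: both events end at exactly the time they start (0:00:00.00 to 0:00:00.00, and 0:00:02.71 to 0:00:02.71), so they have zero duration and a player would likely never display them. That points at how `start` and `end` are computed in the loop rather than at the file structure. A minimal sketch of a check for such events, assuming the standard Layer, Start, End, ... field order from the Format line; the helper names are hypothetical:

```python
from typing import List


def ass_time_to_cs(value: str) -> int:
    """Convert H:MM:SS.cc to centiseconds."""
    h, m, s = value.split(":")
    sec, cs = s.split(".")
    return ((int(h) * 60 + int(m)) * 60 + int(sec)) * 100 + int(cs)


def zero_length_events(lines: List[str]) -> List[int]:
    """Indices of Dialogue lines whose End is not after their Start."""
    bad = []
    for i, line in enumerate(lines):
        if not line.startswith("Dialogue:"):
            continue
        # Layer, Start, End, ... Text; Text may contain commas.
        fields = line.partition(":")[2].split(",", 9)
        start, end = fields[1].strip(), fields[2].strip()
        if ass_time_to_cs(end) <= ass_time_to_cs(start):
            bad.append(i)
    return bad
```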

Correction ms_to_frames/frames_to_ms/ms_to_str

Currently, ms_to_frames and frames_to_ms do not work correctly; they can be imprecise.

I made a pull request on the PyonFX repo, and I think it could be a good idea to do something similar in pysubs2: CoffeeStraw/PyonFX#46

I recommend looking at these two files from my PR; they are the only ones that matter for pysubs2:

  • pyonfx/convert.py
  • pyonfx/timestamps.py

In brief, here are all the methods I propose to change:
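The PR's approach isn't reproduced on this page, so the following is only one consistent convention, not the PyonFX code: use exact rational frame rates (24000/1001 rather than 23.976) and round each frame's start time up to a whole millisecond, so that converting a frame to milliseconds and back returns the same frame. Names are hypothetical:

```python
import math
from fractions import Fraction

NTSC_FILM = Fraction(24000, 1001)  # exact value of "23.976" fps


def frame_to_ms(frame: int, fps: Fraction) -> int:
    """First whole millisecond at which `frame` is on screen."""
    return math.ceil(frame * 1000 / fps)


def ms_to_frame(ms: int, fps: Fraction) -> int:
    """Frame being displayed at time `ms`."""
    return math.floor(ms * fps / 1000)


# The round trip holds for every frame as long as fps < 1000.
assert all(ms_to_frame(frame_to_ms(n, NTSC_FILM), NTSC_FILM) == n for n in range(10_000))
print(frame_to_ms(1, NTSC_FILM), frame_to_ms(2, NTSC_FILM))  # 42 84
```

Rounding to the nearest millisecond instead would break the round trip: frame 2 starts at about 83.42 ms, and 83 ms still falls inside frame 1.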

Doesn't work in Python 3.11

> pysubs2.exe

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Python311\Scripts\pysubs2.exe\__main__.py", line 4, in <module>
  File "C:\Python311\Lib\site-packages\pysubs2\__init__.py", line 1, in <module>
    from .ssafile import SSAFile
  File "C:\Python311\Lib\site-packages\pysubs2\ssafile.py", line 13, in <module>
    from .formats import autodetect_format, get_format_class, get_format_identifier
  File "C:\Python311\Lib\site-packages\pysubs2\formats.py", line 4, in <module>
    from .microdvd import MicroDVDFormat
  File "C:\Python311\Lib\site-packages\pysubs2\microdvd.py", line 5, in <module>
    from .ssastyle import SSAStyle
  File "C:\Python311\Lib\site-packages\pysubs2\ssastyle.py", line 7, in <module>
    @dataclasses.dataclass(repr=False)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\dataclasses.py", line 1211, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\dataclasses.py", line 959, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\dataclasses.py", line 816, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'pysubs2.common.Color'> for field primarycolor is not allowed: use default_factory
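The cause appears to be a Python 3.11 change: `dataclasses` now rejects any field default whose class is unhashable (previously only `list`, `dict` and `set` were rejected), and a regular dataclass with eq=True and no frozen=True, as `Color` appears to be, is unhashable. As the error message itself suggests, `default_factory` avoids this. A minimal stand-alone sketch with hypothetical `Color` and `Style` classes:

```python
from dataclasses import dataclass, field


@dataclass
class Color:  # eq=True and not frozen, so __hash__ is None
    r: int
    g: int
    b: int
    a: int = 0


@dataclass
class Style:
    # primarycolor: Color = Color(255, 255, 255)  # ValueError on Python 3.11+
    primarycolor: Color = field(default_factory=lambda: Color(255, 255, 255))


s1, s2 = Style(), Style()
print(s1.primarycolor)                     # Color(r=255, g=255, b=255, a=0)
print(s1.primarycolor is s2.primarycolor)  # False: each style gets its own Color
```

Besides satisfying the check, the factory means styles no longer share a single Color instance, so mutating one style's color cannot leak into another. Making `Color` frozen would be the other way to make it an acceptable default.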

Error when load srt file

I can't load an SRT file under Python 3.7. I tried both the command line and the Python interpreter and got the same error:

$ pysubs2 --input-enc utf-8 --output-enc utf-8 --to ass sub.srt

Traceback (most recent call last):
  File "/Users/Yuji/miniconda/lib/python3.7/sre_parse.py", line 1021, in parse_template
    this = chr(ESCAPES[this][1])
KeyError: '\\i'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Yuji/miniconda/bin/pysubs2", line 11, in <module>
    sys.exit(__main__())
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/cli.py", line 170, in __main__
    rv = cli(sys.argv[1:])
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/cli.py", line 100, in __call__
    self.main(argv)
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/cli.py", line 124, in main
    subs = SSAFile.from_file(infile, args.input_format, args.fps)
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/ssafile.py", line 152, in from_file
    impl.from_file(subs, fp, format_, fps=fps, **kwargs)
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/subrip.py", line 70, in from_file
    for (start, end), lines in zip(timestamps, following_lines)]
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/subrip.py", line 70, in <listcomp>
    for (start, end), lines in zip(timestamps, following_lines)]
  File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/subrip.py", line 59, in prepare_text
    s = re.sub(r"< *i *>", r"{\i1}", s)
  File "/Users/Yuji/miniconda/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/Users/Yuji/miniconda/lib/python3.7/re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "/Users/Yuji/miniconda/lib/python3.7/re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "/Users/Yuji/miniconda/lib/python3.7/sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \i at position 1
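Since Python 3.7, unknown escapes consisting of a backslash and an ASCII letter in an `re.sub()` replacement string are errors rather than being kept literally, so r"{\i1}" is rejected because the template parser sees \i. Doubling the backslash, or passing a function as the replacement so the string is not parsed as a template, avoids this. A minimal sketch with a hypothetical function name:

```python
import re


def srt_italics_to_ass(s: str) -> str:
    # Escaped backslash: the template "{\\i1}" produces the text {\i1}.
    s = re.sub(r"< *i *>", r"{\\i1}", s)
    # A callable replacement is used verbatim, with no escape processing.
    s = re.sub(r"< */ *i *>", lambda _m: r"{\i0}", s)
    return s


print(srt_italics_to_ass("<i>Hello</i>"))  # {\i1}Hello{\i0}
```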

webvtt fragments

How do I join the fragments of a WebVTT subtitle and convert everything to SRT?

For example, I have a folder with 4 WebVTT subtitle files, and I need to merge them and convert them into a single SRT file:

subtitle_V1-0.webvtt
subtitle_V1-1.webvtt
subtitle_V1-2.webvtt
subtitle_V1-3.webvtt

convert to

subtitle.srt
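One detail that matters when merging is the fragment order: a plain string sort would put subtitle_V1-10 before subtitle_V1-2 once there are more than ten files. A minimal sketch that sorts by the numeric suffix, assuming the names follow the subtitle_V1-&lt;n&gt;.webvtt pattern above (the helper is hypothetical). After sorting, each file could be loaded with `pysubs2.load()` and its events added to the first file, since `SSAFile` supports list operations such as `insert()` (as the merge example on this page shows), before saving as SRT. Fragment timing may also depend on the X-TIMESTAMP-MAP header discussed in another issue on this page:

```python
import re
from typing import List

FRAGMENT_INDEX = re.compile(r"-(\d+)\.webvtt$")


def sort_fragments(names: List[str]) -> List[str]:
    """Order fragment file names by their trailing -<n> index."""
    def index(name: str) -> int:
        match = FRAGMENT_INDEX.search(name)
        if match is None:
            raise ValueError(f"unexpected fragment name: {name!r}")
        return int(match.group(1))
    return sorted(names, key=index)


print(sort_fragments(["subtitle_V1-10.webvtt", "subtitle_V1-2.webvtt", "subtitle_V1-0.webvtt"]))
```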

Return the text with all of its content, including colors and other tags, in pysubs2

If I have a subtitle with this text:

1362
01:58:37,030 --> 01:58:50,030
<font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>
<font color="#0A7AA6">.: text3  :.</font>
<font color="#0A7AA6">text</font>

how can I get exactly this from pysubs2:

<font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>
<font color="#0A7AA6">.: text3  :.</font>
<font color="#0A7AA6">text</font>

If I access .text, all the font color tags are removed, which I don't want.
