tkarabela / pysubs2
Home Page: http://pysubs2.readthedocs.io
License: MIT License
A Python library for editing subtitle files
Currently, ms_to_frames and frames_to_ms do not work correctly; they can be imprecise.
I opened a pull request on the PyonFX repo, and I think it would be a good idea to do something similar in pysubs2: CoffeeStraw/PyonFX#46
I recommend looking at these two files in my PR. They are the only ones that matter for pysubs2.
In brief, here are the methods I propose to change:
ms_to_str: add this small step for the .ass and .ssa formats: ass_ms = (ass_ms + 5) - (ass_ms + 5) % 10
ms_to_frames (current version) should be something like this: ms_to_frames
frames_to_ms (current version) should be something like this: frames_to_ms (corrected version)
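The rounding step quoted for ms_to_str can be tried in isolation. This is only a sketch of that one formula; the function name `round_to_centiseconds` is made up for illustration and is not part of pysubs2 or PyonFX.

```python
def round_to_centiseconds(ms: int) -> int:
    """Round a millisecond value to the nearest 10 ms, with halves rounding up.

    Uses the formula from the proposal: (ms + 5) - (ms + 5) % 10.
    """
    shifted = ms + 5
    return shifted - shifted % 10


for value in (1234, 1235, 9):
    print(value, "->", round_to_centiseconds(value))
```

ASS timestamps only carry centiseconds, so rounding before formatting avoids always truncating downwards.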
Hi, I found that the player can play files using both the 0:01:19.24 and 0:01:19:24 timestamp formats. I don't know whether this is in the specs or whether it should be supported here.
Bye.
Hello, I was wondering if it is possible to add a new method that takes Whisper's transcribe output and converts it to the supported output formats.
This is how it is currently done in Whisper:
https://github.com/openai/whisper/blob/9f70a352f9f8630ab3aa0d06af5cb9532bd8c21d/whisper/utils.py#L63
def write_srt(transcript: Iterator[dict], file: TextIO):
    """
    Write a transcript to a file in SRT format.
    Example usage:
        from pathlib import Path
        from whisper.utils import write_srt
        result = transcribe(model, audio_path, temperature=temperature, **args)
        # save SRT
        audio_basename = Path(audio_path).stem
        with open(Path(output_dir) / (audio_basename + ".srt"), "w", encoding="utf-8") as srt:
            write_srt(result["segments"], file=srt)
    """
    for i, segment in enumerate(transcript, start=1):
        # write srt lines
        print(
            f"{i}\n"
            f"{format_timestamp(segment['start'], always_include_hours=True, decimal_marker=',')} --> "
            f"{format_timestamp(segment['end'], always_include_hours=True, decimal_marker=',')}\n"
            f"{segment['text'].strip().replace('-->', '->')}\n",
            file=file,
            flush=True,
        )
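For reference, the same conversion can be sketched with the standard library only. It assumes each segment is a dict with 'start' and 'end' in seconds and a 'text' string, as in the Whisper snippet above; `srt_timestamp` and `segments_to_srt` are hypothetical helper names, not Whisper or pysubs2 functions.

```python
from typing import Iterable


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[dict]) -> str:
    """Build SRT text from Whisper-style segments (assumed keys: start, end, text)."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        # "-->" inside the text would confuse SRT parsers, so soften it.
        text = segment["text"].strip().replace("-->", "->")
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(segment['start'])} --> {srt_timestamp(segment['end'])}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)


print(segments_to_srt([{"start": 0.0, "end": 1.5, "text": " Hi "}]))
```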
How can the framerate be set when this line results in the mentioned error?
subs.save(file_name, format_="microdvd", fps=24)
Pytest prints a warning:
tests/test_substation.py::test_alignment_given_as_integer
/var/tmp/portage/dev-python/pysubs2-1.6.0/work/pysubs2-1.6.0/pysubs2/substation.py:333: DeprecationWarning: The 'alignment' attribute of SSAStyle should be an Alignment instance, using plain int is deprecated
warnings.warn("The 'alignment' attribute of SSAStyle should be an Alignment instance, using plain int is deprecated", DeprecationWarning)
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
Python version: 3.10.8
Pytest version: 7.1.3
As of 1.2.0, we default to UTF-8 encoding. If this is not correct, the user has to specify the proper encoding manually. To improve the user experience, we could try some autodetection before bailing out.
This is already something that users are dealing with, see:
Consider adding https://github.com/chardet/chardet as (optional?) dependency.
(This is another idea from the original pysubs library.)
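Short of adding chardet, a fallback chain is one possible form of autodetection. The sketch below is an assumption about how that could look, not pysubs2 behaviour; `decode_with_fallback` and its default candidate list are invented for illustration.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


print(decode_with_fallback("呼びかけた".encode("utf-8")))
print(decode_with_fallback(b"caf\xe9"))
```

Note that latin-1 accepts any byte sequence, so it only makes sense as the last candidate; it prevents a crash but cannot tell whether the result is meaningful.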
It would be nice to adjust the TIMESTAMP regex so that it copes with negative timestamps rather than failing with:
File "/usr/lib/python3.7/site-packages/pysubs2/ssafile.py", line 92, in load
return cls.from_file(fp, format_, fps=fps, **kwargs)
File "/usr/lib/python3.7/site-packages/pysubs2/ssafile.py", line 152, in from_file
impl.from_file(subs, fp, format_, fps=fps, **kwargs)
File "/usr/lib/python3.7/site-packages/pysubs2/substation.py", line 204, in from_file
field_dict = {f: string_to_field(f, v) for f, v in zip(EVENT_FIELDS[format_], raw_fields)}
File "/usr/lib/python3.7/site-packages/pysubs2/substation.py", line 204, in <dictcomp>
field_dict = {f: string_to_field(f, v) for f, v in zip(EVENT_FIELDS[format_], raw_fields)}
File "/usr/lib/python3.7/site-packages/pysubs2/substation.py", line 148, in string_to_field
return timestamp_to_ms(TIMESTAMP.match(v).groups())
AttributeError: 'NoneType' object has no attribute 'groups'
Granted, negative timestamps are nonsensical but Postel's Law and all that.
The particular issue that I've encountered that results in this: tp7/Sushi#34
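One way to tolerate a leading minus sign is sketched below. The pattern and the function are illustrative stand-ins with their own names (`SIGNED_TIMESTAMP`, `signed_timestamp_to_ms`), not the actual regex or helper from pysubs2.

```python
import re

# Optional "-" followed by H:MM:SS and a 2- or 3-digit fraction.
SIGNED_TIMESTAMP = re.compile(r"(-?)(\d{1,2}):(\d{2}):(\d{2})[.,](\d{2,3})")


def signed_timestamp_to_ms(text: str) -> int:
    """Parse a timestamp such as '-0:00:01.50' into (possibly negative) milliseconds."""
    match = SIGNED_TIMESTAMP.match(text.strip())
    if match is None:
        raise ValueError(f"not a timestamp: {text!r}")
    sign, hours, minutes, seconds, frac = match.groups()
    # "50" means 500 ms, "500" means 500 ms.
    ms = int(frac) * 10 ** (3 - len(frac))
    total = ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + ms
    return -total if sign else total


print(signed_timestamp_to_ms("0:00:01.50"))
print(signed_timestamp_to_ms("-0:00:01.50"))
```

Raising a ValueError with the offending text also gives a clearer message than the AttributeError in the traceback.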
I've seen some SSA files using the &HBBGGRR format for PrimaryColour instead of an integer, which is just like &HAABBGGRR in ASS.
Since SSA is outdated, I couldn't tell if the format is really supported in original SSA.
Both Aegisub and VLC can recognize it as far as I know, while pysubs2 will just produce an error.
The file was generated by SrtEdit as its header says.
Style:
[V4 Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: Default,SimHei,30,&HFFFFFF,&H00FFFF,&H000000,&H000000,-1,0,1,2,3,2,20,20,20,0,1
Error:
Traceback (most recent call last):
File "subtest.py", line 7, in <module>
subs = pysubs2.SSAFile.load("example.ssa", encoding="utf-16")
File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/ssafile.py", line 100, in load
return cls.from_file(fp, format_, fps=fps, **kwargs)
File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/ssafile.py", line 160, in from_file
impl.from_file(subs, fp, format_, fps=fps, **kwargs)
File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 203, in from_file
field_dict = {f: string_to_field(f, v) for f, v in zip(STYLE_FIELDS[format_], raw_fields)}
File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 203, in <dictcomp>
field_dict = {f: string_to_field(f, v) for f, v in zip(STYLE_FIELDS[format_], raw_fields)}
File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 158, in string_to_field
return ssa_rgb_to_color(v)
File "/Users/admin/.pyenv/versions/3.8.3/lib/python3.8/site-packages/pysubs2/substation.py", line 72, in ssa_rgb_to_color
x = int(s)
ValueError: invalid literal for int() with base 10: '&HFFFFFF'
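A parser that accepts both spellings might look like the sketch below. It assumes the decimal form and the &H form encode the same BBGGRR integer; `parse_ssa_color` is a hypothetical name, not the pysubs2 function from the traceback.

```python
def parse_ssa_color(value: str) -> tuple[int, int, int]:
    """Parse an SSA colour given as a decimal integer or as &HBBGGRR hex.

    Returns (red, green, blue). Any alpha byte in the hex form is ignored here.
    """
    text = value.strip()
    if text.upper().startswith("&H"):
        number = int(text[2:], 16)
    else:
        number = int(text)
    red = number & 0xFF
    green = (number >> 8) & 0xFF
    blue = (number >> 16) & 0xFF
    return red, green, blue


print(parse_ssa_color("&HFFFFFF"))
print(parse_ssa_color("65535"))
```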
When VTT subtitle fragments are opened in Subtitle Edit, it corrects the times and shows the "translated" values. With pysubs2, ffmpeg, or any other program that does a simple direct conversion from VTT to SRT, this translation of the times does not happen. I tried to find the logic behind it and noticed that there is a delay based on the MPEGTS value found at the beginning of the subtitle. Does that have anything to do with it?
I am attaching a file where you can see the problem:
file-1_vtt.zip
Only 0.2.0 and 0.2.1 are tagged. I see commit messages referencing 0.2.2 and 0.2.3 but am not sure exactly which commit was used to generate the corresponding packages on the cheeseshop. Could you please push the tags up? Thanks!
Hi,
I'm using pysubs2 to parse an existing SRT file that contains the following snippet. It treats 394 as the text of subtitle 393, whereas these should be two different subtitles with empty text.
Also, according to https://www.matroska.org/technical/specs/subtitles/srt.html, it looks like subtitles are better separated by a blank line than by the timecode line.
Regards,
Jarod
`
392
00:29:27,46 --> 00:29:29,83
I'm Liza Minnelli..
393
00:00:00,00 --> 00:00:00,00
394
00:00:00,00 --> 00:00:00,00
`
Hi,
Last time there was an issue with adding Japanese / Chinese subs to the final .srt, using UTF-8 was recommended, but it still gives out some Japanese characters like those in the attached files.
sample:
13
00:24:18,670 --> 00:24:22,570
呼びかけた声かき消されて
I am trying to process an .ass subtitle file in Python 3.
This package saved me a lot of time.
Thanks!
Hi,
When trying to convert a TMP subtitle to SubRip, I found that the end time generated by pysubs2 for a subtitle cue is after the start time of the next one. E.g.,
TMP | SubRip
00:00:12:I ... | 00:00:12,000 --> 00:00:15,113
00:00:14:observing ... | 00:00:14,000 --> 00:00:18,319
00:00:18:and ... | 00:00:18,000 --> 00:00:22,252
00:00:22:You ... | 00:00:22,000 --> 00:00:25,448
Looking closer, I found that the end time is calculated by
Lines 50 to 51 in fc53473
Was that intentional? It seems to make more sense to make the end time not go beyond the start time of the next cue during the calculation.
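The suggested behaviour can be sketched as a post-processing step. The cue representation (tuples of start, end, text in milliseconds) and the name `clamp_end_times` are assumptions for illustration, not pysubs2 internals.

```python
def clamp_end_times(cues):
    """Limit each cue's end time to the start time of the following cue.

    cues: list of (start_ms, end_ms, text) tuples sorted by start time.
    """
    result = []
    for index, (start, end, text) in enumerate(cues):
        if index + 1 < len(cues):
            next_start = cues[index + 1][0]
            end = min(end, next_start)
        result.append((start, end, text))
    return result


cues = [
    (12000, 15113, "I ..."),
    (14000, 18319, "observing ..."),
    (18000, 22252, "and ..."),
    (22000, 25448, "You ..."),
]
for cue in clamp_end_times(cues):
    print(cue)
```

The last cue has no successor, so its computed end time is kept as is.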
I have an .ass file generated by third-party software (see file.zip).
This file contains spaces between the values in the "Style" line:
Style: Default, Arial, 20, &H00FFFFFF, &H00000000, &H00000000, &H00000000, 0, 0, 0, 0, 100, 100, 0, 0, 1, 2, 0, 2, 15, 15, 15, 0
This is why pysubs2 can't parse the file:
\lib\site-packages\pysubs2\substation.py in <dictcomp>(.0)
235 buf = rest.strip().split(",")
236 name, raw_fields = buf[0], buf[1:] # splat workaround for Python 2.7
--> 237 field_dict = {f: string_to_field(f, v) for f, v in zip(STYLE_FIELDS[format_], raw_fields)}
238 sty = SSAStyle(**field_dict)
239 subs.styles[name] = sty
\lib\site-packages\pysubs2\substation.py in string_to_field(f, v)
166 return timestamp_to_ms(TIMESTAMP.match(v).groups())
167 elif "color" in f:
--> 168 return rgba_to_color(v)
169 elif f in {"bold", "underline", "italic", "strikeout"}:
170 return v == "-1"
\lib\site-packages\pysubs2\substation.py in rgba_to_color(s)
72 x = int(s[2:], base=16)
73 else:
---> 74 x = int(s)
75 r = x & 0xff
76 g = (x >> 8) & 0xff
ValueError: invalid literal for int() with base 10: ' &H00FFFFFF'
Also, the line
[v4+ Styles]
starts with a lowercase 'v', so I have to specify 'format_' explicitly.
It would probably be a good idea to strip spaces from the values in the "Style" line and to do a case-insensitive comparison when guessing the file format.
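Both suggestions are small string operations. The helpers below are a sketch under that reading of the request; `split_style_line` and `is_section` are hypothetical names and do not mirror the real parser.

```python
def split_style_line(line: str) -> list[str]:
    """Split a 'Style: a, b, c' line into fields, stripping surrounding spaces."""
    _, _, rest = line.partition(":")
    return [field.strip() for field in rest.split(",")]


def is_section(line: str, name: str) -> bool:
    """Compare a '[Section]' header with a section name, ignoring case."""
    return line.strip().lower() == f"[{name.lower()}]"


print(split_style_line("Style: Default, Arial, 20, &H00FFFFFF"))
print(is_section("[v4+ Styles]", "V4+ Styles"))
```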
Would it be possible for the --clean option, or a new one, to delete the HTML tags from the subtitles? For example,
<i>They're dying in vain!</i>
to
They're dying in vain!
Thanks in advance
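Until such an option exists, tags can be removed from the text with a regular expression. This is a rough sketch, not a full HTML parser, and `strip_html_tags` is an illustrative name rather than a pysubs2 function.

```python
import re

# Matches opening and closing tags such as <i>, </i> or <font color="...">.
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def strip_html_tags(text: str) -> str:
    """Remove simple HTML-style tags, leaving the enclosed text untouched."""
    return HTML_TAG.sub("", text)


print(strip_html_tags("<i>They're dying in vain!</i>"))
```

Because the pattern requires a letter right after "<", plain text such as "a < b" is left alone.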
Hey,
I had issues when reading .ass files where the timestamps do not have milliseconds attached. I propose adding support for this, as it cannot be expected that all timestamps always carry this information.
The fix is quite easy, but I cannot push a branch to the repo, so I include it here:
In file time.py:
TIMESTAMP = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})[.,]?(\d{0,3})")
In function timestamp_to_ms:
if groups[-1] == '':
    h, m, s = map(int, groups[:-1])
    ms = 0
else:
    h, m, s, frac = map(int, groups)
    ms = frac * 10**(3 - len(groups[-1]))
The corresponding tests in test_time.py need a small update too:
assert TIMESTAMP.match("12:45:67").groups() == ("12", "45", "67", '')
assert TIMESTAMP.match("1:23:45,").groups() == ("1", "23", "45", '')
assert TIMESTAMP.match("1:23:45.").groups() == ("1", "23", "45", '')
and remove them from the rejected timestamps section.
I hope the changes make sense to you!
Filtering and merging two .ass files of differing PlayRes's makes the subs from the lower resolution file minuscule and off center.
Write typehints as per PEP 484 for better programming experience.
Hi, here is another one:
[Script Info]
; Script generated by Aegisub 3.2.2
; http://www.aegisub.org/
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,20,&H00FFFFFF,&H000088EF,&H00000000,&H00666666,-1,0,0,0,100,100,0,0,1,3,0,8,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.20,0:00:03.35,Default,,0,0,0,fx,Hi
Now we load this:
>>> a=pysubs2.load("jj.ass")
>>> a[0].start
200
>>> a.shift(-300)
>>> a[0].start
-1079999800
I think this should either give a negative time or raise a warning or an error, but not that number xD
Bye.
Currently, I set an ASS style like this:
import pysubs2
assfile = pysubs2.SSAFile()
assfile.styles['sytle1'] = pysubs2.SSAStyle(fontname = 'xxx', fontsize = 20)
Can I use a dict for this instead, like so?
style_dict = {
    'fontname': 'xxx',
    'fontsize': 20
}
assfile.styles['sytle1'] = pysubs2.SSAStyle(style_dict)
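In Python, a dict can be passed as keyword arguments with the `**` operator. The sketch below uses a made-up `Style` dataclass as a stand-in so it runs without pysubs2; since the snippet above calls SSAStyle with keyword arguments, `pysubs2.SSAStyle(**style_dict)` should behave the same way.

```python
from dataclasses import dataclass


@dataclass
class Style:
    """Hypothetical stand-in for a style class that takes keyword arguments."""
    fontname: str = "Arial"
    fontsize: float = 20.0


style_dict = {"fontname": "xxx", "fontsize": 20}

# Equivalent to Style(fontname="xxx", fontsize=20).
style = Style(**style_dict)
print(style)
```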
Hi,
Thank you for the great tool. I am trying to convert an SRT file that was generated via Whisper from a stream about 24 hours long. The conversion seems fine, except that the timing caps at 9:59:59.99,9:59:59.99, and everything after that has the same timecode. Is there any way to fix this? I attached the SRT file; just change the extension from .log to .srt.
Hi, I think pysubs2 doesn't copy embedded fonts to the new subtitle file. Can you add this feature, please?
smacke/ffsubsync#126
It would be nice if pysubs2 could interpret all the cue settings: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#cue_settings
This would allow some .ass tags to be correctly converted into VTT cue settings and vice versa.
If I have a subtitle with this text:
1362
01:58:37,030 --> 01:58:50,030
<font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>
<font color="#0A7AA6">.: text3 :.</font>
<font color="#0A7AA6">text</font>
how can I get exactly this from pysubs2?
<font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>
<font color="#0A7AA6">.: text3 :.</font>
<font color="#0A7AA6">text</font>
If I call .text, it removes all the font color tags, which I don't want.
Hi, I found this bug. Basically, given this file:
[Script Info]
Title: Bugs in the window
ScriptType: v4.00
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:01.00,Default,,0000,0000,0000,,Hi, i'm new here
This works in the player but fails in pysubs2: in the text field, the "," is interpreted as a new field separator, and loading fails with:
AttributeError: 'NoneType' object has no attribute 'groups'
Bye.
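Commas inside the text are harmless if the line is split at most once per preceding field. The sketch below shows that idea using the Format line from the report; `EVENT_FIELDS` and `split_dialogue` here are illustrative and are not copied from pysubs2.

```python
EVENT_FIELDS = [
    "Layer", "Start", "End", "Style", "Name",
    "MarginL", "MarginR", "MarginV", "Effect", "Text",
]


def split_dialogue(line: str) -> dict:
    """Split a 'Dialogue: ...' line so that commas in the last field are kept."""
    _, _, rest = line.partition(":")
    # Split only len(EVENT_FIELDS) - 1 times; the remainder is the text.
    values = rest.strip().split(",", len(EVENT_FIELDS) - 1)
    return dict(zip(EVENT_FIELDS, values))


event = split_dialogue(
    "Dialogue: 0,0:00:00.00,0:00:01.00,Default,,0000,0000,0000,,Hi, i'm new here"
)
print(event["Text"])
```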
Hi, there are too many things in issue #18, so I'll split it up point by point. The first is this: some values are overwritten somewhere during conversion and ignore the default values in https://github.com/tkarabela/pysubs2/blob/master/pysubs2/ssastyle.py
Here is an example:
[Script Info]
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: OK,Britannic Bold,30,16777215,65535,65535,&H0029464b,0,0,1,2,1,2,10,10,10,0,0
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,OK,NTP,0000,0000,0000,!Effect,Hi
Converting to ASS:
[Script Info]
; Script generated by pysubs2
; https://pypi.python.org/pypi/pysubs2
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: OK,Britannic Bold,30.0,&H00777215,&H00000535,&H00000535,&H0029464B,0,0,0,0,1.0,2.0,10.0,10.0,10,0.0,0.0,2,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,OK,NTP,0,0,0,!Effect,Hi
Here is a comparison of the values that don't match:
Field | SSA (original) | ASS (converted)
BorderStyle | 1 | 10
Outline | 2 | 0
Shadow | 1 | 0
Encoding | 0 | 1
There are also some weird values, such as ScaleX and ScaleY: the pysubs2 defaults are right, but different values are written to the file. These fields are ASS-only (they don't exist in SSA), so the pysubs2 defaults should be used:
Field | Default (pysubs2) | Written value (converted)
ScaleX | 100 | 1
ScaleY | 100 | 2
Spacing | 0 | 10
Angle | 0 | 10
Bye.
Hi
So in a program I'm writing, I need to do something like this:
import pysubs2
subs = pysubs2.load("subtitle.ass", encoding = "utf-8")
...
if subtitle in list:
    print("In.")
However, this doesn't work:
TypeError: unhashable type: 'SSAEvent'
So could it become hashable?
Pytest prints a warning:
tests/test_substation.py:210: DeprecationWarning: invalid escape sequence '\k'
The mentioned line is:
pysubs2/tests/test_substation.py
Line 210 in 2378e76
Python version: 3.10.5
Pytest version: 7.1.2
[Script Info]
; Script generated by pysubs2
; https://pypi.python.org/pypi/pysubs2
PlayResX: 1280
PlayResY: 720
ScriptType: v4.00+
[Aegisub Project Garbage]
Last Style Storage: Default
Video File: ?dummy:23.976000:2250:1920:1080:11:135:226:c
Video AR Value: 1.777778
Video Zoom Percent: 0.500000
Active Line: 1
Video Position: 342
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,48.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100.0,100.0,0.0,0.0,1,2.0,0.0,8,25,25,25,1
Style: Romaji,Migu 1P,48.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100.0,100.0,0.0,0.0,1,2.0,0.0,8,25,25,25,1
Style: Translation,Migu 1P,46.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100.0,100.0,0.0,0.0,1,2.0,0.0,2,25,25,25,1
Style: Kanji,Migu 1P,38.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,-1,0,0,0,100.0,100.0,0.0,0.0,1,1.8,0.0,4,25,25,25,1
Style: p,Arial,10.0,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100.0,100.0,0.0,0.0,1,0.0,0.0,7,25,25,25,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:00.00,Default,,0,0,0,,My son threw a snowball at me and I instinctively blocked it with my daughter. The look of
Dialogue: 0,0:00:02.71,0:00:02.71,Default,,0,0,0,,betrayal on her snow covered face has haunted my dreams for years
e1 = SSAEvent()
e1.start = pysubs2.make_time(s=secs_float)
e1.end = pysubs2.make_time(s=total_length)
# e1.style=style
e1.text = scene
subs.append(e1)
subs.save(postvideodir+post_id+'-1.ass')
Am I missing something?
For example,
[Script Info]
Title:Ttitle
Original Script:Script
Synch Point:0
ScriptType:v4.00+
Collisions:Normal
PlayResX:640
PlayResY:360
Timer:100.0000
pysubs2 only has an empty dict in info.
import pysubs2
pysubs2.load('subtitle.ass','utf-8')
Traceback (most recent call last):
File "", line 1, in
AttributeError: 'module' object has no attribute 'load'
I use pysubs2 in Django.
I think there may be some problem with the import path, but I can't figure it out.
Could you check this?
For stuff like cleaning audio transcript datasets, it's necessary to cut out segments of the corresponding subtitles when cutting out bad parts of the training audio. This is partially doable by merging the subtitles into an mkv container with the audio, and then using ffmpeg on it and splitting them apart again, but is far from ideal.
Having an easy way to operate on just the subtitles, with an API like subs.cut(start="30:30", end="40:20"), which would remove the offending section and then shift everything after it down, would be really nice for this use case.
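A rough idea of what such an operation could do is sketched below on plain tuples. This is not an existing pysubs2 API; the `cut` function, its millisecond arguments, and the choice to drop any cue that overlaps the removed range are all assumptions made for illustration.

```python
def cut(cues, start_ms, end_ms):
    """Remove the range [start_ms, end_ms) and shift later cues earlier.

    cues: list of (start_ms, end_ms, text) tuples.
    Cues overlapping the removed range are dropped entirely.
    """
    length = end_ms - start_ms
    result = []
    for start, end, text in cues:
        if end <= start_ms:
            # Entirely before the cut: unchanged.
            result.append((start, end, text))
        elif start >= end_ms:
            # Entirely after the cut: shift down by the removed duration.
            result.append((start - length, end - length, text))
    return result


cues = [(0, 1000, "a"), (2000, 3000, "b"), (5000, 6000, "c")]
print(cut(cues, 1500, 4000))
```

A real implementation would also have to decide whether partially overlapping cues should be trimmed instead of dropped.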
Hi,
When adding a new style to the subtitles I get the following error:
Traceback (most recent call last):
File "colorset.py", line 20, in <module>
subs1.save(foutput,format_='ass')
File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/ssafile.py", line 190, in save
self.to_file(fp, format_, fps=fps, **kwargs)
File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/ssafile.py", line 222, in to_file
impl.to_file(self, fp, format_, fps=fps, **kwargs)
File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/substation.py", line 253, in to_file
fields = [field_to_string(f, getattr(ev, f)) for f in EVENT_FIELDS[format_]]
File "/home/mohammad/.local/lib/python2.7/site-packages/pysubs2/substation.py", line 242, in field_to_string
raise TypeError("Unexpected type when writing a SubStation field")
TypeError: Unexpected type when writing a SubStation field
If I run the same code under Python 3 it runs fine, but under Python 2 I get the error.
I'm trying to use pysubs2 in Kodi, which currently only supports Python 2.
The code I used:
import pysubs2
import chardet
finput='en.srt'
foutput='en.ass'
with open(finput,'rb') as fi:
    rawdata = fi.read()
    encoding = chardet.detect(rawdata)['encoding']
fi.close()
subs1 = pysubs2.load(finput)
top_style = pysubs2.SSAStyle()
top_style.alignment=8
subs1.styles['top-style'] = top_style
for line in subs1:
    line.style='top-style'
subs1.save(foutput,format_='ass')
Hello, I found that converting .ass to .srt fails to delete or skip karaoke subs and drawings, e.g. {\an5\pos(655.758,142.500) /blur
I have attached the transcoded file and images showing how it looks on video:
Archive.zip
Could you consider adding support for Graphics section, please? It is lost after conversion.
Aegisub is able to add files to this section. And, if I remember correctly, VSFilterMod can render images from it.
The format of the Graphics section is almost the same as that of Fonts. Just "fontname" becomes "filename".
Specs
sample.zip
Hi, I hope this time it is a real issue and not something that is my fault.
If we insert SSA data into ASS or something like that, the data is not transformed correctly, and in the end we can't see the subs, or we see only part of them, or they have different colors, etc. Here is all the code:
ED.ssa
[Script Info]
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding
Style: OK,Britannic Bold,30,16777215,65535,65535,&H0029464b,0,0,1,2,1,2,10,10,10,0,0
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,OK,NTP,0000,0000,0000,!Effect,Hi
KED.ass
[Script Info]
; Script generated by Aegisub 3.2.2
; http://www.aegisub.org/
Title: karaoke
ScriptType: v4.00+
WrapStyle: 0
ScaledBorderAndShadow: yes
PlayResY: 600
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: KED,Arial,20,&H00FFFFFF,&H000088EF,&H00000000,&H00666666,-1,0,0,0,100,100,0,0,1,3,0,8,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,KED,,0,0,0,fx,Booo
code:
import pysubs2
def merge(ss):
    s1 = ss[0]
    for i_ in range(1, len(ss)):
        i = ss[i_]
        for j in i:
            s1.insert(0, j)
        s1.import_styles(i)
    return s1
a=pysubs2.load("ED.ssa")
b=pysubs2.load("KED.ass")
a=merge([a, b])
a.save("t1.ssa")
a.save("t2.ass")
This is exactly what I ran, and if you play either of the two files you will find that the subs are not right.
In the t1 case the colors are wrong; in the second case only the KED.ass file is displayed.
Bye.
I tried a few things, but they all failed,
> pysubs2.exe
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Python311\Scripts\pysubs2.exe\__main__.py", line 4, in <module>
File "C:\Python311\Lib\site-packages\pysubs2\__init__.py", line 1, in <module>
from .ssafile import SSAFile
File "C:\Python311\Lib\site-packages\pysubs2\ssafile.py", line 13, in <module>
from .formats import autodetect_format, get_format_class, get_format_identifier
File "C:\Python311\Lib\site-packages\pysubs2\formats.py", line 4, in <module>
from .microdvd import MicroDVDFormat
File "C:\Python311\Lib\site-packages\pysubs2\microdvd.py", line 5, in <module>
from .ssastyle import SSAStyle
File "C:\Python311\Lib\site-packages\pysubs2\ssastyle.py", line 7, in <module>
@dataclasses.dataclass(repr=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\dataclasses.py", line 1211, in wrap
return _process_class(cls, init, repr, eq, order, unsafe_hash,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\dataclasses.py", line 959, in _process_class
cls_fields.append(_get_field(cls, name, type, kw_only))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\dataclasses.py", line 816, in _get_field
raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'pysubs2.common.Color'> for field primarycolor is not allowed: use default_factory
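The error at the end of the traceback is raised by Python 3.11, where dataclasses reject any field default whose class is unhashable. The usual remedy is `default_factory`, as the message says. The sketch below uses simplified stand-in classes named `Color` and `Style`; they are not the real pysubs2 definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Color:
    """Stand-in colour class; a plain dataclass with eq=True is unhashable."""
    r: int
    g: int
    b: int
    a: int = 0


@dataclass
class Style:
    """Stand-in style class showing the default_factory pattern."""
    # Each Style instance gets its own fresh Color object.
    primarycolor: Color = field(default_factory=lambda: Color(255, 255, 255))


first, second = Style(), Style()
print(first.primarycolor == second.primarycolor)
print(first.primarycolor is second.primarycolor)
```

Making the colour class immutable (for example with `frozen=True`) is another option, since a frozen dataclass is hashable and may then be used directly as a default.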
How do I join fragments of a WebVTT subtitle and convert everything to SRT?
For example, I have a folder with 4 WebVTT subtitle files, and I need to merge them and convert them to a single SRT file:
subtitle_V1-0.webvtt
subtitle_V1-1.webvtt
subtitle_V1-2.webvtt
subtitle_V1-3.webvtt
convert to
subtitle.srt
Loading and saving an SSAFile swaps green and blue color channels. (Reported by Eric Williams)
frame => frames
Title says it all. I'm planning to provide PR for this, which will start from 3.4 exactly.
| are newlines in MPL2, so we should use return "\n".join(out) in prepare_text.
I am using pysubs2 to read subtitle files. Here is a little example:
SIMPLE_FILE = """
1
00:00:00,000 --> 00:01:00,000
cette matrice-là <i>E<sub>t</sub>·…·E<sub>1</sub>A</i> possède une ligne
2
00:01:00,000 --> 00:02:00,000
there was a SubRip file
with two subtitles.
"""
with open("subtitles.srt", "w", encoding="utf-8") as fp:
    fp.write(SIMPLE_FILE)
import pysubs2
subs = pysubs2.load("subtitles.srt",format_= "srt")
subs[0].text
As a result I get:
cette matrice-là {\i1}Et·…·E1A{\i0} possède une ligne
As you can see, the symbols were successfully recognized; however, the subscript was omitted.
I am wondering if there is a way to make sure that the subscript is preserved as well.
Consider making --input-enc utf8 the default, with some fallback. It is already the default in the Python API; however, as of 1.1.0 we're defaulting to ISO-8859-1 in the CLI, which obviously breaks on some inputs. See #37 (comment).
Hi, I'm using pysubs2 on Linux and I have a problem when trying to use the pysubs2.py script.
python -m pysubs2 works fine and the .py script works fine too, if I rename it.
https://gist.github.com/YamashitaRen/5846848c372369f28431
Hi @tkarabela, I am using your library to do some preprocessing on some SRT files, and I noticed that your code removes the ASS positioning tag (e.g. {\an7}).
Is it possible to add a parameter to the save function to keep these tags in the output file as well?
Cannot load an SRT file under Python 3.7. I tried both the command line and the Python interpreter, with the same error:
$ pysubs2 --input-enc utf-8 --output-enc utf-8 --to ass sub.srt
Traceback (most recent call last):
File "/Users/Yuji/miniconda/lib/python3.7/sre_parse.py", line 1021, in parse_template
this = chr(ESCAPES[this][1])
KeyError: '\\i'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/Yuji/miniconda/bin/pysubs2", line 11, in <module>
sys.exit(__main__())
File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/cli.py", line 170, in __main__
rv = cli(sys.argv[1:])
File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/cli.py", line 100, in __call__
self.main(argv)
File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/cli.py", line 124, in main
subs = SSAFile.from_file(infile, args.input_format, args.fps)
File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/ssafile.py", line 152, in from_file
impl.from_file(subs, fp, format_, fps=fps, **kwargs)
File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/subrip.py", line 70, in from_file
for (start, end), lines in zip(timestamps, following_lines)]
File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/subrip.py", line 70, in <listcomp>
for (start, end), lines in zip(timestamps, following_lines)]
File "/Users/Yuji/miniconda/lib/python3.7/site-packages/pysubs2/subrip.py", line 59, in prepare_text
s = re.sub(r"< *i *>", r"{\i1}", s)
File "/Users/Yuji/miniconda/lib/python3.7/re.py", line 192, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/Users/Yuji/miniconda/lib/python3.7/re.py", line 309, in _subx
template = _compile_repl(template, pattern)
File "/Users/Yuji/miniconda/lib/python3.7/re.py", line 300, in _compile_repl
return sre_parse.parse_template(repl, pattern)
File "/Users/Yuji/miniconda/lib/python3.7/sre_parse.py", line 1024, in parse_template
raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \i at position 1
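Since Python 3.7, an unknown escape made of a backslash and an ASCII letter in an `re.sub` replacement string is an error, which is why r"{\i1}" fails here. Doubling the backslash in the replacement avoids it. The function name `italics_to_ass` below is made up for this sketch, and the closing-tag pattern is an assumption modelled on the opening-tag pattern in the traceback.

```python
import re


def italics_to_ass(text: str) -> str:
    """Convert <i>...</i> into ASS italic override tags."""
    # In a replacement template, "\\" stands for one literal backslash.
    text = re.sub(r"< *i *>", r"{\\i1}", text)
    text = re.sub(r"< */ *i *>", r"{\\i0}", text)
    return text


print(italics_to_ass("<i>Hi</i>"))
```

Passing a function as the replacement (for example `lambda m: "{\\i1}"`) would work as well, because the return value of a callable is not processed for escapes.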