
n46whisper's Introduction

N46Whisper

Language : English | 简体中文

N46Whisper is a Google Colab notebook application developed to streamline video subtitle file generation and improve the productivity of Nogizaka46 (and other Sakamichi group) subbers.

The notebook is based on faster-whisper, a reimplementation of OpenAI's Whisper, a general-purpose speech recognition model. This implementation is up to 4 times faster than the original Whisper at the same accuracy while using less memory.

The output file is in Advanced SubStation Alpha (.ass) format with the built-in style of the selected sub group, so it can be directly imported into Aegisub for subsequent editing.

What's Latest:

This project can only be maintained and updated irregularly due to personal commitments. Thank you for your understanding.

2024.4.17:

  • Add an option to use the Google Gemini API for translation.

2024.1.31:

  • N46WhisperLite is available for daily tasks that do not need advanced settings.

2023.12.4:

  • Add support for the v3 model based on faster-whisper.

2023.11.7:

  • Enable users to load the latest Whisper v3 model.
  • Make the beam size parameter customizable.

How to use

  • Click here to open the notebook in Google Colab.
  • Upload your file and follow the instructions to run the notebook.
  • The subtitle file will be automatically downloaded once done.
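
Under the hood, the notebook drives faster-whisper roughly as in the sketch below. This is a minimal sketch, not the notebook's exact code: the model name, input file and option values are illustrative, though vad_filter and beam_size mirror settings the notebook exposes.

# Minimal transcription sketch with faster-whisper.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.mp3",        # illustrative input file
    language="ja",
    vad_filter=True,    # VAD filtering, enabled by the notebook since 2023.4.15
    beam_size=5,        # beam size, customizable since 2023.11.7
)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")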

AI translation

The notebook now allows users to translate the transcribed subtitle text line by line using AI translation tools.

Users can also upload local subtitle files or select files from Google Drive for translation.

Currently, it supports ChatGPT translation.

The translated text is appended in the same line after the original text, separated by \N, so that a new bilingual subtitle file is generated.

For instance:

[screenshot: AI translation example]

An example of bilingual subtitle:

[screenshot: bilingual subtitle example]
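
Conceptually, the bilingual file is produced as in the following sketch, which uses pysubs2 (the library the notebook adopted in the 2023.4.1 update); translate_line is a hypothetical stand-in for the ChatGPT/Gemini call, and the file names are illustrative.

# Sketch of bilingual line generation with pysubs2; translate_line is a
# hypothetical helper standing in for the AI translation API call.
import pysubs2

subs = pysubs2.load("input.ass")
for line in subs:
    translated = translate_line(line.text)     # hypothetical API call
    line.text = f"{line.text}\\N{translated}"  # \N = hard line break in ASS
subs.save("bilingual.ass")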

To use the AI translation, users must provide their own OpenAI API key. To obtain a free key, go to https://platform.openai.com/account/api-keys

Please note that free keys have usage limits; choose a paid plan to speed things up at your own cost.
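
For reference, a single translation request looks roughly like the sketch below. It follows the legacy (pre-1.0) openai-python interface that appears in the tracebacks quoted in the issues further down; the prompt wording and variable names are illustrative.

# Sketch of a one-line ChatGPT translation using the legacy (pre-1.0)
# openai-python interface; prompt wording is illustrative.
import openai

openai.api_key = "sk-..."  # your own key from platform.openai.com
text = "よろしくお願いします"
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": f"Please translate this line to Chinese and return only the translation: {text}",
    }],
)
print(completion["choices"][0]["message"]["content"])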

Split lines

Users can choose to split the text of a single line on spaces. Each child line initially has the same timestamps as the parent line.

For instance, for a line that contains multiple long sentences:

Dialogue: 0,0:01:00.52,0:01:17.52,default,,0,0,0,,Birthday Liveについて話そうかなと思います よろしくお願いします

After split:

Dialogue: 0,0:01:00.52,0:01:17.52,default,,0,0,0,,Birthday Liveについて話そうかなと思います(adjust_required)

Dialogue: 0,0:01:00.52,0:01:17.52,default,,0,0,0,,よろしくお願いします(adjust_required)
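
The split rule can be sketched with pysubs2 as follows; this illustrates the behaviour described above and is not the notebook's exact code.

# Illustrative sketch of the space-split rule: every child line inherits the
# parent's timestamps and is tagged (adjust_required) for manual re-timing.
import pysubs2

subs = pysubs2.load("input.ass")
out = pysubs2.SSAFile()
out.styles = subs.styles
for line in subs:
    parts = line.text.split(" ")
    tag = "(adjust_required)" if len(parts) > 1 else ""
    for part in parts:
        out.append(pysubs2.SSAEvent(start=line.start, end=line.end,
                                    style=line.style, text=part + tag))
out.save("split.ass")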

Update history:

2023.4.30:

  • Refine the translation prompt.
  • Allow users to customize the prompt and temperature for translation.
  • Display the tokens used and the total cost of the translation task.

2023.4.15:

  • Reimplement Whisper based on faster-whisper to improve efficiency.
  • Enable the VAD filter integrated in faster-whisper to improve transcription accuracy.

2023.4.10:

  • Support selecting/uploading multiple files for batch processing.

2023.4.1:

  • Update workflow: use the pysubs2 library instead of Whisper's WriteSRT class for subtitle file manipulation.
  • Support uploading srt or ass files to use the AI translation function independently; display translation progress.
  • Update documents and other minor fixes.

2023.3.15:

  • Add functions to split multiple words/sentences in one line.
  • Update documents and other minor fixes.

2023.3.12:

  • Add ChatGPT translation and bilingual subtitle file generation features.
  • Update documents and other minor fixes.

2023.01.26:

  • Update scripts to reflect recent changes in Whisper.

2022.12.31:

  • Allow users to select files directly from a mounted Google Drive.
  • Other minor fixes.

Support

The application can significantly reduce the labour and time costs of sub groups and individual subbers. However, despite its impressive performance, the Whisper model and the application itself are not without limitations. Please read the original documents and Discussions to learn more about the usage of Whisper and common issues.

If you have any thoughts, requests or questions directly related to making subtitles for Sakamichi group girls, please feel free to post here or contact me.

License

The code is released under the MIT license. See License for details.

n46whisper's People

Contributors

41889732, ayanaminn, cooperwang0912, esong, lenshyuu227, lovecany, machinewu, zimq


n46whisper's Issues

Would you consider adding speaker diarization to the project?

(GitHub posted this the moment I pressed Enter... I had not finished editing.)
I ran a multi-member collab episode through it, and after the speech-to-text I still had to label the speaker line by line.
Then I came across a speaker diarization package:
https://github.com/pyannote/pyannote-audio
Projects that apply it:
https://github.com/yinruiqing/pyannote-whisper
https://github.com/lablab-ai/Whisper-transcription_and_diarization-speaker-identification-
https://github.com/JimLiu/whisper-podcast-subtitles

Would you consider integrating this? OTZ
It could be a big help for multi-member collab content...
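
For reference, basic diarization with the linked pyannote.audio package looks roughly like this sketch; it assumes a Hugging Face token with access to the gated pipeline, and the checkpoint name and input file are illustrative.

# Rough speaker-diarization sketch with pyannote.audio.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="hf_...")  # your HF token
diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s-{turn.end:.1f}s: {speaker}")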

Minor improvement suggestions

I have done related AI subtitling work. It is not convenient for me to share the code, but several directions are worth borrowing:

  • Instead of whisper, use whisperX, taking whisper's original subtitle output as the base text and scanning out accurate timestamps.
  • Segment the video file in a smarter way, detecting only the parts that contain speech and feeding them to whisperX with different settings.
  • Merge the results of different whisper and wav2vec2 models into a combined export.

With this, over 80% of the final timestamps can be accurate without manual correction.

What does this HF_TOKEN message mean? It has stopped working; it was fine before

加载模型 Loading model...
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
model.bin: 100%
3.09G/3.09G [00:31<00:00, 159MB/s]

A comparison with WhisperDesktop, and some thoughts

N46:
なんて言ったらいいんだろう
約束だよ
これさすがノーマルでしょ?
私アメリカン…あ、今はクルーザーって言うんだっけ?
WhisperDesktop:
ああ… なんて言ったらいいんだろう
約束…だよ
これ さすがノーマルでしょ?
私 アメリカ… あ 今はクルザって言うんだっけ?

WhisperDesktop (using the new ggml v2 model) shows background sounds, interjections and pauses, but its timing has a serious problem: each line ends the instant the speech stops and jumps straight to the next one.
N46's timing is fine, and it shows no background sounds or interjections, with almost no pause gaps either. I used the v1 model, because the v2 model outputs no punctuation.
What causes this? It would be perfect if the strengths could be combined.

Hopes for the experimental features

  1. In the experimental AI text translation feature, please add options for other target languages (for example English).
  2. Also, please add the ability to upload an existing ass file and have it translated automatically.

How does error 429 occur?

I have spent a long time looking into the GPT translation. The so-called error 429 means the daily quota has been reached, and there is apparently a reset time, but I could not find out how long it is. Whenever the error appears during translation it says to retry after 7m12s; is that the reset time? If not, how should this problem be solved or avoided?
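
A common generic mitigation for HTTP 429 is retrying with exponential backoff. Below is a minimal sketch, independent of the notebook's code; call_api stands in for any hypothetical function that raises when rate-limited.

# Generic exponential-backoff retry sketch for rate-limited API calls.
import time

def with_backoff(call_api, max_retries=5, base_delay=2.0):
    for attempt in range(max_retries):
        try:
            return call_api()
        except Exception as e:            # e.g. a RateLimitError
            delay = base_delay * (2 ** attempt)
            print(f"{e}; retrying in {delay:.0f}s")
            time.sleep(delay)
    raise RuntimeError("still rate-limited after retries")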

[enhancement] Streamline the steps to improve efficiency

Hi, there are a few too many steps at the moment, which is inconvenient. Could they be streamlined a little, for example by grouping the parameter settings together?
I forked a copy and made some changes; I am not very familiar with ipynb, but you can use it as a reference.


Could the AI translation part be extracted into a locally run project?

The main point of using Google Colab is to borrow its GPU... but the AI translation part calls a remote API and barely needs GPU compute, right?
Could the previous part get an option to export srt, so that the final AI translation step can be run locally?
The free API is really slow; while waiting on Google Colab the session times out and disconnects OTZ

RateLimitError

RateLimitError Traceback (most recent call last)
in translate(OOO0OO000O0OOO0OO, OO00O0OOOO00O0O00)
33 try :#line:27
---> 34 OO00OO000O0O000O0 =openai .ChatCompletion .create (model ="gpt-3.5-turbo",messages =[{"role":"user","content":f"Please help me to translate,{OO00O0OOOO00O0O00} to {OOO0OO000O0OOO0OO.language}, please return only translated content not include the origin text",}],)#line:37
35 O00O000000000O00O =(OO00OO000O0O000O0 ["choices"][0 ].get ("message").get ("content").encode ("utf8").decode ())#line:44

12 frames
RateLimitError: You exceeded your current quota, please check your plan and billing details.

During handling of the above exception, another exception occurred:

RateLimitError Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/openai/api_requestor.py in _interpret_response_line(self, rbody, rcode, rheaders, stream)
680 stream_error = stream and "error" in resp.data
681 if stream_error or not 200 <= rcode < 300:
--> 682 raise self.handle_error_response(
683 rbody, rcode, resp.data, rheaders, stream_error=stream_error
684 )

RateLimitError: You exceeded your current quota, please check your plan and billing details.

Why does this happen? It is only a two-minute video, and only three lines of the Japanese subtitles had been transcribed.

Can it be used locally?

Uploading to Google Drive turns out to be very slow and heavy on proxy traffic. Could processing directly on a Windows PC be supported?

Using nllb-200 to support translation

It's a new translation model published by Meta. The model size is about 3.3 GB, which is runnable on Colab. Maybe it's a good idea to translate with it.
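
A rough sketch of what that could look like with the Hugging Face transformers pipeline, assuming the distilled 600M NLLB checkpoint; the language codes follow the FLORES-200 convention.

# NLLB-200 translation sketch via the transformers pipeline; checkpoint
# and language codes are illustrative.
from transformers import pipeline

translator = pipeline("translation",
                      model="facebook/nllb-200-distilled-600M",
                      src_lang="jpn_Jpan", tgt_lang="zho_Hans")
print(translator("よろしくお願いします")[0]["translation_text"])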

export_srt is set to Yes but no srt file is downloaded (could the download be extracted into a separate cell?)

[screenshots omitted]
Screenshot 1: export_srt is already set to Yes
Screenshot 2: the code is up to date
Screenshot 3: only a download request for the ass file arrived
OTZ I am not sure what is going on... Could the program be changed so that the download does not happen in this cell after a successful run, but is instead extracted into a separate cell? That way, if a download fails, it can be retried as long as the page is not refreshed; otherwise re-downloading means running the whole speech-to-text again from scratch...

Errors & feature suggestions

I am not a subber; I found this notebook by chance, so please weigh these feature suggestions before deciding whether to add them~
I ran into two problems while using the notebook:

I created a folder named voice in Google Drive and uploaded audio files into it, but the notebook could not read the audio inside the voice folder. (I also did not see a select button, which may be why no audio got selected.)
Using local upload, I uploaded three audio files; only the first was downloaded automatically at the end, and the rest were ignored.

Feature suggestions:

I want to batch-convert audio, on the order of 1k-10k files; I would like the notebook to read all the audio under every folder in my Drive and convert it in order.
Could an option be added to export txt? The txt would contain only the text~

Original:
--------------
[0:0:0:1:0] お願いします。
[0:0:0:1:0] お願いします。
[0:0:0:1:0] お願いします。
--------------
Exported txt:
--------------
お願いします。
--------------
(Of course, this is not strictly necessary; I can also convert the ass to txt locally and strip the extra parameters.)

Also, batch-downloading thousands of txt files is a headache. If the converted result could be viewed directly in the Colab notebook's show-code view, I could simply copy it instead of downloading.
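
For reference, stripping an ass file down to plain text can be sketched with pysubs2 (the library the notebook already uses); consecutive duplicate lines are collapsed as in the example above, and the file names are illustrative.

# Sketch: export only the text of an .ass file, collapsing consecutive
# duplicate lines.
import pysubs2

subs = pysubs2.load("input.ass")
lines, prev = [], None
for ev in subs:
    if ev.plaintext != prev:
        lines.append(ev.plaintext)
        prev = ev.plaintext
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))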

Subtitles

If I want to upload my own subtitles and have them translated, that is, I need nothing except the subtitle translation feature, where is the relevant code and how should I change it?

The subtitle translation feature seems to have stopped working

It reports: module 'openai' has no attribute 'ChatCompletion'
The line flagged in red is the last line of the following code:
except Exception as e:
    # TIME LIMIT for open api , pay to reduce the waiting time
    sleep_time = int(60 / self.key_len)
    time.sleep(sleep_time)
    print(e, f"will sleep {sleep_time} seconds")
    # self.rotate_key()
    openai.api_key = self.key
    completion = openai.ChatCompletion.create(

From my own blind digging around, this code seems fine:
this line just calls into the model's API, and isn't that exactly the right format?
I asked GPT, and GPT also said it is correct, so why?

How to batch-upload Google Drive files

When I try to upload multiple files from Google Drive, it seems to repeatedly generate output only for the last one uploaded.

Also, if an uploaded filename contains Chinese characters, they are stripped from the generated file of the same name; I wonder whether there is a fix for this.

And I only need the srt file, but currently ass and srt can only be generated together; srt cannot be generated alone.

[Feature proposal] Provide an option to split lines that contain spaces

■ Feature
Provide a new option in the parameter selection: whether to split lines that contain spaces. Users can choose yes or no according to their needs.
If yes is selected, lines that contain spaces are split; the resulting lines all temporarily share the same timestamps, and an adjust_required tag is added as a reminder to adjust the timestamps and avoid overlapping lines.
The effect is as follows.
○ Before split:

Dialogue: 0,0:01:06.52,0:01:17.52,default,,0,0,0,,スマホ見てる時点でもう失格です 後輩メンバーから言いたい放題言われながらも必死にアピールを続け

○ After split:

Dialogue: 0,0:01:06.52,0:01:17.52,default,,0,0,0,,スマホ見てる時点でもう失格です
Dialogue: 0,0:01:06.52,0:01:17.52,default,,0,0,0,,後輩メンバーから言いたい放題言われながらも必死にアピールを続け(adjust_required)

■ Reason for adding this feature
At present Whisper may put several sentences on one line during transcription, making the line too long; this happens especially often when several people are talking. If such lines can be split, the timer only needs to re-adjust the durations of the resulting lines that share a timestamp, which effectively does Aegisub's split for them.

※ I have already written this feature and test-run it once; if you think it is worth adding, I can submit a PR.

Bug in the new automatic Colab disconnect feature

I looked at the code: right after initiating the download request, the sleep function is called directly, which prevents the download request from actually being issued; the run has to be force-stopped for it to go through. I am using Google Chrome, and other Chromium browsers show the same behaviour.

Out of memory while trying to transcribe an English video...

I followed the tutorial step by step and set the language to English; instead of the video itself I extracted the whole audio track and imported that, and memory blew up during generation...
加载模型 Loading model...

OutOfMemoryError Traceback (most recent call last)
in
46 torch.cuda.empty_cache()
47 print('加载模型 Loading model...')
---> 48 model = whisper.load_model(model_size)
49
50 #Transcribe

8 frames
/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py in convert(t)
985 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
986 non_blocking, memory_format=convert_to_format)
--> 987 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
988
989 return self._apply(convert)

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.75 GiB total capacity; 13.66 GiB already allocated; 6.81 MiB free; 14.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I checked and I still had quota left, so why does it error out?


Problem with English line-split markers

When the language is set to en, lines cannot be split on English full stops as split markers; settings such as is_spilt and spilt_method do not change the line splitting in the output ass file. Japanese line splitting, by contrast, is quite clear-cut. This problem appeared after the 4.15 update; could it be a faster-whisper issue?
