
n46whisper's Introduction

N46Whisper

Language : English | 简体中文

N46Whisper is a Google Colab notebook application developed to streamline video subtitle file generation and improve the productivity of Nogizaka46 (and other Sakamichi group) subbers.

The notebook is based on faster-whisper, a reimplementation of OpenAI's Whisper, a general-purpose speech recognition model. This implementation is up to 4 times faster than the original Whisper at the same accuracy while using less memory.

The output file is in Advanced SubStation Alpha (.ass) format with the built-in style of the selected sub group, so it can be directly imported into Aegisub for subsequent editing.

What's Latest:

This project can only be maintained and updated irregularly due to personal commitments. Thank you for your understanding.

2024.4.17:

  • Add an option to use the Google Gemini API for translation.

2024.1.31:

  • N46WhisperLite is available for daily tasks that do not need advanced settings.

2023.12.4:

  • Add support for the v3 model based on faster-whisper.

2023.11.7:

  • Enable users to load the latest Whisper v3 model.
  • Make the beam size parameter customizable.

How to use

  • Click here to open the notebook in Google Colab.
  • Upload your file and follow the instructions to run the notebook.
  • The subtitle file will be automatically downloaded once done.
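
Under the hood, the notebook drives faster-whisper roughly as in the sketch below. This is a minimal sketch, not the notebook's exact code: the model name, input file and option values are illustrative, though vad_filter and beam_size mirror settings the notebook exposes.

# Minimal transcription sketch with faster-whisper.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.mp3",        # illustrative input file
    language="ja",
    vad_filter=True,    # VAD filtering, enabled by the notebook since 2023.4.15
    beam_size=5,        # beam size, customizable since 2023.11.7
)
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")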

AI translation

The notebook now allows users to translate the transcribed subtitle text line by line using AI translation tools.

Users can also upload local subtitle files or select files from Google Drive for translation.

Currently, it supports ChatGPT translation.

The translated text is appended in the same line after the original text, separated by \N, so that a new bilingual subtitle file is generated.

For instance:

[screenshot: AI translation example]

An example of bilingual subtitle:

[screenshot: bilingual subtitle example]
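
Conceptually, the bilingual file is produced as in the following sketch, which uses pysubs2 (the library the notebook adopted in the 2023.4.1 update); translate_line is a hypothetical stand-in for the ChatGPT/Gemini call, and the file names are illustrative.

# Sketch of bilingual line generation with pysubs2; translate_line is a
# hypothetical helper standing in for the AI translation API call.
import pysubs2

subs = pysubs2.load("input.ass")
for line in subs:
    translated = translate_line(line.text)     # hypothetical API call
    line.text = f"{line.text}\\N{translated}"  # \N = hard line break in ASS
subs.save("bilingual.ass")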

To use the AI translation, users must provide their own OpenAI API key. To obtain a free key, go to https://platform.openai.com/account/api-keys

Please note that free keys have usage limits; choose a paid plan to speed things up at your own cost.
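
For reference, a single translation request looks roughly like the sketch below. It follows the legacy (pre-1.0) openai-python interface that appears in the tracebacks quoted in the issues further down; the prompt wording and variable names are illustrative.

# Sketch of a one-line ChatGPT translation using the legacy (pre-1.0)
# openai-python interface; prompt wording is illustrative.
import openai

openai.api_key = "sk-..."  # your own key from platform.openai.com
text = "よろしくお願いします"
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": f"Please translate this line to Chinese and return only the translation: {text}",
    }],
)
print(completion["choices"][0]["message"]["content"])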

Split lines

Users can choose to split the text of a single line on spaces. Each child line initially has the same timestamps as the parent line.

For instance, for a line that contains multiple long sentences:

Dialogue: 0,0:01:00.52,0:01:17.52,default,,0,0,0,,Birthday Liveについて話そうかなと思います よろしくお願いします

After split:

Dialogue: 0,0:01:00.52,0:01:17.52,default,,0,0,0,,Birthday Liveについて話そうかなと思います(adjust_required)

Dialogue: 0,0:01:00.52,0:01:17.52,default,,0,0,0,,よろしくお願いします(adjust_required)
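
The split rule can be sketched with pysubs2 as follows; this illustrates the behaviour described above and is not the notebook's exact code.

# Illustrative sketch of the space-split rule: every child line inherits the
# parent's timestamps and is tagged (adjust_required) for manual re-timing.
import pysubs2

subs = pysubs2.load("input.ass")
out = pysubs2.SSAFile()
out.styles = subs.styles
for line in subs:
    parts = line.text.split(" ")
    tag = "(adjust_required)" if len(parts) > 1 else ""
    for part in parts:
        out.append(pysubs2.SSAEvent(start=line.start, end=line.end,
                                    style=line.style, text=part + tag))
out.save("split.ass")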

Update history:

2023.4.30:

  • Refine the translation prompt.
  • Allow users to customize the prompt and temperature for translation.
  • Display the tokens used and the total cost of the translation task.

2023.4.15:

  • Reimplement Whisper based on faster-whisper to improve efficiency.
  • Enable the VAD filter integrated in faster-whisper to improve transcription accuracy.

2023.4.10:

  • Support selecting/uploading multiple files for batch processing.

2023.4.1:

  • Update workflow: use the pysubs2 library instead of Whisper's WriteSRT class for subtitle file manipulation.
  • Support uploading srt or ass files to use the AI translation function independently; display translation progress.
  • Update documents and other minor fixes.

2023.3.15:

  • Add functions to split multiple words/sentences in one line.
  • Update documents and other minor fixes.

2023.3.12:

  • Add ChatGPT translation and bilingual subtitle file generation features.
  • Update documents and other minor fixes.

2023.01.26:

  • Update scripts to reflect recent changes in Whisper.

2022.12.31:

  • Allow users to select files directly from a mounted Google Drive.
  • Other minor fixes.

Support

The application can significantly reduce the labour and time costs of sub groups and individual subbers. However, despite its impressive performance, the Whisper model and the application itself are not without limitations. Please read the original documents and Discussions to learn more about the usage of Whisper and common issues.

If you have any thoughts, requests or questions directly related to making subtitles for Sakamichi group girls, please feel free to post here or contact me.

License

The code is released under the MIT license. See License for details.

n46whisper's People

Contributors

41889732, ayanaminn, cooperwang0912, esong, lenshyuu227, lovecany, machinewu, zimq


n46whisper's Issues

Would you consider adding speaker diarization to the project?

(GitHub posted this the moment I pressed Enter... I had not finished editing.)
I ran a multi-member collab episode through it, and after the speech-to-text I still had to label the speaker line by line.
Then I came across a speaker diarization package:
https://github.com/pyannote/pyannote-audio
Projects that apply it:
https://github.com/yinruiqing/pyannote-whisper
https://github.com/lablab-ai/Whisper-transcription_and_diarization-speaker-identification-
https://github.com/JimLiu/whisper-podcast-subtitles

Would you consider integrating this? OTZ
It could be a big help for multi-member collab content...
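
For reference, basic diarization with the linked pyannote.audio package looks roughly like this sketch; it assumes a Hugging Face token with access to the gated pipeline, and the checkpoint name and input file are illustrative.

# Rough speaker-diarization sketch with pyannote.audio.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="hf_...")  # your HF token
diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s-{turn.end:.1f}s: {speaker}")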

Minor improvement suggestions

I have done related AI subtitling work. It is not convenient for me to share the code, but several directions are worth borrowing:

  • Instead of whisper, use whisperX, taking whisper's original subtitle output as the base text and scanning out accurate timestamps.
  • Segment the video file in a smarter way, detecting only the parts that contain speech and feeding them to whisperX with different settings.
  • Merge the results of different whisper and wav2vec2 models into a combined export.

With this, over 80% of the final timestamps can be accurate without manual correction.

What does this HF_TOKEN message mean? It has stopped working; it was fine before

加载模型 Loading model...
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
model.bin: 100%
3.09G/3.09G [00:31<00:00, 159MB/s]

A comparison with WhisperDesktop, and some thoughts

N46:
なんて言ったらいいんだろう
約束だよ
これさすがノーマルでしょ?
私アメリカン…あ、今はクルーザーって言うんだっけ?
WhisperDesktop:
ああ… なんて言ったらいいんだろう
約束…だよ
これ さすがノーマルでしょ?
私 アメリカ… あ 今はクルザって言うんだっけ?

WhisperDesktop (using the new ggml v2 model) shows background sounds, interjections and pauses, but its timing has a serious problem: each line ends the instant the speech stops and jumps straight to the next one.
N46's timing is fine, and it shows no background sounds or interjections, with almost no pause gaps either. I used the v1 model, because the v2 model outputs no punctuation.
What causes this? It would be perfect if the strengths could be combined.

Hopes for the experimental features

  1. In the experimental AI text translation feature, please add options for other target languages (for example English).
  2. Also, please add the ability to upload an existing ass file and have it translated automatically.

How does error 429 occur?

I have spent a long time looking into the GPT translation. The so-called error 429 means the daily quota has been reached, and there is apparently a reset time, but I could not find out how long it is. Whenever the error appears during translation it says to retry after 7m12s; is that the reset time? If not, how should this problem be solved or avoided?
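
A common generic mitigation for HTTP 429 is retrying with exponential backoff. Below is a minimal sketch, independent of the notebook's code; call_api stands in for any hypothetical function that raises when rate-limited.

# Generic exponential-backoff retry sketch for rate-limited API calls.
import time

def with_backoff(call_api, max_retries=5, base_delay=2.0):
    for attempt in range(max_retries):
        try:
            return call_api()
        except Exception as e:            # e.g. a RateLimitError
            delay = base_delay * (2 ** attempt)
            print(f"{e}; retrying in {delay:.0f}s")
            time.sleep(delay)
    raise RuntimeError("still rate-limited after retries")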

[enhancement] Streamline the steps to improve efficiency

Hi, there are a few too many steps at the moment, which is inconvenient. Could they be streamlined a little, for example by grouping the parameter settings together?
I forked a copy and made some changes; I am not very familiar with ipynb, but you can use it as a reference.


Could the AI translation part be extracted into a locally run project?

The main point of using Google Colab is to borrow its GPU... but the AI translation part calls a remote API and barely needs GPU compute, right?
Could the previous part get an option to export srt, so that the final AI translation step can be run locally?
The free API is really slow; while waiting on Google Colab the session times out and disconnects OTZ

RateLimitError

RateLimitError Traceback (most recent call last)
in translate(OOO0OO000O0OOO0OO, OO00O0OOOO00O0O00)
33 try :#line:27
---> 34 OO00OO000O0O000O0 =openai .ChatCompletion .create (model ="gpt-3.5-turbo",messages =[{"role":"user","content":f"Please help me to translate,{OO00O0OOOO00O0O00} to {OOO0OO000O0OOO0OO.language}, please return only translated content not include the origin text",}],)#line:37
35 O00O000000000O00O =(OO00OO000O0O000O0 ["choices"][0 ].get ("message").get ("content").encode ("utf8").decode ())#line:44

12 frames
RateLimitError: You exceeded your current quota, please check your plan and billing details.

During handling of the above exception, another exception occurred:

RateLimitError Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/openai/api_requestor.py in _interpret_response_line(self, rbody, rcode, rheaders, stream)
680 stream_error = stream and "error" in resp.data
681 if stream_error or not 200 <= rcode < 300:
--> 682 raise self.handle_error_response(
683 rbody, rcode, resp.data, rheaders, stream_error=stream_error
684 )

RateLimitError: You exceeded your current quota, please check your plan and billing details.

Why does this happen? It is only a two-minute video, and only three lines of the Japanese subtitles had been transcribed.

Can it be used locally?

Uploading to Google Drive turns out to be very slow and heavy on proxy traffic. Could processing directly on a Windows PC be supported?

Using nllb-200 to support translation

It's a new translation model published by Meta. The model size is about 3.3 GB, which is runnable on Colab. Maybe it's a good idea to translate with it.
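
A rough sketch of what that could look like with the Hugging Face transformers pipeline, assuming the distilled 600M NLLB checkpoint; the language codes follow the FLORES-200 convention.

# NLLB-200 translation sketch via the transformers pipeline; checkpoint
# and language codes are illustrative.
from transformers import pipeline

translator = pipeline("translation",
                      model="facebook/nllb-200-distilled-600M",
                      src_lang="jpn_Jpan", tgt_lang="zho_Hans")
print(translator("よろしくお願いします")[0]["translation_text"])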

export_srt is set to Yes but no srt file is downloaded (could the download be extracted into a separate cell?)

[screenshots omitted]
Screenshot 1: export_srt is already set to Yes
Screenshot 2: the code is up to date
Screenshot 3: only a download request for the ass file arrived
OTZ I am not sure what is going on... Could the program be changed so that the download does not happen in this cell after a successful run, but is instead extracted into a separate cell? That way, if a download fails, it can be retried as long as the page is not refreshed; otherwise re-downloading means running the whole speech-to-text again from scratch...

Errors & feature suggestions

I am not a subber; I found this notebook by chance, so please weigh these feature suggestions before deciding whether to add them~
I ran into two problems while using the notebook:

I created a folder named voice in Google Drive and uploaded audio files into it, but the notebook could not read the audio inside the voice folder. (I also did not see a select button, which may be why no audio got selected.)
Using local upload, I uploaded three audio files; only the first was downloaded automatically at the end, and the rest were ignored.

Feature suggestions:

I want to batch-convert audio, on the order of 1k-10k files; I would like the notebook to read all the audio under every folder in my Drive and convert it in order.
Could an option be added to export txt? The txt would contain only the text~

Original:
--------------
[0:0:0:1:0] お願いします。
[0:0:0:1:0] お願いします。
[0:0:0:1:0] お願いします。
--------------
Exported txt:
--------------
お願いします。
--------------
(Of course, this is not strictly necessary; I can also convert the ass to txt locally and strip the extra parameters.)

Also, batch-downloading thousands of txt files is a headache. If the converted result could be viewed directly in the Colab notebook's show-code view, I could simply copy it instead of downloading.
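
For reference, stripping an ass file down to plain text can be sketched with pysubs2 (the library the notebook already uses); consecutive duplicate lines are collapsed as in the example above, and the file names are illustrative.

# Sketch: export only the text of an .ass file, collapsing consecutive
# duplicate lines.
import pysubs2

subs = pysubs2.load("input.ass")
lines, prev = [], None
for ev in subs:
    if ev.plaintext != prev:
        lines.append(ev.plaintext)
        prev = ev.plaintext
with open("output.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))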

Subtitles

If I want to upload my own subtitles and have them translated, that is, I need nothing except the subtitle translation feature, where is the relevant code and how should I change it?

The subtitle translation feature seems to have stopped working

It reports: module 'openai' has no attribute 'ChatCompletion'
The line flagged in red is the last line of the following code:
except Exception as e:
    # TIME LIMIT for open api , pay to reduce the waiting time
    sleep_time = int(60 / self.key_len)
    time.sleep(sleep_time)
    print(e, f"will sleep {sleep_time} seconds")
    # self.rotate_key()
    openai.api_key = self.key
    completion = openai.ChatCompletion.create(

From my own blind digging around, this code seems fine:
this line just calls into the model's API, and isn't that exactly the right format?
I asked GPT, and GPT also said it is correct, so why?

How to batch-upload Google Drive files

When I try to upload multiple files from Google Drive, it seems to repeatedly generate output only for the last one uploaded.

Also, if an uploaded filename contains Chinese characters, they are stripped from the generated file of the same name; I wonder whether there is a fix for this.

And I only need the srt file, but currently ass and srt can only be generated together; srt cannot be generated alone.

[Feature proposal] Provide an option to split lines that contain spaces

■ Feature
Provide a new option in the parameter selection: whether to split lines that contain spaces. Users can choose yes or no according to their needs.
If yes is selected, lines that contain spaces are split; the resulting lines all temporarily share the same timestamps, and an adjust_required tag is added as a reminder to adjust the timestamps and avoid overlapping lines.
The effect is as follows.
○ Before split:

Dialogue: 0,0:01:06.52,0:01:17.52,default,,0,0,0,,スマホ見てる時点でもう失格です 後輩メンバーから言いたい放題言われながらも必死にアピールを続け

○ After split:

Dialogue: 0,0:01:06.52,0:01:17.52,default,,0,0,0,,スマホ見てる時点でもう失格です
Dialogue: 0,0:01:06.52,0:01:17.52,default,,0,0,0,,後輩メンバーから言いたい放題言われながらも必死にアピールを続け(adjust_required)

■ Reason for adding this feature
At present Whisper may put several sentences on one line during transcription, making the line too long; this happens especially often when several people are talking. If such lines can be split, the timer only needs to re-adjust the durations of the resulting lines that share a timestamp, which effectively does Aegisub's split for them.

※ I have already written this feature and test-run it once; if you think it is worth adding, I can submit a PR.

Bug in the new automatic Colab disconnect feature

I looked at the code: right after initiating the download request, the sleep function is called directly, which prevents the download request from actually being issued; the run has to be force-stopped for it to go through. I am using Google Chrome, and other Chromium browsers show the same behaviour.

Out of memory while trying to transcribe an English video...

I followed the tutorial step by step and set the language to English; instead of the video itself I extracted the whole audio track and imported that, and memory blew up during generation...
加载模型 Loading model...

OutOfMemoryError Traceback (most recent call last)
in
46 torch.cuda.empty_cache()
47 print('加载模型 Loading model...')
---> 48 model = whisper.load_model(model_size)
49
50 #Transcribe

8 frames
/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py in convert(t)
985 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
986 non_blocking, memory_format=convert_to_format)
--> 987 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
988
989 return self._apply(convert)

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.75 GiB total capacity; 13.66 GiB already allocated; 6.81 MiB free; 14.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I checked and I still had quota left, so why does it error out?


Problem with English line-split markers

When the language is set to en, lines cannot be split on English full stops as split markers; settings such as is_spilt and spilt_method do not change the line splitting in the output ass file. Japanese line splitting, by contrast, is quite clear-cut. This problem appeared after the 4.15 update; could it be a faster-whisper issue?
