missing segments (purfview/whisper-standalone-win, closed)

Comments (30)

Purfview commented on June 1, 2024

Recently, I have noticed that when using the default beam size to transcribe certain files, there are occasional occurrences of missing segments, typically around 30 seconds.

Can you share an audio sample with the issue?

from whisper-standalone-win.

gkngkngkn commented on June 1, 2024

Recently, I have noticed that when using the default beam size to transcribe certain files, there are occasional occurrences of missing segments, typically around 30 seconds.

Can you share an audio sample with the issue?

The strange thing is, when I converted the entire video file to an MP3 to share it, and then transcribed that MP3 again with the default beam size before uploading, the missing text reappeared.

Directly transcribing the video and transcribing it after converting it to MP3 yield different results. Then I transcribed the video directly after splitting and cutting it:

When I cut the video so that it kept only the 30 seconds before the missing part and transcribed it again, the previously missing text appeared. But when I kept the complete portion before the missing part and only cut off the later part, the text was still missing.

Purfview commented on June 1, 2024

Any tiny change in the audio can result in a different transcription.

Use verbose=true and take a screenshot of the console around the time where the text is missing.

gkngkngkn commented on June 1, 2024

Any tiny change in the audio can result in a different transcription.

Use verbose=true and take a screenshot of the console around the time where the text is missing.

The missing part is 13:05.660 to 13:36.640.
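One way to spot a gap like this without rewatching the whole video is to scan the subtitle output for long stretches with no cues. This is a minimal sketch that assumes the transcript was saved as a standard SRT file; the function names and the 20-second threshold are illustrative choices, not part of whisper-standalone-win.

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[float, float]

# Matches SRT timing lines such as "00:13:05,660 --> 00:13:36,640".
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt_spans(srt_text: str) -> List[Span]:
    """Return sorted (start, end) pairs, in seconds, for every cue in an SRT string."""
    spans = []
    for match in _TIMING.finditer(srt_text):
        g = match.groups()
        spans.append((_to_seconds(*g[:4]), _to_seconds(*g[4:])))
    return sorted(spans)


def find_gaps(spans: List[Span], min_gap: float = 20.0) -> List[Span]:
    """Return stretches of at least min_gap seconds between cues that no cue covers."""
    gaps: List[Span] = []
    covered_until: Optional[float] = None
    for start, end in spans:
        if covered_until is not None and start - covered_until >= min_gap:
            gaps.append((covered_until, start))
        # Track the furthest end seen so far, so overlapping cues don't create false gaps.
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps
```

Gaps before the first cue or after the last one are not reported, and a genuine pause in speech will also show up, so each hit still needs a listen.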

Purfview commented on June 1, 2024

Yeah, it's just missing. The model sometimes simply refuses to output anything for some parts; I've seen an example where it doesn't transcribe a particular speaker.

Btw, check that VAD didn't remove that segment.

gkngkngkn commented on June 1, 2024

Btw, check that VAD didn't remove that segment.

I guess not, but I will check it, because when I raised bs to 15 the missing part appeared. (In the segment above, the text was still missing even when I increased the beam size to 10.)

Yeah, it's just missing. The model sometimes simply refuses to output anything for some parts; I've seen an example where it doesn't transcribe a particular speaker.

Is this a random occurrence with Whisper? Is increasing the beam size the only effective way to address this?

Purfview commented on June 1, 2024

I guess not, but I will check it, because when I raised bs to 15, the missing part appeared.

Then VAD is not the culprit.

Is this a random occurrence with Whisper?

Maybe, I dunno.

Is increasing the beam size the only effective way to address this?

Lots of things can make the model transcribe differently; maybe even -bs=1 would make it appear.
Sometimes almost nothing helps, like in this case -> Missing the first 21 seconds in small.en and large-v2
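When comparing runs with different beam sizes (or any other setting) on the same file, it can help to measure how much of the known problem window, 13:05.660 to 13:36.640 (785.66 s to 816.64 s), each run actually covers. This is a hypothetical helper that assumes each run's segments are available as (start, end) pairs in seconds, for example parsed from its SRT output; the run names are placeholders.

```python
from typing import Dict, List, Tuple

Span = Tuple[float, float]


def coverage(spans: List[Span], window: Span) -> float:
    """Fraction of window [t0, t1] overlapped by the segments, with overlaps merged."""
    t0, t1 = window
    if t1 <= t0:
        raise ValueError("window must have positive length")
    # Keep only segments that intersect the window, clipped to its bounds.
    clipped = sorted((max(s, t0), min(e, t1)) for s, e in spans if e > t0 and s < t1)
    covered = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the current merged interval: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (t1 - t0)


def compare_runs(runs: Dict[str, List[Span]], window: Span) -> Dict[str, float]:
    """Coverage of the window for each named run, rounded to three decimals."""
    return {name: round(coverage(spans, window), 3) for name, spans in runs.items()}
```

A run scoring 0.0 produced nothing inside the window; a run close to 1.0 filled it, though this says nothing about whether the text itself is correct.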

Purfview commented on June 1, 2024

Btw, I have a personal test version with various ffmpeg preprocessing settings. I didn't release it because I didn't find any of the preprocessing helpful, but maybe there are some use cases for it.

gkngkngkn commented on June 1, 2024

Btw, I have a personal test version with various ffmpeg preprocessing settings. I didn't release it because I didn't find any of the preprocessing helpful, but maybe there are some use cases for it.

Yes, converting video files to MP3 or WAV format and then transcribing them seems to be a solution for the missing-text issue. It would be great if this processing could be built-in. :)

In addition, splitting the problematic video into several parts can also solve this problem, but I don't know how to automate this task.
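Splitting can be automated by planning overlapping chunks and letting ffmpeg cut each one. This is a minimal sketch, assuming ffmpeg is on PATH; the ffmpeg options used (-ss, -t, -vn, -ac, -ar, -c:a pcm_s16le) are standard, while the function names, the 600-second chunk length, and the 30-second overlap are illustrative choices.

```python
from typing import List, Tuple


def plan_chunks(duration: float, chunk_len: float = 600.0,
                overlap: float = 30.0) -> List[Tuple[float, float]]:
    """Split [0, duration] into chunks; each new chunk starts `overlap` s before the last ended."""
    if chunk_len <= overlap:
        raise ValueError("chunk_len must be larger than overlap")
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        chunks.append((start, end))
        if end >= duration:
            break
        start = end - overlap
    return chunks


def ffmpeg_cut_command(src: str, start: float, end: float, dst: str) -> List[str]:
    """Build an ffmpeg command that extracts [start, end) as 16 kHz mono 16-bit WAV."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",      # input seek to the chunk start
        "-i", src,
        "-t", f"{end - start:.3f}",  # duration of the chunk
        "-vn", "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le",
        dst,
    ]
```

Each command can then be run with `subprocess.run(cmd, check=True)` and each part transcribed separately. Timestamps in each part start at zero, so the chunk's start time has to be added back when merging, and text inside the overlap will appear twice and needs deduplicating.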

despairTK commented on June 1, 2024

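Btw, I have a personal test version with various ffmpeg preprocessing settings. I didn't release it because I didn't find any of the preprocessing helpful, but maybe there are some use cases for it.

Yes, converting video files to MP3 or WAV format and then transcribing them seems to be a solution for the missing-text issue. It would be great if this processing could be built-in. :)

In addition, splitting the problematic video into several parts can also solve this problem, but I don't know how to automate this task.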

You can transcribe in Subtitle Edit. When transcribing, Subtitle Edit will extract the audio into .WAV format and then transcribe it.

However, it may not help with the problem you encountered, because some sentences get lost in some audio files. This happens less often with English audio and more often with audio in other languages, such as the Portuguese I transcribed recently. It comes down to the audio itself. In that case, all you can do is try changing some transcription settings, such as --compute_type, --initial_prompt auto, --initial_prompt default, and other parameters.
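Trying settings by hand gets tedious, so one option is to generate every combination and run them in a batch. This is a hypothetical sketch: "whisper-cli.exe" is a placeholder for the actual executable, the flag names -bs and --compute_type are taken from this thread, and passing each value as a separate token plus the compute types listed ("float16", "int8", both CTranslate2 types) are assumptions about what the build accepts.

```python
import itertools
from typing import Dict, Iterable, List


def settings_grid(exe: str, audio: str,
                  options: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Return one command line per combination of option values (Cartesian product)."""
    names = list(options)
    commands = []
    for values in itertools.product(*(list(options[n]) for n in names)):
        cmd = [exe, audio]
        for name, value in zip(names, values):
            cmd += [name, value]
        commands.append(cmd)
    return commands


# Placeholder grid: 4 beam sizes x 2 compute types = 8 runs.
GRID = {
    "-bs": ["1", "5", "10", "15"],
    "--compute_type": ["float16", "int8"],
}
```

Each run should write to its own output file so the transcripts can be compared afterwards; more options, such as --initial_prompt, can be added to the dictionary, but the number of runs multiplies quickly.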

Purfview commented on June 1, 2024

Converting video files to MP3 or WAV format and then transcribing them seems to be a solution...

...extract the audio into .WAV format and then transcribe it.

Everything is converted to WAV internally before anything else happens, so converting to WAV twice is pointless. :)
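To illustrate what this internal conversion typically looks like in Whisper-style tools: the input is decoded to 16 kHz mono 16-bit PCM, then scaled to floats. The sketch is modeled loosely on the openai-whisper package's approach of having ffmpeg write raw s16le to a pipe; whether this particular build decodes the same way is an assumption, and the function names are illustrative.

```python
import array
import subprocess
import sys
from typing import List

SAMPLE_RATE = 16000  # Whisper models consume 16 kHz audio


def decode_command(path: str) -> List[str]:
    """ffmpeg command that decodes any input to 16 kHz mono s16le PCM on stdout."""
    return ["ffmpeg", "-nostdin", "-i", path, "-f", "s16le", "-ac", "1",
            "-acodec", "pcm_s16le", "-ar", str(SAMPLE_RATE), "-"]


def pcm16le_to_float(raw: bytes) -> List[float]:
    """Convert little-endian signed 16-bit samples to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(raw[: len(raw) - len(raw) % 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is little-endian regardless of the host
    return [s / 32768.0 for s in samples]


def load_audio(path: str) -> List[float]:
    """Decode a media file through ffmpeg (assumed to be on PATH)."""
    raw = subprocess.run(decode_command(path), capture_output=True, check=True).stdout
    return pcm16le_to_float(raw)
```

This is why an extra WAV or MP3 step beforehand should, in principle, be redundant; any difference it makes comes from the intermediate file not being bit-identical to what the direct decode produces.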

gkngkngkn commented on June 1, 2024

Converting video files to MP3 or WAV format and then transcribing them seems to be a solution...

...extract the audio into .WAV format and then transcribe it.

Everything is converted to WAV internally before anything else happens, so converting to WAV twice is pointless. :)

It's eerie: when I convert this video to MP3 or WAV format before transcribing, the text never goes missing. However, if I transcribe the video directly, some text is lost, almost like a ghost story. :(

I use third-party software for lossless converting, and it probably also uses ffmpeg.

Purfview commented on June 1, 2024

It's using ffmpeg internally already.

lossless converting

Maybe it's not completely lossless; a one-bit difference in the WAV can propagate into a different transcription.
It fixed this particular issue, but maybe it created issues in other parts, or would in other audio files.

Btw, different versions of ffmpeg produce different wavs, so it's not mathematically lossless in my book.
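Whether two conversions really produced the same audio can be checked directly by comparing their PCM data. This is a minimal sketch using Python's standard wave module; the function names are illustrative, and it assumes both files are uncompressed PCM WAVs.

```python
import hashlib
import io
import wave
from typing import Optional, Tuple


def pcm_frames(source) -> Tuple[tuple, bytes]:
    """Read a WAV (path or binary file object); return (channels, width, rate) and raw PCM."""
    with wave.open(source, "rb") as w:
        fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
        return fmt, w.readframes(w.getnframes())


def compare_wavs(a, b) -> dict:
    """Report format/length equality, hashes, and byte-level differences of the PCM data."""
    fmt_a, pcm_a = pcm_frames(a)
    fmt_b, pcm_b = pcm_frames(b)
    differing = 0
    first_diff: Optional[int] = None
    if pcm_a != pcm_b:
        # Byte-by-byte scan over the common length; a length mismatch shows in same_length.
        for i, (x, y) in enumerate(zip(pcm_a, pcm_b)):
            if x != y:
                differing += 1
                if first_diff is None:
                    first_diff = i
    return {
        "same_format": fmt_a == fmt_b,
        "same_length": len(pcm_a) == len(pcm_b),
        "sha256_a": hashlib.sha256(pcm_a).hexdigest(),
        "sha256_b": hashlib.sha256(pcm_b).hexdigest(),
        "differing_bytes": differing,
        "first_diff_byte": first_diff,
    }
```

Hashing only the PCM data, not the whole file, avoids false alarms from metadata that different ffmpeg versions write into the header. The byte scan is slow in pure Python for long files, but it only runs when the data differs.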

gkngkngkn commented on June 1, 2024

It's using ffmpeg internally already.

lossless converting

Maybe it's not completely lossless; a one-bit difference anywhere in the WAV can propagate into a different transcription. It fixed this particular issue, but maybe it created issues in other parts, or would in other audio files.

Btw, different versions of ffmpeg produce different wavs, so it's not mathematically lossless in my book.

So far I have tried different models, beam sizes, conversion methods, and video clipping, and they all yield different results. It seems there is no universal way to solve this problem.
Luckily, this only happens occasionally. :)

Purfview commented on June 1, 2024

It would be great if this processing could be built-in. :)

Here is a quick test1 build with a few filters added -> https://we.tl/t-F8VG46rY6V
Maybe --ff_speechnorm is the most useful(?) of these; I'll add more later. Use ffmpeg v5 to compare apples to apples.

@despairTK wasn't it you who asked for a feature to specify the audio parts to transcribe? I can add that too.

despairTK commented on June 1, 2024

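It would be great if this processing could be built-in. :)

Here is a quick test1 build with a few filters added -> https://we.tl/t-F8VG46rY6V Maybe --ff_speechnorm is the most useful(?) of these; I'll add more later. Use ffmpeg v5 to compare apples to apples.

Wasn't it you who asked for a feature to specify the audio parts to transcribe? I can add that too.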

Thank you very much for your special attention. I will test the beta version and give feedback.

gkngkngkn commented on June 1, 2024

It would be great if this processing could be built-in. :)

Here is a quick test1 build with a few filters added -> https://we.tl/t-F8VG46rY6V Maybe --ff_speechnorm is the most useful(?) of these; I'll add more later. Use ffmpeg v5 to compare apples to apples.

@despairTK wasn't it you who asked for a feature to specify the audio parts to transcribe? I can add that too.

thx for ur job, but this version still don't work on that issue after testing.

Purfview commented on June 1, 2024

thx for ur job, but this version still don't work on that issue after testing.

Not sure I understand you. With what command doesn't it work?

gkngkngkn commented on June 1, 2024

thx for ur job, but this version still don't work on that issue after testing.

Not sure I understand you. With what command doesn't it work?

Purfview commented on June 1, 2024

How can it "still don't work on that issue after testing" if you weren't able to run it... 😆

Just run ffmpeg.exe in the console and check its version; it's probably a very old one.
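The version check can also be scripted, which is handy when comparing against the ffmpeg v5 mentioned earlier in the thread. This is a minimal sketch; the function names are illustrative, and it only recognizes release-style version strings ("ffmpeg version 5.1.2 ..." or "ffmpeg version n6.0 ..."), returning None for nightly git builds whose version line is a date or commit.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# The first line of `ffmpeg -version` looks like "ffmpeg version 5.1.2 Copyright ...".
_VERSION = re.compile(r"ffmpeg version n?(\d+)\.(\d+)")


def parse_ffmpeg_version(first_line: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from ffmpeg's first output line, or None if unrecognized."""
    m = _VERSION.match(first_line.strip())
    return (int(m.group(1)), int(m.group(2))) if m else None


def installed_ffmpeg_version(exe: str = "ffmpeg") -> Optional[Tuple[int, int]]:
    """Run `<exe> -version` from PATH (finds ffmpeg.exe on Windows) and parse it."""
    path = shutil.which(exe)
    if path is None:
        return None
    out = subprocess.run([path, "-version"], capture_output=True,
                         text=True, check=False).stdout
    lines = out.splitlines()
    return parse_ffmpeg_version(lines[0]) if lines else None
```

Note that `shutil.which` reports the first ffmpeg on PATH, which is not necessarily the copy the transcription tool picks up from its own folder.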

gkngkngkn commented on June 1, 2024

How can it "still don't work on that issue after testing" if you weren't able to run it... 😆

Just run ffmpeg.exe in the console and check its version; it's probably a very old one.

OK, I will check it, thx.

gkngkngkn commented on June 1, 2024

How can it "still don't work on that issue after testing" if you weren't able to run it... 😆

Just run ffmpeg.exe in the console and check its version; it's probably a very old one.

Sir, I updated my ffmpeg and tried again with this command. There still seems to be text missing in THIS video. Perhaps other filters might be effective.

Purfview commented on June 1, 2024

Here is test2 build -> https://we.tl/t-6NPHtKQbpx

Check with --ff_mp3; this pre-processes the audio with an MP3 conversion.
Dunno if it makes a difference, but the conversion happens at the end of the audio pre-processing, not at the start. This way it's much faster.
Anyway, if it doesn't make that segment appear, I can move this conversion to the start.

Purfview commented on June 1, 2024

I looked at the spectrograms and I see that MP3 conversion at 16000 Hz cuts off frequencies above ~7300 Hz, and I think I see a slight timeline distortion, as if the audio were shifted to the right.
So here is test3 -> https://we.tl/t-MFz6hagPfA , it does the MP3 conversion at the start, on the original audio.
Compare it to test2 transcription results.

[Spectrogram screenshots: Original, test2, test3]
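For context on the ~7300 Hz cutoff: at a 16000 Hz sample rate the highest representable frequency is half the sample rate, so the MP3 step removes roughly the top 700 Hz of the available band. The arithmetic, using only the figures reported above:

```latex
f_{\text{Nyquist}} = \frac{f_s}{2} = \frac{16000\ \text{Hz}}{2} = 8000\ \text{Hz},
\qquad
\frac{8000\ \text{Hz} - 7300\ \text{Hz}}{8000\ \text{Hz}} = 0.0875 \approx 8.75\%
```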

commented on June 1, 2024

@Purfview can I have a Linux build of the test version? I have had a similar problem with many audio files: sometimes text is lost, sometimes it repeats itself. I generally use beam_size=5.

gkngkngkn commented on June 1, 2024

Based on my tests, test3 significantly reduces missing text with the large model compared to the original, but there are still instances of missing text with the medium model.

Purfview commented on June 1, 2024

Based on my tests, test3 significantly reduces missing text with the large model compared to the original, but there are still instances of missing text with the medium model.

So MP3 is not some magic "filter" that makes Whisper work better; it just alters the audio and triggers a different transcription, for better or worse... or maybe those other instances of missing text are caused by a different issue.

Anyway, here is test4 build -> https://we.tl/t-BqvQ33LgqT
Now it has these filters [I find "rnndn" interesting; I see improvements with it]:

--ff_dump
--ff_mp3
--ff_sync
--ff_rnndn_sh
--ff_rnndn_xiph
--ff_fftdn
--ff_tempo
--ff_gate
--ff_speechnorm
--ff_loudnorm
--ff_lowhighpass
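For anyone wanting to reproduce similar pre-processing outside the test build, the filter names suggest standard ffmpeg audio filters (highpass, lowpass, afftdn, agate, speechnorm, loudnorm, arnndn). This is a sketch under that assumption: the mapping from --ff_* flags to filters and all parameter values are guesses rather than what the build actually does, and arnndn needs an RNNoise model file passed with m=.

```python
from typing import Dict, List, Optional

# ASSUMED mapping from test-build flag names to ffmpeg filter expressions;
# the cutoff frequencies are illustrative values only.
FILTERS: Dict[str, str] = {
    "lowhighpass": "highpass=f=80,lowpass=f=7500",
    "fftdn": "afftdn",
    "gate": "agate",
    "speechnorm": "speechnorm",
    "loudnorm": "loudnorm",
}


def build_af(selected: List[str], rnndn_model: Optional[str] = None) -> str:
    """Join the chosen filters into one -af chain; filters run in the order given."""
    parts = []
    if rnndn_model:
        # arnndn requires a model file; Windows paths with ':' need filtergraph escaping.
        parts.append(f"arnndn=m={rnndn_model}")
    parts.extend(FILTERS[name] for name in selected)
    return ",".join(parts)


def preprocess_command(src: str, dst: str, selected: List[str],
                       rnndn_model: Optional[str] = None) -> List[str]:
    """ffmpeg command applying the chain, then writing 16 kHz mono 16-bit WAV."""
    af = build_af(selected, rnndn_model)
    cmd = ["ffmpeg", "-y", "-i", src, "-vn"]
    if af:
        cmd += ["-af", af]
    cmd += ["-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", dst]
    return cmd
```

Order matters: denoising before normalization behaves differently from the reverse, much as the position of the MP3 conversion in the chain did above.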

I wrote a silence suppressor, thinking it would be a good addition for VAD, but somehow Silero works much worse with it... I'm very unhappy... maybe I need to add some artificial noise after it or something... 😧
And another possible filter, one to manually select parts of the audio, is implemented in the latest Whisper PR, so I'll skip it.
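The "add some artificial noise" idea could look roughly like this: after suppression, stretches of exact digital silence are replaced with very low-level noise, since a VAD may react oddly to perfect zeros. This is a hypothetical, untested sketch operating on int16 sample values; the function name and defaults are illustrative (160 samples is 10 ms at 16 kHz, and an amplitude of 8 peaks at roughly -72 dBFS), and whether it actually helps Silero is unknown.

```python
import random
from typing import List


def fill_silence_with_noise(samples: List[int], amplitude: int = 8,
                            min_run: int = 160, seed: int = 0) -> List[int]:
    """Replace runs of exact zeros, at least min_run samples long, with uniform noise."""
    rng = random.Random(seed)  # seeded so repeated runs give identical output
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != 0:
            i += 1
            continue
        # Find the end of this run of zeros.
        j = i
        while j < n and out[j] == 0:
            j += 1
        if j - i >= min_run:
            for k in range(i, j):
                out[k] = rng.randint(-amplitude, amplitude)
        i = j
    return out
```

Short zero runs (natural zero crossings, brief dropouts) are left alone, so only the stretches the suppressor flattened are touched.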

@Purfview can I have a Linux build of the test version? I have had a similar problem with many audio files: sometimes text is lost, sometimes it repeats itself. I generally use beam_size=5.

Sorry, you need to wait for a non-test release. I build non-Windows releases rarely, because I need to mess with VMs...

gkngkngkn commented on June 1, 2024

Will continue testing them, thanks for your excellent work, sir :)

commented on June 1, 2024

@Purfview that new one fixed my issue, I guess. I'll let you know if it happens again. Switched to W11 to use the new one ;D

Purfview commented on June 1, 2024

About the audio filters, post there: #178
