Describe the bug When processing a short audio clip with fewer tha

RuntimeError when processing VAD on short audio (speechbrain issue, closed)

gaspardpetit commented on June 11, 2024

RuntimeError when processing VAD on short audio

from speechbrain.

Comments (7)

gaspardpetit commented on June 11, 2024

This issue had been reported in pyannote (pyannote/pyannote-audio#1515) by someone else, but I did not find it here.

gaspardpetit commented on June 11, 2024

There was also an off-by-one logic issue that I fixed: an audio clip of exactly sample_rate * large_chunk_size samples (i.e., 30 s) would also cause the exception to be raised.

Adel-Moumen commented on June 11, 2024

Hello @gaspardpetit,

Thanks for opening this issue.

I ran your Colab notebook (thanks for the code!) and it does indeed raise RuntimeError: Failed to decode audio. However, when I reproduce your issue on my compute cluster, I get no error and the output is tensor([]).

I will investigate further, but I suspect the issue is related to Google Colab. I'll keep you updated.

asumagic commented on June 11, 2024

Considering the error comes from torchaudio itself, I suspect that different torchaudio versions/backends exhibit different behavior in edge cases, rather than the issue stemming from a misconfiguration or an upstream bug per se.
In particular, I suspect that unusual frame_offset/num_frames values in the torchaudio.load call could cause something like that.
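To make the edge case concrete, here is a minimal sketch of a guard that could run before a read. It is not taken from SpeechBrain or torchaudio: safe_read_window is a hypothetical helper, and it assumes the caller already knows the total number of frames in the file and passes a positive num_frames.

```python
def safe_read_window(total_frames, frame_offset, num_frames):
    """Clamp a requested read window to the file (hypothetical helper).

    Returns (offset, count) for a read that stays inside the file, or
    None when the request would start at or past the end of the audio,
    which is the kind of edge case a decoder may handle inconsistently.
    """
    if frame_offset < 0 or frame_offset >= total_frames:
        return None
    count = min(num_frames, total_frames - frame_offset)
    return (frame_offset, count) if count > 0 else None


if __name__ == "__main__":
    # A 10-frame clip with a 30-frame chunk size: the second chunk would
    # start at frame 30, well past the end of the file.
    print(safe_read_window(10, 0, 30))
    print(safe_read_window(10, 30, 30))
```

A caller could skip the read whenever the helper returns None instead of relying on how a given backend treats an offset past the end of the file.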

Adel-Moumen commented on June 11, 2024

> Considering the error comes from torchaudio itself I suspect different torchaudio versions/backends might exhibit different behavior in edge cases, and not that the issue stems from a misconfiguration or an upstream bug per se. In particular, I suspect that unusual frame_offset/num_frames values in the torchaudio.load could cause something like that.

Interesting.

@gaspardpetit, could you please share your pip configuration and your ffmpeg version?

Adel-Moumen commented on June 11, 2024

Note: It would be great to have a script in SpeechBrain (e.g. get_config.sh) that automatically fetches all the relevant information for us SB devs. What do you think, @asumagic?
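As a rough illustration of what such a script might collect, here is a minimal Python sketch. No such script exists in the thread: the function names, the package list, and the output layout are all assumptions made for the example, and it uses only the standard library.

```python
import platform
import shutil
import subprocess
import sys
from importlib import metadata

# Packages assumed to be relevant for this kind of report (illustrative list).
PACKAGES = ("speechbrain", "torch", "torchaudio")


def package_version(name):
    """Return the installed version of a package, or a placeholder string."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or a placeholder string."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return "not found on PATH"
    try:
        result = subprocess.run(
            [exe, "-version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return "found, but could not be run"
    lines = result.stdout.splitlines()
    return lines[0] if lines else "found, but printed no version"


def collect_env_info():
    """Gather interpreter, platform, ffmpeg and package versions in one dict."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "ffmpeg": ffmpeg_version(),
        "packages": {name: package_version(name) for name in PACKAGES},
    }


if __name__ == "__main__":
    info = collect_env_info()
    for key in ("python", "platform", "ffmpeg"):
        print(f"{key}: {info[key]}")
    for name, version in info["packages"].items():
        print(f"{name}: {version}")
```

Pasting the printed lines into a bug report would answer the pip and ffmpeg questions in one step.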

gaspardpetit commented on June 11, 2024

Thanks for looking into this. I doubt this is related to ffmpeg: the sample at https://colab.research.google.com/drive/1eHvZPpIdMJNzlDQIkFgrkQZSjVyhkHPU#scrollTo=fDQ0rwDUGYXK uses raw audio and doesn't seem to depend on ffmpeg.

Additionally, the fix in https://github.com/speechbrain/speechbrain/pull/2335/files consists of checking whether this is the last chunk before processing the chunk rather than after. With the current ordering, the loop always runs twice, even if the first chunk is also the last. There was also an off-by-one error from using > rather than >=. I am more puzzled about why it works on some versions of torchaudio, since to me the error is clearly in speechbrain/inference/VAD.py.
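The two problems can be illustrated with a simplified model of the chunk loop. This is a sketch under assumptions, not the code from VAD.py or from the pull request: plan_reads and its parameters are hypothetical, and each "read" is only recorded as (offset, frames actually available) instead of decoding audio.

```python
def plan_reads(total_len, chunk_len, check_before_read, inclusive):
    """List the (offset, available_frames) reads a chunk loop would issue.

    check_before_read: decide "is this the last chunk?" before the read
                       (True) or after advancing the offset (False).
    inclusive:         compare with >= (True) or with > (False).
    """
    reads = []
    begin = 0
    last_chunk = False

    def reaches_end(offset):
        end = offset + chunk_len
        return end >= total_len if inclusive else end > total_len

    while True:
        if check_before_read:
            last_chunk = reaches_end(begin)
        # Frames that actually exist in the requested window (0 if past the end).
        available = max(0, min(chunk_len, total_len - begin))
        reads.append((begin, available))
        if last_chunk:
            break
        begin += chunk_len
        if not check_before_read:
            last_chunk = reaches_end(begin)
    return reads


if __name__ == "__main__":
    # Check after the read: a 10-frame clip still triggers a second,
    # empty read at offset 30.
    print(plan_reads(10, 30, check_before_read=False, inclusive=False))
    # Check before the read, but with >: a clip of exactly one chunk
    # still triggers an empty read.
    print(plan_reads(30, 30, check_before_read=True, inclusive=False))
    # Check before the read with >=: one read, no empty window.
    print(plan_reads(30, 30, check_before_read=True, inclusive=True))
```

In this model the empty second window, with an offset at or past the end of the file, is the read that a given torchaudio backend may either reject or answer with an empty tensor.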
