Comments (7)

gaspardpetit commented on June 11, 2024

Someone else already reported this issue in pyannote (pyannote/pyannote-audio#1515), but I did not find it here.

from speechbrain.

gaspardpetit commented on June 11, 2024

There was also an off-by-one logic error that I fixed: an audio clip of exactly sample_rate * large_chunk_size samples (i.e., 30 s) would also cause the exception to be raised.

Adel-Moumen commented on June 11, 2024

Hello @gaspardpetit,

Thanks for opening this issue.

I ran your Colab (thanks for the code!) and indeed, it raises RuntimeError: Failed to decode audio. However, when I reproduce your issue on my compute cluster, I get no error and the output is tensor([]).

I will investigate further, but I suspect that the issue is related to Google Colab. I'll keep you updated.

asumagic commented on June 11, 2024

Considering the error comes from torchaudio itself I suspect different torchaudio versions/backends might exhibit different behavior in edge cases, and not that the issue stems from a misconfiguration or an upstream bug per se.
In particular, I suspect that unusual frame_offset/num_frames values in the torchaudio.load could cause something like that.
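One way to test that suspicion is to validate the requested range before it reaches torchaudio.load. The sketch below is only an illustration: clamp_read_range is a hypothetical helper, not part of torchaudio or SpeechBrain, and it assumes the total frame count of the file is already known.

```python
def clamp_read_range(frame_offset, num_frames, total_frames):
    """Clamp a (frame_offset, num_frames) request to the frames that exist.

    Hypothetical helper for illustration only. Returns None when there is
    nothing to read, so the caller can skip the load call entirely.
    A num_frames of -1 is treated as "read everything that remains".
    """
    if frame_offset < 0 or num_frames < -1:
        raise ValueError("frame_offset must be >= 0 and num_frames >= -1")
    if num_frames == 0 or frame_offset >= total_frames:
        # The request is empty or starts at/past the end of the file.
        return None
    available = total_frames - frame_offset
    if num_frames == -1 or num_frames > available:
        num_frames = available
    return frame_offset, num_frames
```

A request that starts at or beyond the end of the file comes back as None, which is the kind of edge case where backends may disagree (an exception on one, an empty tensor on another).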

Adel-Moumen commented on June 11, 2024

> Considering the error comes from torchaudio itself I suspect different torchaudio versions/backends might exhibit different behavior in edge cases, and not that the issue stems from a misconfiguration or an upstream bug per se. In particular, I suspect that unusual frame_offset/num_frames values in the torchaudio.load could cause something like that.

Interesting.

@gaspardpetit, could you please share your pip configuration and your ffmpeg version?

Adel-Moumen commented on June 11, 2024

Note: It would be great to have a script in SpeechBrain (e.g. get_config.sh) that automatically fetches all the relevant information for us SB devs. What do you think, @asumagic?
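As a rough idea of what such a script might collect, here is a minimal sketch written in Python rather than shell. The function name collect_env_info and the default package list are assumptions made for illustration; nothing like this exists in SpeechBrain as far as this thread shows.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def collect_env_info(packages=("torch", "torchaudio", "speechbrain")):
    """Return a dict describing the interpreter, OS, packages and ffmpeg.

    Hypothetical helper: the package list is an assumption about what
    would be relevant for a bug report.
    """
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"

    # Report the first line of `ffmpeg -version` when ffmpeg is on the PATH.
    ffmpeg_path = shutil.which("ffmpeg")
    if ffmpeg_path is None:
        info["ffmpeg"] = "not found"
    else:
        try:
            result = subprocess.run(
                [ffmpeg_path, "-version"],
                capture_output=True,
                text=True,
                timeout=10,
            )
            lines = result.stdout.splitlines()
            info["ffmpeg"] = lines[0] if lines else "unknown"
        except (OSError, subprocess.SubprocessError):
            info["ffmpeg"] = "unknown"
    return info
```

Printing the returned dict line by line would give something that can be pasted directly into an issue.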

gaspardpetit commented on June 11, 2024

Thanks for looking into this. I doubt this is related to ffmpeg: the sample at https://colab.research.google.com/drive/1eHvZPpIdMJNzlDQIkFgrkQZSjVyhkHPU#scrollTo=fDQ0rwDUGYXK uses raw audio and doesn't seem to depend on ffmpeg.

Additionally, the fix (https://github.com/speechbrain/speechbrain/pull/2335/files) consists of checking whether the current chunk is the last one before processing it rather than after. With the current ordering, the loop always runs twice, even when the first chunk is also the last. There was also an off-by-one error from using > rather than >=. I am more puzzled about why it works on some versions of torchaudio, since to me the error is clearly in speechbrain/inference/VAD.py.
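To make the difference concrete, here is a simplified simulation of the two loop orderings. It is a sketch, not the code from VAD.py or from the PR: the function names are made up, and each recorded (frame_offset, num_frames) pair merely stands in for one torchaudio.load call.

```python
def reads_check_after(audio_len, chunk_len):
    """Read requests when the last-chunk test runs after each read."""
    reads = []
    begin = 0
    last_chunk = False
    while True:
        reads.append((begin, chunk_len))  # stands in for one load call
        begin += chunk_len
        if last_chunk:
            break
        # Only now do we learn that the *next* chunk is the last one.
        if begin + chunk_len > audio_len:
            last_chunk = True
    return reads


def reads_check_before(audio_len, chunk_len, inclusive=True):
    """Read requests when the last-chunk test runs before each read.

    inclusive=True uses >=, inclusive=False uses > (the off-by-one).
    """
    reads = []
    begin = 0
    while True:
        end = begin + chunk_len
        last_chunk = end >= audio_len if inclusive else end > audio_len
        reads.append((begin, chunk_len))
        if last_chunk:
            break
        begin = end
    return reads


def reads_past_end(reads, audio_len):
    """Return the requests whose offset is at or beyond the end of the audio."""
    return [r for r in reads if r[0] >= audio_len]
```

For a 10 s clip at 16 kHz with 30 s chunks, reads_check_after issues two requests, the second starting at frame 480000 in a 160000-frame file, while reads_check_before issues one. For a clip of exactly 30 s, reads_check_before with inclusive=False still issues a second request at the very end of the file, which is the off-by-one case.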
