Comments (4)

AlexJian1086 commented on July 17, 2024

With reference to the paper's mel filterbank calculation, I am using librosa.feature.melspectrogram() to replace the Kaldi-style fbank from PyTorch (torchaudio) used in inference.py, but I am not sure how to replicate parameters such as '25ms Hamming window every 10ms'. What should hop_length, n_fft, and win_length be for librosa? Please clarify.

from ast.

YuanGongND commented on July 17, 2024

Hi there,

Matching the outputs of librosa and torchaudio is out of scope for this repo; you should consult the librosa or torchaudio authors. It might be hard to make them exactly the same, but I assume you can get similar output with appropriate parameters. Alternatively, you can train or fine-tune the model on librosa-generated spectrograms.

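Specifically for librosa.feature.melspectrogram(), sr should be 16,000, hop_length should be 10ms (160 samples), win_length should be 25ms (400 samples), window should be scipy.signal.windows.hann, and n_mels should be 128. Note that librosa takes hop_length and win_length in samples and requires n_fft to be at least win_length (for example, 512).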
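A minimal sketch of how these settings could be turned into librosa arguments. It is not the repo's code. The helper names `librosa_stft_params` and `log_mel` are hypothetical. Two choices are assumptions: the value 128 is read as the number of mel bins (`n_mels`), and `n_fft` is rounded up to the next power of two at or above `win_length`, because librosa does not accept a window longer than the FFT size. Differences such as pre-emphasis, frame centering, and mel normalization are not addressed here, so the result should not be expected to match the torchaudio Kaldi fbank exactly.

```python
def librosa_stft_params(sr=16000, win_ms=25.0, hop_ms=10.0):
    """Convert window/hop durations in milliseconds to sample counts."""
    win_length = int(round(sr * win_ms / 1000))   # 25 ms at 16 kHz -> 400
    hop_length = int(round(sr * hop_ms / 1000))   # 10 ms at 16 kHz -> 160
    # Assumption: use the next power of two >= win_length as the FFT size.
    n_fft = 1 << (win_length - 1).bit_length()
    return {"n_fft": n_fft, "win_length": win_length, "hop_length": hop_length}


def log_mel(y, sr=16000, n_mels=128):
    """Log-mel spectrogram of waveform `y`, shaped (frames, n_mels)."""
    # Imported here so the parameter helper above has no dependencies.
    import numpy as np
    import librosa

    mel = librosa.feature.melspectrogram(
        y=y,
        sr=sr,
        window="hann",
        n_mels=n_mels,       # assumption: 128 mel bins
        power=2.0,           # power spectrogram
        **librosa_stft_params(sr),
    )
    # Natural log with a small floor to avoid log(0); transpose to time-major.
    return np.log(mel + 1e-10).T
```

For a waveform loaded at 16 kHz, `log_mel(y)` would return one row per 10 ms frame with 128 columns.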

-Yuan


AlexJian1086 commented on July 17, 2024

Ah okay, thank you for the clarification.
What exactly should I fine-tune here to get the desired results with the inference pipeline for AudioSet? I assume the window size, overlap, mel bins, etc. would stay the same as in the paper?

Also, are the fbanks computed by torchaudio.compliance.kaldi.fbank the same as those from librosa.feature.melspectrogram() and python_speech_features.base.logfbank?


YuanGongND commented on July 17, 2024

The best way is to train and test using features extracted by the same toolkit. For audio event classification, you can just reuse our window size, overlap, etc. to save time on parameter search; if your task is significantly different from audio event classification, consider using your own parameters.

The outputs of different toolkits might differ; you need to run experiments to confirm whether they are the same.
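One way such an experiment could look is sketched below, assuming each toolkit's features for the same audio clip are already available as (frames, bins) arrays. The function name `compare_features` and the chosen summary statistics are illustrative, not part of any toolkit.

```python
import numpy as np


def compare_features(a, b):
    """Summarise how closely two (frames, bins) feature matrices agree."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Toolkits can differ by a few frames at the edges; compare the overlap.
    n = min(a.shape[0], b.shape[0])
    a, b = a[:n], b[:n]
    if a.shape != b.shape:
        raise ValueError(f"bin counts differ: {a.shape[1:]} vs {b.shape[1:]}")
    diff = np.abs(a - b)
    return {
        "frames_compared": n,
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        # Correlation is insensitive to a constant offset or scale.
        "pearson_r": float(np.corrcoef(a.ravel(), b.ravel())[0, 1]),
    }
```

A high correlation with a non-zero mean difference would suggest the two toolkits differ mainly by an offset or scaling, which normalization may absorb. A low correlation points to a more basic mismatch in parameters.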

-Yuan
