With reference to the paper's mel filterbank calculation, I am using librosa.feature.melspectrogram() to replace the torchaudio Kaldi fbank used in inference.py, but I am not sure how to replicate parameters such as the '25ms Hamming window every 10ms'. What should hop_length, n_fft, and win_length be for librosa? Could you please clarify?
from ast.
Hi there,
Matching the outputs of librosa and torchaudio is out of the scope of this repo; you should consult the librosa or torchaudio authors. It might be hard to make them exactly the same, but I think you should be able to get similar output with appropriate parameters. Alternatively, you can train or fine-tune the model on librosa-generated spectrograms.
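Specifically, for librosa.feature.melspectrogram() (note that librosa measures hop_length and win_length in samples, not milliseconds):
- hop_length should be 10 ms, i.e. 160 samples at 16 kHz;
- win_length should be 25 ms, i.e. 400 samples at 16 kHz;
- window should be scipy.signal.windows.hann;
- sr should be 16,000;
- n_mels (the number of mel bins) should be 128; n_fft must be at least win_length, e.g. 512, which is the power of two that Kaldi-style fbank rounds a 400-sample window up to by default.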
-Yuan
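A minimal sketch of that mapping, assuming 16 kHz mono audio. Because librosa counts hop_length and win_length in samples and requires n_fft to be at least win_length, the helper converts the paper's 25 ms / 10 ms framing to samples and uses 128 as the number of mel bins. The helper names ast_fbank_params and librosa_log_mel are hypothetical, and rounding n_fft up to 512 is an assumption borrowed from Kaldi's default; the result should be close to, not identical with, torchaudio.compliance.kaldi.fbank.

```python
import math


def ast_fbank_params(sr=16000, win_ms=25.0, hop_ms=10.0, n_mels=128):
    """Hypothetical helper: convert time-based framing into librosa's
    sample-based arguments (librosa expects samples, not milliseconds)."""
    win_length = int(round(sr * win_ms / 1000))  # 25 ms -> 400 samples at 16 kHz
    hop_length = int(round(sr * hop_ms / 1000))  # 10 ms -> 160 samples at 16 kHz
    # Assumption: round the FFT size up to the next power of two, as Kaldi-style
    # fbank does by default; librosa itself only needs n_fft >= win_length.
    n_fft = 1 << math.ceil(math.log2(win_length))  # 400 -> 512
    return {
        "sr": sr,
        "n_fft": n_fft,
        "win_length": win_length,
        "hop_length": hop_length,
        "n_mels": n_mels,
    }


def librosa_log_mel(y, sr=16000):
    """Hypothetical helper: log-mel features shaped (frames, n_mels).

    Not expected to match kaldi.fbank exactly: Kaldi also applies pre-emphasis
    and DC-offset removal, and librosa uses a different mel scale and filter
    normalization by default.
    """
    import numpy as np
    import librosa  # imported here so ast_fbank_params works without librosa
    import scipy.signal

    mel = librosa.feature.melspectrogram(
        y=y,
        window=scipy.signal.windows.hann,  # called as hann(win_length): symmetric Hann
        center=False,  # frames start at the first sample instead of being centered
        power=2.0,  # power spectrum before the mel filterbank
        **ast_fbank_params(sr),
    )
    return np.log(mel + 1e-6).T  # (n_mels, frames) -> (frames, n_mels)
```

Passing the callable scipy.signal.windows.hann rather than the string "hann" gives a symmetric window, which is also what torchaudio's Kaldi "hanning" window uses.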
Ah okay, thank you for the clarification.
What exactly should I fine-tune to get results comparable to the AudioSet inference pipeline? I assume the window size, overlap, number of mel bins, etc. should stay the same as in the paper?
Also, are the fbanks computed by torchaudio.compliance.kaldi.fbank the same as those from librosa.feature.melspectrogram() and python_speech_features.base.logfbank?
So the best way is to train and test on features extracted by the same toolkit. For audio event classification, you can just reuse our window size, overlap, etc. to save time on parameter search; if your task is significantly different from audio event classification, you can consider using your own parameters.
Different toolkits may produce different outputs; you would need to run experiments to confirm whether they match.
-Yuan
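One way to run that experiment is to extract both features from the same clip and compare them numerically. The sketch below assumes a 16 kHz mono WAV file; kaldi_fbank and compare_features are hypothetical helper names, and the kaldi.fbank arguments are an assumption meant to mirror the AST data loader, so check them against your copy of the repo.

```python
import numpy as np


def kaldi_fbank(wav_path, num_mel_bins=128):
    """Log-mel fbank via torchaudio's Kaldi-compatible implementation.

    Assumption: mono audio and AST-style arguments; verify against the repo.
    """
    import torchaudio
    import torchaudio.compliance.kaldi as ta_kaldi

    waveform, sr = torchaudio.load(wav_path)  # shape (1, n) for a mono file
    fbank = ta_kaldi.fbank(
        waveform,
        htk_compat=True,
        sample_frequency=sr,
        use_energy=False,
        window_type="hanning",
        num_mel_bins=num_mel_bins,
        dither=0.0,
        frame_shift=10,
    )
    return fbank.numpy()  # (frames, num_mel_bins)


def compare_features(a, b):
    """Compare two (frames, bins) log-mel matrices produced by different toolkits."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape[1] != b.shape[1]:
        raise ValueError(f"mel bin mismatch: {a.shape[1]} vs {b.shape[1]}")
    n = min(a.shape[0], b.shape[0])  # toolkits often disagree by a few frames
    diff = a[:n] - b[:n]
    return {
        "frames_compared": n,
        "frame_count_gap": abs(a.shape[0] - b.shape[0]),
        "max_abs_diff": float(np.abs(diff).max()),
        "mean_abs_diff": float(np.abs(diff).mean()),
        # Pearson correlation is insensitive to a constant offset or scale
        "corr": float(np.corrcoef(a[:n].ravel(), b[:n].ravel())[0, 1]),
    }
```

For example, pass kaldi_fbank("clip.wav") and your librosa log-mel matrix (transposed to frames by 128 bins) to compare_features. A correlation near 1 with a large mean difference points to an offset or scale difference rather than a different time-frequency layout.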