Hello RetroCirce, I've been getting familiar with your codebase and I am having issues

How to perform localization and generate heatmap with AudioSet (hts-audio-transformer, CLOSED)

retrocirce commented on June 30, 2024

How to perform localization and generate heatmap with AudioSet

from hts-audio-transformer.

Comments (1)

RetroCirce commented on June 30, 2024

Yeah, this function is a temporary one. As you might know, AudioSet released a small subset with strong localization labels last year. I processed that data on the company's server for later use, but I can no longer access it.

I think doing localization on AudioSet is different from doing it on DESED. There are two cases where I would suggest you write your own processing code:

If you want to train a new HTS-AT model on localization data (HTS-AT can support this, but I did not write the code), you need to extract a different output from HTS-AT (I believe it is the feature-map output of the second-to-last layer) and add a loss function to train it. This might actually become a new piece of work. One thing to keep in mind is that the time resolution of the interpolated output may differ from the time resolution of the localization labels, so you need to find a way to align them.
If you want to evaluate the model on a localization dataset, fl_evaluate.py can serve as a code base, but you need to revise a few things:
(1) AudioSet's classes are different from DESED's. You can see that I map the 527 AudioSet classes to the 10 DESED classes. With AudioSet it should be easier, since you don't need that mapping.
(2) Somewhere in fl_evaluate.py there are some hard-coded thresholds for deciding whether each class is active. If you have read some localization papers, you might know that different classes can need different thresholds (not every class uses 0.5). Usually the thresholds are obtained by running inference on the training set and quantizing the candidate values (in my case, in steps of 0.1); you then use these thresholds when inferring on the evaluation data. So you might need to compute the thresholds for the AudioSet classes yourself.
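
To make point (2) concrete, here is a minimal sketch of per-class threshold calibration on a 0.1 grid. It is not the code from fl_evaluate.py. The function names `frame_f1` and `calibrate_thresholds` are hypothetical, the inputs are assumed to be frame-level probabilities and binary labels of shape (n_frames, n_classes) collected from the training set, and picking the threshold by frame-level F1 is my assumption (the selection metric is not specified above).

```python
import numpy as np

def frame_f1(pred, target):
    """Binary F1 over frame decisions; returns 0.0 when there is nothing to score."""
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def calibrate_thresholds(probs, labels, default=0.5):
    """Pick one threshold per class from the grid 0.1, 0.2, ..., 0.9.

    probs:  (n_frames, n_classes) float array of training-set probabilities (assumed layout)
    labels: (n_frames, n_classes) binary array of training-set frame labels (assumed layout)
    """
    grid = np.arange(1, 10) / 10.0          # the "0.1-quantization" grid
    labels = np.asarray(labels).astype(bool)
    probs = np.asarray(probs, dtype=float)
    thresholds = np.full(probs.shape[1], default)
    for c in range(probs.shape[1]):
        scores = [frame_f1(probs[:, c] >= t, labels[:, c]) for t in grid]
        if max(scores) > 0:                 # keep the default if the class never scores
            thresholds[c] = grid[int(np.argmax(scores))]
    return thresholds
```

The resulting vector can then be compared against the evaluation-set probabilities class by class, e.g. `eval_probs >= thresholds`, which broadcasts over frames.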

Please let me know if you get more localization results with HTS-AT; this is unfinished but valuable future work for HTS-AT.
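
For the alignment issue in the training case above, one possible approach is to rasterize the strong labels onto the model's output frame grid and compare them with a frame-level loss. This is only a sketch under assumptions: the helper names `events_to_frame_targets` and `frame_bce` are hypothetical, events are assumed to be (onset_seconds, offset_seconds, class_index) tuples, and the model is assumed to yield an (n_frames, n_classes) probability map for a clip of known duration. Binary cross-entropy is my choice of loss here, not something HTS-AT prescribes.

```python
import math
import numpy as np

def events_to_frame_targets(events, n_frames, clip_seconds, n_classes):
    """Convert (onset_s, offset_s, class_idx) events into an (n_frames, n_classes) 0/1 matrix.

    A frame is marked active if it overlaps the event at all.
    """
    frames_per_second = n_frames / clip_seconds
    targets = np.zeros((n_frames, n_classes), dtype=np.float32)
    for onset, offset, cls in events:
        start = max(0, math.floor(onset * frames_per_second))
        end = min(n_frames, math.ceil(offset * frames_per_second))
        targets[start:end, cls] = 1.0
    return targets

def frame_bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between frame probabilities and frame targets."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))

# Usage: a 10 s clip, 100 output frames, 3 classes (all values are made up).
events = [(1.0, 2.5, 0), (0.04, 0.26, 2)]
targets = events_to_frame_targets(events, n_frames=100, clip_seconds=10.0, n_classes=3)
```

Going the other way, i.e. resampling the model output to the label resolution, would also work; rasterizing the labels just avoids interpolating probabilities.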
