
Comments (1)

RetroCirce commented on June 30, 2024

Yes, this function is temporary. As you may know, AudioSet released a small subset with strong localization labels last year. I processed that data on my company's server for later use, but I can no longer access it.

I think localization on AudioSet differs from DESED in two ways, and for both I would suggest writing your own processing code:

  1. If you want to train a new HTS-AT model on localization data (HTS-AT can support this, but I did not write that part), you need to extract a different output from HTS-AT (I believe it is the feature-map output of the second-to-last layer) and add a loss function to train on it. This could actually become a new piece of work. One thing to keep in mind: the interpolation and time resolution of the output may differ from the time resolution of the input localization labels, so you need to find a way to align them.
  2. If you want to evaluate the model on a localization dataset, fl_evaluate.py can serve as a code base, but you need to revise a few things:
    (1) AudioSet's classes differ from DESED's. You can see that I map the 527 AudioSet classes to the 10 DESED classes. For AudioSet, I think it is easier, since you don't need that mapping.
    (2) fl_evaluate.py contains some fixed threshold values used to decide each class. If you read some localization papers, you will see that different classes may need different decision thresholds (not every class uses 0.5). Usually the thresholds are obtained by running inference on the training set and quantizing the result (in my case, to steps of 0.1); you then use these thresholds for inference on the evaluation data. So you may need to compute the thresholds for the AudioSet classes yourself.
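To make the alignment and threshold points concrete, here is a rough sketch, not code from the repository. It assumes frame-level probabilities of shape (n_frames, n_classes) collected on the training set, strong labels given as (onset, offset) pairs in seconds, and frame-level F1 as the selection criterion. The function names, the frame hop, and the choice of F1 are all assumptions made for illustration; fl_evaluate.py may do this differently.

```python
import numpy as np


def events_to_frames(events, n_frames, frame_hop_s):
    """Rasterize (onset_s, offset_s) events onto the model's frame grid.

    frame_hop_s is the assumed duration of one output frame in seconds.
    """
    target = np.zeros(n_frames, dtype=bool)
    for onset, offset in events:
        start = int(np.floor(onset / frame_hop_s))
        stop = int(np.ceil(offset / frame_hop_s))
        # Clip to the valid frame range before marking the event as active.
        target[max(start, 0):min(stop, n_frames)] = True
    return target


def frame_f1(pred, target):
    """Frame-level F1 between two boolean vectors (0.0 if both are empty)."""
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def fit_class_thresholds(probs, targets, step=0.1):
    """Pick one threshold per class from a quantized grid (0.1, 0.2, ..., 0.9).

    probs:   float array, shape (n_frames, n_classes), training-set outputs
    targets: bool array, same shape, frame-level ground truth
    """
    n_steps = int(round(1.0 / step))
    candidates = np.round(np.arange(1, n_steps) * step, 6)
    n_classes = probs.shape[1]
    thresholds = np.full(n_classes, 0.5)  # fall back to 0.5 if nothing scores
    for c in range(n_classes):
        best_f1 = 0.0
        for t in candidates:
            score = frame_f1(probs[:, c] >= t, targets[:, c])
            if score > best_f1:  # strict '>' keeps the lowest threshold on ties
                best_f1 = score
                thresholds[c] = t
    return thresholds


# Toy example: class 0 separates best at 0.3, class 1 at 0.8.
toy_targets = np.array([[True, True], [True, True], [False, False], [False, False]])
toy_probs = np.array([[0.35, 0.85], [0.35, 0.85], [0.25, 0.75], [0.25, 0.75]])
toy_thresholds = fit_class_thresholds(toy_probs, toy_targets)
```

The fitted thresholds would then be applied unchanged to the evaluation data, e.g. `eval_probs >= toy_thresholds`, which broadcasts one threshold per class across all frames.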

Please let me know if you get more results on the localization performance of HTS-AT. It is unfinished but valuable future work for HTS-AT.

from hts-audio-transformer.
