Comments (1)
Yes, this function is temporary. As you may know, AudioSet released a small subset with strong localization labels last year. I processed that data on my company's server for later use, but I can no longer access it.
I think doing localization on AudioSet differs from DESED in two ways, and for both I would suggest writing your own processing code:
- If you want to train a new HTS-AT model on localization data (my HTS-AT can support it, but I did not write that part), you need to extract a different output from HTS-AT (I believe it is the feature-map output of the second-to-last layer) and define a loss function to train on it. This might actually become a new piece of work. One thing to keep in mind is that the time resolution of the output, after interpolation, may differ from the time resolution of the input localization labels, so you need to find a way to align them.
- If you want to evaluate the model on a localization dataset, fl_evaluate.py can serve as a code base, but you need to revise a few things:
(1) AudioSet's classes are different from DESED's. You can see that I map the 527 classes to the 10 classes in DESED. For AudioSet I think it is easier, since you don't need that mapping at all.
(2) Somewhere in fl_evaluate.py there are fixed thresholds used to decide whether each class is active. If you read some localization papers, you will see that different classes may need different thresholds (not every class uses 0.5). Usually the thresholds are obtained by running inference on the training dataset and quantizing the result (I used a 0.1 quantization step); you can then use these thresholds for inference on the evaluation data. So you might need to calculate the thresholds for the AudioSet classes yourself.
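To make point (2) more concrete, here is a minimal sketch of how per-class thresholds on a 0.1 grid could be picked from training-set predictions. This is not the code in fl_evaluate.py: the function names are hypothetical, and choosing the threshold by frame-level F1 is my assumption, since other selection criteria are possible.

```python
import numpy as np


def f1_binary(pred, target):
    """Frame-level F1 for one class; pred and target are boolean arrays."""
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def calibrate_thresholds(probs, labels):
    """Pick one threshold per class from the grid 0.1, 0.2, ..., 0.9.

    probs:  (num_frames, num_classes) predicted probabilities on training data
    labels: (num_frames, num_classes) binary frame-level ground truth
    Assumption: the best threshold is the one with the highest F1.
    """
    candidates = np.round(np.linspace(0.1, 0.9, 9), 1)
    labels = labels.astype(bool)
    thresholds = np.full(probs.shape[1], 0.5)
    for c in range(probs.shape[1]):
        scores = [f1_binary(probs[:, c] >= t, labels[:, c]) for t in candidates]
        # argmax returns the first maximum, so ties go to the lowest threshold
        thresholds[c] = candidates[int(np.argmax(scores))]
    return thresholds


def apply_thresholds(probs, thresholds):
    """Binarize evaluation-set probabilities with the calibrated thresholds."""
    return probs >= thresholds
```

You would call `calibrate_thresholds` once on the training predictions and then reuse the returned array in `apply_thresholds` on the evaluation data.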
Please let me know if you get more localization results with HTS-AT; this is unfinished but valuable future work for HTS-AT.
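For the alignment issue mentioned in the first point, one possible approach is sketched below: either rasterize the strong labels onto the model's output frame grid, or linearly resample the model output onto the label grid. This is only a sketch under assumptions. The function names, the `(onset_seconds, offset_seconds, class_index)` event format, and the frame counts are all hypothetical and not taken from the HTS-AT code.

```python
import numpy as np


def events_to_frame_targets(events, num_frames, clip_seconds, num_classes):
    """Rasterize strong labels onto a frame grid.

    events: iterable of (onset_seconds, offset_seconds, class_index) tuples
            (hypothetical format, adapt it to your label files)
    Returns a (num_frames, num_classes) array of 0/1 targets.
    """
    targets = np.zeros((num_frames, num_classes), dtype=np.float32)
    frames_per_second = num_frames / clip_seconds
    for onset, offset, class_idx in events:
        # Mark every frame that overlaps the event at all
        start = max(int(np.floor(onset * frames_per_second)), 0)
        end = min(int(np.ceil(offset * frames_per_second)), num_frames)
        targets[start:end, class_idx] = 1.0
    return targets


def resample_frame_probs(probs, target_frames):
    """Linearly resample (frames, classes) probabilities to a new frame count."""
    src_frames = probs.shape[0]
    # Compare frame centers on a normalized [0, 1] time axis
    src_pos = (np.arange(src_frames) + 0.5) / src_frames
    dst_pos = (np.arange(target_frames) + 0.5) / target_frames
    columns = [np.interp(dst_pos, src_pos, probs[:, c]) for c in range(probs.shape[1])]
    return np.stack(columns, axis=1)
```

Rasterizing the labels keeps the model output untouched, which is convenient for a training loss; resampling the output is more convenient when the evaluation tool expects a fixed label resolution.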
from hts-audio-transformer.