GithubHelp home page GithubHelp logo

dcase2021-seld's Introduction

DCASE 2020: Sound event localization and detection (SELD) task

DATASETS

The participants can choose either of the two or both the following datasets,

  • TAU-NIGENS Spatial Sound Events 2020 - Ambisonic
  • TAU-NIGENS Spatial Sound Events 2020 - Microphone Array

These datasets contain recordings from an identical scene, with TAU-NIGENS Spatial Sound Events 2020 - Ambisonic providing four-channel First-Order Ambisonic (FOA) recordings while TAU-NIGENS Spatial Sound Events 2020 - Microphone Array provides four-channel directional microphone recordings from a tetrahedral array configuration. Both formats are extracted from the same microphone array, and additional information on the spatial characteristics of each format can be found below. The participants can choose one of the two, or both the datasets based on the audio format they prefer. Both the datasets, consists of a development and evaluation set.

The development set consists of 600, one minute long recordings sampled at 24000 Hz. All participants are expected to use the fixed splits provided in the baseline method for reporting the development scores. We use 400 recordings for training split (fold 3 to 6), 100 for validation (fold 2) and 100 for testing (fold 1). The evaluation set consists of 200, one-minute recordings, and is released during the evaluation phase of the challenge. After the end of the DCASE2020 Challenge, the unknown labels of the evaluation set are also released for further testing and comparison of methods out of the challenge, with the challenge results.

The development and evaluation datasets in both formats can be downloaded from the link

Download the dataset

Training the SELDnet

In order to quickly train SELDnet follow the steps below.

  • For the chosen dataset (Ambisonic or Microphone), download the respective zip file. This contains both the audio files and the respective metadata. Unzip the files under the same 'base_folder/', ie, if you are Ambisonic dataset, then the 'base_folder/' should have two folders - 'foa_dev/' and 'metadata_dev/' after unzipping.

  • Now update the respective dataset name and its path in parameter.py script. For the above example, you will change dataset='foa' and dataset_dir='base_folder/'. Also provide a directory path feat_label_dir in the same parameter.py script where all the features and labels will be dumped.

  • Extract features from the downloaded dataset by running the batch_feature_extraction.py script. Run the script as shown below. This will dump the normalized features and labels in the feat_label_dir folder.

python3 batch_feature_extraction.py

You can now train the SELDnet using default parameters using

python3 seld_torch.py
  • Additionally, you can add/change parameters by using a unique identifier <task-id> in if-else loop as seen in the parameter.py script and call them as following
python3 seld_torch.py <task-id> <job-id>

Where <job-id> is a unique identifier which is used for output filenames (models, training plots). You can use any number or string for this.

In order to get baseline results on the development set for Microphone array recordings, you can run the following command

python3 seld_torch.py 2

Similarly, for Ambisonic format baseline results, run the following command

python3 seld_torch.py 4
  • By default, the code runs in quick_test = True mode. This trains the network for 2 epochs on only 2 mini-batches. Once you get to run the code sucessfully, set quick_test = False in parameter.py script and train on the entire data.

  • The code also plots training curves, intermediate results and saves models in the model_dir path provided by the user in parameter.py file.

  • In order to visualize the output of SELDnet and for submission of results, set dcase_output=True and provide dcase_dir directory. This will dump file-wise results in the directory, which can be individually visualized using visualize_SELD_output.py script.

dcase2021-seld's People

Contributors

huangdaolang avatar

Stargazers

 avatar

Watchers

 avatar

dcase2021-seld's Issues

The code doesn't run

Can this code run in WIN?I have an error running on WIN.(IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed)can you give me a hand,thanks

!!!

Do u have a WeChat,if u have,could u please add me WeChat,my WeChat is LDZZ1103.Thanks,bro,love U!

evaluation metrics

epoch_cnt: 0, time: 72.89s, tr_loss: 0.03,
DCASE2021 SCORES: SED_Error: 1.00, SED_F: 0.0, DOA_Error: 180.0, DOA_recall:0.0, seld_score (early stopping score): 1.00, best_seld_score: 1.00, best_epoch : 0

Evaluation metrics always stays the same. Please help me

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.