kuonanhong / urban-sound-tagging

This project forked from sainathadapa/dcase2019-task5-urban-sound-tagging

1st place solution to the DCASE 2019 - Task 5 - Urban Sound Tagging

Home Page: http://dcase.community/challenge2019/task-urban-sound-tagging

License: MIT License

Python 97.47% Makefile 2.53%

DCASE 2019 - Task 5 - Urban Sound Tagging

This repository contains the final solution that I used for DCASE 2019 Task 5 (Urban Sound Tagging). The model placed first in predicting both coarse-level and fine-level labels.

Reproducing the results

Prerequisites:

  • Linux-based system
  • Python >= 3.5
  • NVIDIA GPU with at least 8 GB of memory
  • CUDA >= 10.0
  • virtualenv package installed
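
Some of these prerequisites can be checked before starting a long run. The following is a minimal sketch, not part of the repository: the function name check_prerequisites is hypothetical, and it assumes that a working NVIDIA driver exposes nvidia-smi on PATH and that virtualenv is installed as a command-line tool. It does not verify GPU memory or the CUDA version.

```python
import shutil
import sys


def check_prerequisites():
    """Return a list of detected problems; an empty list means none were found."""
    problems = []
    if not sys.platform.startswith("linux"):
        problems.append("not a Linux system: {}".format(sys.platform))
    if sys.version_info < (3, 5):
        problems.append("Python >= 3.5 is required")
    # Assumption: a working NVIDIA driver installation puts nvidia-smi on PATH.
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH")
    # Assumption: virtualenv is available as a command-line tool.
    if shutil.which("virtualenv") is None:
        problems.append("virtualenv not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```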

Replicating:

Clone this repository. To replicate the entire solution with a single command, run make run_all from the repository directory. This command performs the following steps sequentially:

  • make env: Creates a virtual environment in the current directory
  • make reqs: Installs python packages
  • make pytorch: Installs PyTorch
  • make download: Downloads the Task 5 data from Zenodo
  • make extract: Extracts the zipped files
  • make parse: Parses annotations
  • make logmel: Computes and saves Log-Mel spectrograms for all the files
  • make train_s1: Trains the system 1 model
  • make eval_s1: Conducts local evaluation of the trained model (system 1)
  • make submit_s1: Generates the submission file (system 1)
  • make train_s2: Trains the system 2 model
  • make eval_s2: Conducts local evaluation of the trained model (system 2)
  • make submit_s2: Generates the submission file (system 2)
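
If a step fails partway through, it can be convenient to resume from a given target rather than rerun everything. The sketch below is an illustration only, not part of the repository: it takes the target names from the list above, while run_pipeline, start_at, and runner are hypothetical names introduced here. Resuming assumes the earlier steps completed and left their outputs in place.

```python
import subprocess

# Target order copied from the list of steps above.
TARGETS = [
    "env", "reqs", "pytorch", "download", "extract", "parse", "logmel",
    "train_s1", "eval_s1", "submit_s1",
    "train_s2", "eval_s2", "submit_s2",
]


def run_pipeline(targets=TARGETS, start_at=None, runner=subprocess.run):
    """Run make targets in order and stop at the first failure.

    start_at (hypothetical) skips every target before the named one.
    runner is injectable so the sequence can be dry-run without calling make.
    """
    targets = list(targets)
    if start_at is not None:
        targets = targets[targets.index(start_at):]
    for target in targets:
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(["make", target], check=True)
    return targets
```

For example, run_pipeline(start_at="train_s2") would run only the three system 2 targets.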

Artifacts

The weights for both models are available on the releases page.

About the solution

The technical report can be read here, and the workshop paper is available on the DCASE proceedings page.
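
The abstract in the Citing section lists mixup among the data augmentation techniques. As a rough illustration of the general idea only, and not the code used in this repository, the sketch below blends two examples and their multi-label targets with a single weight drawn from a Beta distribution. The function name, the flat-list representation of a spectrogram, and the value alpha=0.4 are all assumptions made for this example.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two examples and their label vectors with one shared weight.

    x1, x2: feature values as flat lists (e.g. a flattened spectrogram).
    y1, y2: multi-label target vectors of equal length.
    alpha:  Beta distribution parameter; 0.4 is an assumed value.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the labels are mixed with the same weight as the inputs, the targets become soft values between 0 and 1 rather than hard tags.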

License

Unless otherwise stated, the contents of this repository are shared under the MIT License.

Citing

@inproceedings{Adapa2019,
    author = "Adapa, Sainath",
    title = "Urban Sound Tagging using Convolutional Neural Networks",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)",
    address = "New York University, NY, USA",
    month = "October",
    year = "2019",
    pages = "5--9",
    abstract = "In this paper, we propose a framework for environmental sound classification in a low-data context (less than 100 labeled examples per class). We show that using pre-trained image classification models along with usage of data augmentation techniques results in higher performance over alternative approaches. We applied this system to the task of Urban Sound Tagging, part of the DCASE 2019. The objective was to label different sources of noise from raw audio data. A modified form of MobileNetV2, a convolutional neural network (CNN) model was trained to classify both coarse and fine tags jointly. The proposed model uses log-scaled Mel-spectrogram as the representation format for the audio data. Mixup, Random erasing, scaling, and shifting are used as data augmentation techniques. A second model that uses scaled labels was built to account for human errors in the annotations. The proposed model achieved the first rank on the leaderboard with Micro-AUPRC values of 0.751 and 0.860 on fine and coarse tags, respectively."
}
