mnpinto / audiotagging2019

6th place solution to Freesound Audio Tagging 2019 kaggle competition

License: MIT License

Python 100.00%
kaggle-competition deep-learning artificial-intelligence data-science audio-tagging

audiotagging2019's Introduction

6th place solution for Freesound Audio Tagging 2019 Competition

Description of the solution: https://link.medium.com/Kv5kyHjcIX

How to use

  • Install fastai and librosa:
conda install -c pytorch -c fastai fastai
conda install librosa
  • Clone the repository:
git clone https://github.com/mnpinto/audiotagging2019.git
  • Run the script from the repository folder:
cd audiotagging2019
python run.py --n_epochs 1 --max_processors 8

If successful, the script will create the train_curated_png and train_noisy_png folders with the Mel spectrograms corresponding to all audio clips, then train the model for 1 epoch using the default arguments. The max_processors argument sets how many processors to use for this preprocessing step. After training is complete, a models folder will be created and a weights file stage-1.pth will be saved there. Finally, a submission file will be generated with the default name submission.csv.

If you find any errors, let me know by creating an Issue. The code has not yet been tested on fastai versions after 1.0.51.

Arguments

name              type   default           description
--path            str    data              path to the data folder
--working_path    str    .                 path to the working folder where model weights and outputs will be saved
--base_dim        int    128               size to crop the images on the horizontal axis before rescaling with SZ
--SZ              int    128               images will be rescaled to SZxSZ
--BS              int    64                batch size
--lr              float  0.01              maximum learning rate for one_cycle_learning
--n_epochs        int    80                number of epochs to train the model
--epoch_size      int    1000              number of episodes (with batch size BS each) in each epoch
--f2cl            int    1                 train only on samples with F2 score (with threshold of 0.2) less than f2cl
--fold_number     int    0                 KFold cross-validation fold number: (0,1,2,3,4) or -1 to train with all data
--loss_name       str    BCELoss           loss function to use; options are BCELoss and FocalLoss
--csv_name        str    submission.csv    name of the csv file to save with test predictions
--model           str    models.xresnet18  can be a fastai model (as in the default) or xresnet{18,34,50}ssa to use simple self-attention
--weights_file    str    stage-1           name of the file to save the weights
--load_weights    str    (empty)           name of a weights file (e.g., stage-1) to load before training
--max_processors  int    8                 number of CPU threads to use for converting wav files to png
--force           bool   False             if set to True, the pngs will be recomputed for the noisy and curated train datasets
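
The --f2cl filter relies on a per-sample F2 score computed after thresholding the predictions at 0.2. The following is a minimal sketch of that computation, not the code from this repository: the function names f2_score and select_below are hypothetical, and both the strict "greater than threshold" comparison and the convention of returning 0.0 for an empty sample are assumptions.

```python
from typing import List, Sequence


def f2_score(probs: Sequence[float], targets: Sequence[int],
             threshold: float = 0.2) -> float:
    """Per-sample F2 score for multi-label predictions.

    F_beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP), with b = 2.
    """
    # Assumption: a label counts as predicted when its probability exceeds the threshold.
    preds = [p > threshold for p in probs]
    tp = sum(1 for p, t in zip(preds, targets) if p and t)
    fp = sum(1 for p, t in zip(preds, targets) if p and not t)
    fn = sum(1 for p, t in zip(preds, targets) if not p and t)
    denom = 5 * tp + 4 * fn + fp
    # Assumption: score 0.0 when there are no predicted and no true labels.
    return 5 * tp / denom if denom else 0.0


def select_below(all_probs: Sequence[Sequence[float]],
                 all_targets: Sequence[Sequence[int]],
                 f2cl: float, threshold: float = 0.2) -> List[int]:
    """Indices of the samples whose F2 score is strictly less than f2cl."""
    return [i for i, (p, t) in enumerate(zip(all_probs, all_targets))
            if f2_score(p, t, threshold) < f2cl]
```

With this definition, a sample whose thresholded predictions match its labels exactly scores 1.0, so it would be excluded by the default f2cl of 1.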

Replicating my top scoring solution

Important! This code has not yet been tested. I ran all experiments on Kaggle kernels and refactored the code to create this repository. After the final results of the competition are available, late submissions will be allowed so I will then test the code to check if anything is missing.

My top scoring solution with a score of 0.742 on public LB and 0.75421 on private LB (final results pending...) is the average of the following 6 runs:

python run.py --model xresnet18ssa --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 80 --loss_name FocalLoss --weights_file model1 --csv_name submission1.csv
              
python run.py --model xresnet34ssa --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 60 --loss_name FocalLoss --weights_file model2 --csv_name submission2.csv
              
python run.py --model xresnet18ssa --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 80 --weights_file model3 --csv_name submission3.csv
              
python run.py --model xresnet34ssa --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 60 --weights_file model4 --csv_name submission4.csv

python run.py --model models.xresnet34 --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 90 --loss_name FocalLoss --weights_file model5 --csv_name submission5.csv
              
python run.py --model models.xresnet50 --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 65 --weights_file model6_0
              
python run.py --model models.xresnet50 --base_dim 128 --SZ 256 --fold_number -1 \
              --n_epochs 65 --load_weights model6_0 --weights_file model6 --csv_name submission6.csv             

The penultimate run, which generates the model6_0 weights, is not part of the ensemble; it only produces the weights that the last, otherwise identical run loads before training. If you are running locally, try a single run with more epochs instead: the 2x65 epochs split is only there to accommodate the 9h run-time limit of Kaggle kernels.
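
Averaging the six submission files is not part of run.py as described above. The sketch below shows one way to do it with the standard library only. It assumes "average" means the arithmetic mean of the predicted values, that each csv has a header row, and that the first column identifies the clip while every other column is numeric. The function name average_submissions is hypothetical.

```python
import csv
from typing import Dict, List, Sequence


def average_submissions(paths: Sequence[str], out_path: str) -> None:
    """Write the element-wise mean of several submission csv files to out_path."""
    if not paths:
        raise ValueError("at least one submission file is required")
    header: List[str] = []
    order: List[str] = []             # row order taken from the first file
    sums: Dict[str, List[float]] = {}
    for n, path in enumerate(paths):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)
            rows = {row[0]: [float(v) for v in row[1:]] for row in reader}
        if n == 0:
            header, order, sums = file_header, list(rows), rows
            continue
        # Refuse to average files that do not describe the same clips and labels.
        if file_header != header or set(rows) != set(order):
            raise ValueError(f"{path} does not match {paths[0]}")
        for key in order:
            sums[key] = [a + b for a, b in zip(sums[key], rows[key])]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for key in order:
            writer.writerow([key] + [s / len(paths) for s in sums[key]])


# Example call for the six runs listed above (file names as passed to --csv_name):
# average_submissions([f"submission{i}.csv" for i in range(1, 7)], "submission.csv")
```

Rows are matched by the first column rather than by position, so the files may list the clips in different orders.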

Ablation study (in progress)

  • Fixed parameters: --base_dim 128 --SZ 256 --fold_number -1 --n_epochs 80 --loss_name FocalLoss
Model          Private LB scores
xresnet18ssa   [0.74211, 0.74695]
xresnet34ssa   [0.74545, 0.74875]
xresnet50      [0.73966, 0.74062]
  • Fixed parameters: --base_dim 128 --SZ 256 --fold_number -1 --n_epochs 80 --loss_name BCELoss
Model          Private LB scores
xresnet18ssa   [0.73824, 0.73956]
xresnet34ssa   [0.74378, 0.74567]

Citing this repository

@misc{mnpinto2019audio,
  author = {Pinto, M. M.},
  title = {6th place solution for Freesound Audio Tagging 2019 Competition},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/mnpinto/audiotagging2019}}
}

audiotagging2019's Issues

NameError: name 'ImageDataBunch' is not defined

Hi! I tried to install with conda like in the instructions, but I got this error: class ImageDataBunch(ImageDataBunch):
NameError: name 'ImageDataBunch' is not defined

Doing a little research I did a fresh install with torch==1.0.0 torchvision==0.2.1 and fastai from wheel.
I still get the same error...

Could you please tell me exactly what versions you have installed? Or maybe this error is from other causes?..
Thanks a bunch!

can't set attribute error

here is everything:

wb@i7:~/git/6th/audiotagging2019$ python run.py --model xresnet18ssa --base_dim 128 --SZ 256 --fold_number -1 --n_epochs 80 --loss_name FocalLoss --weights_file model1 --csv_name submission1.csv

Starting run using the following configuration:
path : data
working_path : .
base_dim : 128
SZ : 256
BS : 64
lr : 0.01
n_epochs : 80
epoch_size : 1000
f2cl : 1
fold_number : -1
loss_name : FocalLoss
csv_name : submission1.csv
model : xresnet18ssa
load_weights :
weights_file : model1
max_processors: 8
force : False

Computing mel spectrograms for the curated train dataset and saving as .png:
|| 100.00% [4970/4970 00:55<00:00]
Computing mel spectrograms for the noisy train dataset and saving as .png:
|| 100.00% [19815/19815 05:58<00:00]
Computing mel spectrograms for the test dataset:
|| 100.00% [3361/3361 01:04<00:00]

Loading train data:
Traceback (most recent call last):
  File "run.py", line 250, in <module>
    load_weights=load_weights, force=args.force)
  File "run.py", line 124, in main
    .databunch(samplers=samplers, path=working_path, bs=BS)
  File "/home/wb/anaconda3/lib/python3.6/site-packages/fastai/data_block.py", line 550, in databunch
    num_workers=num_workers, dl_tfms=dl_tfms, device=device, collate_fn=collate_fn, no_check=no_check, **kwargs)
  File "/home/wb/git/6th/audiotagging2019/utils.py", line 19, in create
    zip(datasets, (bs,val_bs,val_bs,val_bs), samplers) if d is not None]
  File "/home/wb/git/6th/audiotagging2019/utils.py", line 19, in <listcomp>
    zip(datasets, (bs,val_bs,val_bs,val_bs), samplers) if d is not None]
  File "/home/wb/git/6th/audiotagging2019/utils.py", line 42, in __init__
    super().__init__(data_source)
  File "/home/wb/git/6th/audiotagging2019/utils.py", line 36, in __init__
    self.num_samples = num_samples
AttributeError: can't set attribute
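
For context, "can't set attribute" is the message Python 3.6 raises when code assigns to a property that has no setter. One possible cause, which the traceback alone does not confirm, is that a base class in the installed library version exposes num_samples as a read-only property while the sampler in utils.py still assigns to it. The sketch below reproduces that situation with the standard library only. All three class names are hypothetical stand-ins, not classes from this repository, PyTorch, or fastai.

```python
class BaseSampler:
    """Hypothetical stand-in for a library class with a read-only num_samples."""

    def __init__(self, data_source):
        self.data_source = data_source
        self._num_samples = None

    @property
    def num_samples(self):
        # Falls back to the dataset length when no explicit value was stored.
        if self._num_samples is None:
            return len(self.data_source)
        return self._num_samples


class BrokenSampler(BaseSampler):
    def __init__(self, data_source, num_samples):
        super().__init__(data_source)
        # Raises AttributeError: the inherited property defines no setter.
        self.num_samples = num_samples


class FixedSampler(BaseSampler):
    def __init__(self, data_source, num_samples):
        super().__init__(data_source)
        # Writes to the backing attribute instead of the property.
        self._num_samples = num_samples


try:
    BrokenSampler(range(10), 4)
except AttributeError as err:
    print("BrokenSampler failed:", err)

print(FixedSampler(range(10), 4).num_samples)
```

If this is the cause, comparing the installed library versions against the ones the code was tested with (fastai up to 1.0.51, per the note above) would be the first thing to check.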
