Directional CNNs

This repository accompanies the paper Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters in order to improve reproducibility of the reported results.

If you just want to estimate tempo or key values using models from the paper, please take a look at the tempo-cnn and key-cnn repos. They host pre-trained models.

Audio Files

Unfortunately, because of size limitations imposed by GitHub as well as copyright issues, this repository does not contain all audio samples or extracted features. However, you can download the audio files and extract the features yourself.

Download links:

Should you use any of the datasets in your academic work, please cite the corresponding publications.

Annotations

All necessary ground truth annotations are in the annotations folder. For easy parsing, they are stored as simple tab-separated values (.tsv) files with the columns id \t bpm \t key \t genre \n. The class GroundTruth reads and interprets these files.
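If you want to read the annotations without the GroundTruth class, the column layout above can be parsed with the standard library. The following is a minimal sketch, not the repository's own parser. It assumes that the files have no header row and that missing values appear as empty fields; the function name read_annotations, the sample ids, and the key labels in the example data are made up for illustration.

```python
import csv
import io


def read_annotations(tsv_file):
    """Parse rows of `id \\t bpm \\t key \\t genre` into a dict keyed by id.

    Assumption: empty bpm/key/genre fields mean "not annotated" and are
    stored as None.
    """
    annotations = {}
    for row in csv.reader(tsv_file, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        # Pad short rows so that unpacking never fails.
        sample_id, bpm, key, genre = (row + [""] * 4)[:4]
        annotations[sample_id] = {
            "bpm": float(bpm) if bpm.strip() else None,
            "key": key.strip() or None,
            "genre": genre.strip() or None,
        }
    return annotations


# Usage with in-memory example data (made-up ids and values).
example = io.StringIO("track_a\t120.0\tC major\tpop\ntrack_b\t\tA minor\t\n")
parsed = read_annotations(example)
print(parsed["track_a"]["bpm"])  # 120.0
print(parsed["track_b"]["bpm"])  # None
```

For a real file, pass a handle opened with open(path, newline="", encoding="utf-8") instead of the StringIO object.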

Installation

In a clean Python 3.5/3.6 environment:

git clone https://github.com/hendriks73/directional_cnns.git
cd directional_cnns
python setup.py install

Feature Extraction

To extract features, you can use the code in feature_extraction.py or the command line script mentioned below. Depending on how you define sample identifiers, you may need to make some manual adjustments. The created .joblib files are simple dictionaries, containing strings as keys and spectrograms as values. Note that the extracted spectrograms for the key and the tempo task differ (CQT vs Mel).
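To check what a created feature file contains, you can load it with joblib and list the shape of each spectrogram. This is a small sketch under assumptions: the helper names load_features and describe_features are hypothetical, and the spectrograms are assumed to be array-like objects with a shape attribute (for example NumPy arrays).

```python
def load_features(path):
    """Load one feature file; expected to be a dict of id -> spectrogram."""
    # joblib is a third-party package; it is imported lazily so that
    # describe_features() can be used without it.
    import joblib

    return joblib.load(path)


def describe_features(features):
    """Map each sample id to the shape of its spectrogram.

    Values without a `shape` attribute are reported as None.
    """
    return {
        sample_id: getattr(spectrogram, "shape", None)
        for sample_id, spectrogram in features.items()
    }


# Example call, assuming the file exists at this location:
# shapes = describe_features(load_features("features/giantsteps_key.joblib"))
# for sample_id, shape in sorted(shapes.items()):
#     print(sample_id, shape)
```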

After installation, you may run the extraction using the following command line script:

directional_cnn_extraction -a AUDIO_FILES_FOLDER [-g GROUND_TRUTH.tsv]

The ground truth file is optional. If given, only files that also occur in the ground truth are added to the created feature .joblib files.
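The effect of the optional ground truth file can be pictured as a filter over the feature dictionary. The sketch below illustrates that behavior only and is not the extraction script's actual code. The function name filter_by_ground_truth and the sample ids are hypothetical, and short nested lists stand in for spectrograms.

```python
def filter_by_ground_truth(features, ground_truth_ids):
    """Keep only the features whose id also occurs in the ground truth."""
    wanted = set(ground_truth_ids)
    return {
        sample_id: spectrogram
        for sample_id, spectrogram in features.items()
        if sample_id in wanted
    }


# Made-up ids; the nested lists are placeholders for spectrograms.
features = {
    "track_a": [[0.1, 0.2]],
    "track_b": [[0.3, 0.4]],
    "track_c": [[0.5, 0.6]],
}
kept = filter_by_ground_truth(features, ["track_a", "track_c", "track_x"])
print(sorted(kept))  # ['track_a', 'track_c']
```

Ids that appear only in the ground truth (track_x here) are ignored, and features without an annotation (track_b) are dropped.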

Running

You can run the code either locally or on Google ML Engine.

Local

Running this locally only makes sense on a GPU, and even then it will take a very long time.

To run the training/reporting locally, you can execute the script training.py or the command line script mentioned below with the following arguments (example for key):

--job-dir=./
--model-dir=./
--train-file=annotations/key_train.tsv --valid-file=annotations/key_valid.tsv
--test-files=annotations/giantsteps-key.tsv,annotations/gtzan_key.tsv,annotations/lmd_key_test.tsv
--feature-files=features/giantsteps_key.joblib,features/mtg_tempo_key.joblib,features/gtzan_key.joblib,features/lmd_key.joblib

After installation, you may run the training code using the following command line script:

directional_cnn_training [arguments]
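If you launch training from another Python script, the arguments listed above can be assembled programmatically. The following is one possible sketch: the helper build_training_command is hypothetical, while the flags and file paths are the ones from the key example above.

```python
import shlex


def build_training_command(train_file, valid_file, test_files, feature_files,
                           job_dir="./", model_dir="./"):
    """Return the argument list for the directional_cnn_training script."""
    return [
        "directional_cnn_training",
        "--job-dir=" + job_dir,
        "--model-dir=" + model_dir,
        "--train-file=" + train_file,
        "--valid-file=" + valid_file,
        # Multiple files are passed as one comma-separated value.
        "--test-files=" + ",".join(test_files),
        "--feature-files=" + ",".join(feature_files),
    ]


command = build_training_command(
    train_file="annotations/key_train.tsv",
    valid_file="annotations/key_valid.tsv",
    test_files=[
        "annotations/giantsteps-key.tsv",
        "annotations/gtzan_key.tsv",
        "annotations/lmd_key_test.tsv",
    ],
    feature_files=[
        "features/giantsteps_key.joblib",
        "features/mtg_tempo_key.joblib",
        "features/gtzan_key.joblib",
        "features/lmd_key.joblib",
    ],
)

# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in command))
```

To start the run from Python, pass the list to subprocess.run(command, check=True) after installation.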

Remote

To run the training/reporting remotely on Google ML Engine, you first need to sign up, upload all necessary feature and annotation files to Google storage, and then adapt the provided scripts trainandpredict_key_ml_engine.sh and trainandpredict_tempo_ml_engine.sh accordingly.

License

This repository is licensed under CC BY 3.0. For attribution, please cite:

Hendrik Schreiber and Meinard Müller, Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters, In Proceedings of the Sound and Music Computing Conference (SMC), Málaga, Spain, May 2019.
