kristijanbartol / deep-music-tagger


Music genre classification model using CRNN

License: MIT License

Language: Python 100.00%

Topics: music-information-retrieval, fma, mel-spectrograms, convolutional-networks, deep-learning, recurrent-networks, music-classification, librosa, keras, deeplearning, deep-neural-networks, music, music-analysis, audio-analysis

deep-music-tagger's Introduction

Deep Music Classifier

Note

I have not yet run a complete training procedure. Once I have run and verified one, the plan is to use this classifier for music generation based on genre, which, to the best of my knowledge, has not been done yet.

If you need assistance running the project or have a question, please email me at [email protected]

About

The ideal goal of this project is to be able to say: "This part of the song has elements of jazz, progressive rock and a bit of grunge." This could be achieved by framing the problem as multi-output classification.

The deep model is based on "Convolutional Recurrent Neural Networks for Music Classification" (Keunwoo Choi, George Fazekas, Mark Sandler, Kyunghyun Cho; Dec 2016) [1], i.e. it uses a convolutional recurrent neural network (CRNN) for a multi-output classification task: tagging each music piece with a subset of labels.
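
To make the multi-output framing concrete, here is a minimal sketch of how targets and predictions could be represented. It is not taken from this repository: the genre vocabulary, the 0.5 threshold and all function names are assumptions chosen for illustration. Each genre gets its own independent sigmoid output, so a piece can carry several tags at once.

```python
import numpy as np

# Hypothetical genre vocabulary; the real labels would come from the FMA metadata.
GENRES = ["jazz", "progressive rock", "grunge", "folk"]


def multi_hot(tags, vocabulary=GENRES):
    """Encode a set of tags as a 0/1 vector with one slot per genre."""
    index = {name: i for i, name in enumerate(vocabulary)}
    target = np.zeros(len(vocabulary), dtype=np.float32)
    for tag in tags:
        target[index[tag]] = 1.0
    return target


def sigmoid(logits):
    """Independent per-genre probabilities (they need not sum to 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))


def binary_cross_entropy(target, probs, eps=1e-7):
    """Mean per-genre binary cross-entropy, a common loss for tagging."""
    target = np.asarray(target, dtype=np.float64)
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    losses = -(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs))
    return float(np.mean(losses))


def predicted_tags(probs, threshold=0.5, vocabulary=GENRES):
    """Keep every genre whose probability reaches the (assumed) threshold."""
    return [name for name, p in zip(vocabulary, probs) if p >= threshold]


if __name__ == "__main__":
    target = multi_hot(["jazz", "grunge"])
    probs = sigmoid([2.0, -3.0, 0.5, -1.0])  # made-up model outputs
    print(predicted_tags(probs), binary_cross_entropy(target, probs))
```

In a keras model this would typically correspond to a final dense layer with one sigmoid unit per genre trained with a binary cross-entropy loss, but that pairing is my assumption here rather than something the project has verified.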

Prerequisite

To run all parts of this project, you will need the following additional Python packages (Python 3.6 is recommended):

  • keras - build and train the high-level model
  • librosa - extract mel-spectrograms
  • pandas - analyze FMA metadata
  • numpy - efficiently work with linear algebra operations
  • tensorflow (GPU recommended) - modify keras backend
  • matplotlib - plot various graphs and use it to extract librosa spectrograms
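
A quick way to see which of the packages listed above are missing from the current environment is sketched below. It only uses the standard library; the helper name is hypothetical and the script is not part of this repository.

```python
from importlib.util import find_spec

# Package names exactly as listed above (they are also the import names).
REQUIRED = ["keras", "librosa", "pandas", "numpy", "tensorflow", "matplotlib"]


def missing_packages(names=REQUIRED):
    """Return the packages from `names` that cannot be found by this interpreter."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

The check only looks the packages up, it does not import them, so it says nothing about whether the installed versions work together.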

Input features

Mel-spectrograms are extracted from .mp3 files and used as model inputs. An example of such a spectrogram: [Figure: mel-spectrogram example]

However, the images fed to the model are generated a bit differently: the matrix of spectrogram values is dumped into a grayscale image. Information is preserved this way, and the convolution gets a single input channel instead of three. An example of such an image: [Figure: grayscale spectrogram example]

Other spectrogram types could also be used; they are described and compared in detail in [5]. In this work, besides mel-spectrograms, raw audio input will also be tested [6].
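
The grayscale dump described above could look roughly like the sketch below. It is an illustration, not the code in mel-spec.py: the function names, the 96 mel bands and the min-max scaling to 8-bit pixels are all assumptions. librosa is imported lazily so that the scaling helpers can be tried without it.

```python
import numpy as np


def mel_spectrogram_db(path, n_mels=96):
    """Load an audio file and return its mel-spectrogram in decibels.

    n_mels=96 is an assumed value, not one confirmed by this project.
    """
    import librosa  # lazy import: only needed when audio is actually loaded

    y, sr = librosa.load(path)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)


def spectrogram_to_grayscale(spec_db):
    """Min-max scale a 2-D spectrogram to uint8 pixel values in [0, 255]."""
    spec_db = np.asarray(spec_db, dtype=np.float64)
    lo, hi = spec_db.min(), spec_db.max()
    if hi == lo:  # constant input (e.g. silence): avoid division by zero
        return np.zeros(spec_db.shape, dtype=np.uint8)
    scaled = (spec_db - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)


def as_model_input(image):
    """Add the single channel axis a 2-D convolution expects: (H, W) -> (H, W, 1)."""
    return image.astype(np.float32)[..., np.newaxis] / 255.0


if __name__ == "__main__":
    fake_spec = np.array([[-80.0, -60.0], [-20.0, 0.0]])  # made-up dB values
    image = spectrogram_to_grayscale(fake_spec)
    print(image, as_model_input(image).shape)
```

Note that with this particular scaling the values are quantized to 256 levels, so the time-frequency structure is kept but the exact decibel values are not recoverable unless the minimum and maximum are stored as well.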

Data

The project uses the FMA dataset (FMA: A Dataset For Music Analysis) [2]. It is a collection of freely available MP3s (under Creative Commons licenses), convenient for research projects and (currently) the only publicly available music dataset of its kind. The distribution of the top 16 genres is shown in the following histogram: [Figure: genres histogram]
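
The counts behind such a histogram can be computed with a few lines. The sketch below assumes the per-track genre labels have already been read from tracks.csv into a plain list; how they are read, and the function name, are assumptions and not part of this repository.

```python
from collections import Counter


def top_genre_counts(genre_labels, k=16):
    """Count tracks per genre and return the k most frequent as (genre, count) pairs."""
    counts = Counter(label for label in genre_labels if label)  # skip missing labels
    return counts.most_common(k)


if __name__ == "__main__":
    # Made-up labels standing in for one genre label per track.
    labels = ["Rock", "Rock", "Jazz", "", None, "Rock", "Jazz", "Folk"]
    for genre, count in top_genre_counts(labels, k=3):
        print(f"{genre:10s} {'#' * count} ({count})")
```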

Usage

  1. Take a look at and download the FMA dataset metadata (342 MiB). For more details, check this repo.

  2. Then download small or medium; try the smaller versions first to set things up and then switch to large. I won't use the full version because its input images have varying sizes and it is too large for my computing resources anyway; I also believe there is more than enough information in the tracks trimmed to 30 s.

  3. Extract mel-spectrograms from the mp3s by running mel-spec.py as the main module.

  4. Generate the relevant metadata by running metadata.py as the main module.

  5. Run train.py to build, compile and train a keras model (the CRNN architecture mentioned above).

Project structure:

  • data/
    • fma_{size}/
      • 000/
      • 001/
    • fma_metadata/
      • genres.csv
      • tracks.csv
  • in/
    • mel-specs/
      • 000/
      • 001/
    • metadata/
      • test.csv
      • train.csv
      • valid.csv
  • out/
    • graphs/
    • logs/
  • src/
    • main.py
    • mel-spec.py
    • metadata.py
    • model.py
    • utility.py
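
The layout above implies a simple mapping between an audio file under data/ and its spectrogram under in/mel-specs/. The sketch below illustrates that mapping. It assumes the FMA naming convention (six-digit, zero-padded track id, grouped into folders by its first three digits) and a .png suffix for the spectrogram images; the function names and the suffix are hypothetical.

```python
from pathlib import Path


def track_audio_path(track_id, size, data_root="data"):
    """Path of a track's mp3, e.g. track 2 in fma_small -> data/fma_small/000/000002.mp3."""
    name = f"{track_id:06d}"
    return Path(data_root) / f"fma_{size}" / name[:3] / f"{name}.mp3"


def mel_spec_path(mp3_path, out_root="in/mel-specs", suffix=".png"):
    """Mirror an mp3 path into the mel-spectrogram tree, keeping the 000/, 001/ folder."""
    mp3_path = Path(mp3_path)
    return Path(out_root) / mp3_path.parent.name / (mp3_path.stem + suffix)


if __name__ == "__main__":
    audio = track_audio_path(1234, "small")
    print(audio.as_posix(), "->", mel_spec_path(audio).as_posix())
```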

Results

I have not run the whole training process yet...

CrowdAI competition (music genre classification - 16 classes)

The source code for this project also contains a separate folder for the CrowdAI competition. The main focus of this project over the next 60 days will be gaining a better position on the leaderboard.

Relevant literature

[1] CRNN for Music Classification

[2] FMA: A Dataset For Music Analysis

[3] Music Information Retrieval (origin of "MIR", Downie)

[4] A Tutorial on Deep Learning for Music Information Retrieval

[5] Comparison on Audio Signal Preprocessing Methods for Deep Neural Networks on Music Tagging

[6] End-to-end learning for music audio tagging at scale (1D convolution)

For broader references on music information retrieval, check https://github.com/ybayle/awesome-deep-learning-music.
