dansuh17 / jdcnet-pytorch

PyTorch implementation of JDCNet, a singing voice melody detection and classification network

JDCNet-pytorch

This is a PyTorch re-implementation of Kum et al., "Joint Detection and Classification of Singing Voice Melody Using Convolutional Recurrent Neural Networks" (2019). For convenience, the proposed neural network model is referred to as JDCNet.

This is an attempt to implement JDCNet as closely as possible to the original paper. Where the paper is ambiguous about implementation details, I made my own decisions, which account for any differences from the original authors' implementation.

Prerequisites

Major dependencies for this project are:

  • python >= 3.6
  • pytorch >= 1.2
  • librosa >= 0.7.0
  • pytorch-land == 0.1.6 (train only)

All other required libraries are listed in requirements.txt.

This project also uses pytorch-land, a small library I wrote that implements a general-purpose Trainer for PyTorch models. It provides easy logging and native TensorBoard support, and runs a basic "train-validate-test" training sequence.

librosa is used for reading audio files.

JDCNet

JDCNet is a singing voice melody detection and classification network. It detects whether a noticeable singing voice is present in a given frame and, if so, classifies the pitch of the sung note.

Pitch classification is performed by a convolutional network with a bidirectional LSTM (BiLSTM) module attached at the end. The auxiliary detector network, also a BiLSTM module, uses intermediate features from the pitch classifier to help determine whether a voice is present.

The input is a log-magnitude spectrogram chunk that consists of 31 frames and 513 frequency bins.
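As a rough illustration of the input shape: 513 frequency bins is what a one-sided spectrum of a 1024-point FFT produces (1024 / 2 + 1 = 513), so an FFT size of 1024 is a plausible assumption here, not something stated above. The sketch below uses hypothetical helper names and plain Python lists to show how a longer spectrogram could be cut into 31-frame chunks. The non-overlapping split and the dropping of a trailing partial chunk are assumptions, and the repository's preprocessing may differ.

```python
# Minimal sketch with hypothetical helpers; not the repository's code.
CHUNK_FRAMES = 31   # frames per input chunk (from the text above)
NUM_BINS = 513      # frequency bins per frame (from the text above)


def num_bins_for_fft(n_fft):
    """Number of bins in the one-sided spectrum of an n_fft-point FFT."""
    return n_fft // 2 + 1


def split_into_chunks(spectrogram, chunk_frames=CHUNK_FRAMES):
    """Split a list of frames into consecutive, non-overlapping chunks.

    `spectrogram` is a list of frames, each a list of NUM_BINS values.
    A trailing partial chunk is dropped (an assumption for this sketch).
    """
    full_chunks = len(spectrogram) // chunk_frames
    return [
        spectrogram[i * chunk_frames:(i + 1) * chunk_frames]
        for i in range(full_chunks)
    ]


# A dummy spectrogram of 100 frames yields 3 chunks of 31 x 513.
dummy = [[0.0] * NUM_BINS for _ in range(100)]
chunks = split_into_chunks(dummy)
print(len(chunks), len(chunks[0]), len(chunks[0][0]))  # 3 31 513
```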

The model predicts whether or not a voice exists for each frame, giving a (31 x 2) output tensor, and classifies the pitch into one of 722 classes: 721 frequencies evenly spaced on a log scale from note D2 (MIDI=38) to B5 (MIDI=83) inclusive, plus an extra 'non-voice' class.
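The class layout can be made concrete with a little arithmetic: 721 points spread evenly in log-frequency over the 45 semitones between MIDI 38 and MIDI 83 gives a spacing of 45 / 720 = 1/16 semitone. The following sketch converts between class indices and frequencies using the standard MIDI-to-Hz formula (MIDI 69 = 440 Hz). The function names are hypothetical, and placing the 'non-voice' class at the last index (721) is an assumption made purely for illustration; the ordering used in the repository may differ.

```python
import math

MIDI_LOW = 38            # lowest pitch (from the text above)
MIDI_HIGH = 83           # highest pitch (from the text above)
NUM_PITCHES = 721        # pitch classes, excluding 'non-voice'
NON_VOICE = NUM_PITCHES  # ASSUMPTION: non-voice is the last index (721)

# Spacing between adjacent classes in semitones: 45 / 720 = 1/16.
STEP = (MIDI_HIGH - MIDI_LOW) / (NUM_PITCHES - 1)


def midi_to_hz(midi):
    """Standard equal-temperament conversion with MIDI 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)


def class_to_hz(index):
    """Frequency in Hz of a pitch class, or None for the non-voice class."""
    if index == NON_VOICE:
        return None
    if not 0 <= index < NUM_PITCHES:
        raise ValueError("class index out of range: %r" % (index,))
    return midi_to_hz(MIDI_LOW + index * STEP)


def hz_to_class(freq_hz):
    """Nearest pitch class for a frequency, clamped to the valid range."""
    midi = 69 + 12 * math.log2(freq_hz / 440.0)
    index = round((midi - MIDI_LOW) / STEP)
    return min(max(index, 0), NUM_PITCHES - 1)


print(STEP)                                   # 0.0625
print(class_to_hz(0), class_to_hz(NUM_PITCHES - 1))
```

With this layout, class 0 is roughly 73.4 Hz and class 720 is roughly 987.8 Hz, and 16 consecutive classes span one semitone.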

[Figure: JDCNet architecture]

Data Preprocessing

The MedleyDB Melody Subset dataset is used to train this model. Acquire the dataset, extract its contents, and run the preprocessing script to prepare the data for training.

./medleydb_preprocess.py --in_root <medleydb_root> --out_root <output_root> --metadata_path <path>/<to>/<metadata_file>.json

Train

You must provide a configuration file to train the network. A default configuration file with default parameters is provided as default_config.json. To start training, run the script train.py.

./train.py --config default_config.json

Singing voice melody extraction

You can generate a MIDI file containing extracted singing voice melody using the provided pretrained model.

./extract_melody.py --model example_model/jdcnet_model.pth --input_audio <your_audio>.wav
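To give a feel for the last step, turning per-frame pitch estimates into notes, here is a minimal sketch. It is not the logic of extract_melody.py: the function names, the rounding to the nearest semitone, and the frame duration passed in are all assumptions for illustration. It uses the standard conversion from frequency to MIDI note number and merges runs of consecutive frames that share the same note.

```python
import math


def hz_to_midi_note(freq_hz):
    """Nearest MIDI note number for a frequency (MIDI 69 = 440 Hz)."""
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


def frames_to_notes(frame_hz, frame_seconds):
    """Merge consecutive frames with the same note into note events.

    `frame_hz` holds one frequency per frame, or None where no voice
    was detected. `frame_seconds` is the duration of one frame
    (a hypothetical parameter; the real value depends on the hop size).
    Returns a list of (midi_note, start_seconds, end_seconds) tuples.
    """
    notes = []
    current = None  # MIDI note of the run in progress, or None
    start = 0       # frame index where the current run began
    # The trailing None acts as a sentinel that flushes the last run.
    for i, hz in enumerate(list(frame_hz) + [None]):
        note = hz_to_midi_note(hz) if hz is not None else None
        if note != current:
            if current is not None:
                notes.append((current, start * frame_seconds, i * frame_seconds))
            current, start = note, i
    return notes


# Two voiced runs separated from silence: A4 for two frames, then C5.
example = [None, 440.0, 442.0, 523.25, None]
for note, begin, end in frames_to_notes(example, frame_seconds=0.01):
    print(note, begin, end)
```

Each resulting tuple could then be written out as a note-on/note-off pair by whichever MIDI library is in use.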

Generated Melody Audio Examples

Some audio examples have been posted in this post, and example MIDI files are in the 'melody_results' directory.
