tempo-cnn's Introduction

Tempo-CNN

Tempo-CNN is a simple CNN-based framework for estimating temporal properties of music tracks featuring trained models from several publications [1] [2] [3] [4].

First and foremost, Tempo-CNN is a tempo estimator. To determine the global tempo of an audio file, simply run the script

tempo -i my_audio.wav

To create a local tempo "tempogram", run

tempogram my_audio.wav

For a complete list of options, run either script with the parameter --help.

For programmatic use via the Python API, please see the Programmatic Usage section below.

Installation

In a clean Python 3.6 or 3.7 environment, simply run:

pip install tempocnn

If you would rather install from source, clone this repo and run setup.py install using Python 3.6 or 3.7:

git clone https://github.com/hendriks73/tempo-cnn.git
cd tempo-cnn
python setup.py install

Models and Formats

You may specify other models and output formats (MIREX, JAMS) via command line parameters.

For example, to use JAMS as the output format together with the model originally used in the ISMIR 2018 paper [1], run

tempo -m ismir2018 --jams -i my_audio.wav

For MIREX-style output, add the --mirex parameter.
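
For example, combining the --mirex flag with the ISMIR 2018 model shown above:

tempo -m ismir2018 --mirex -i my_audio.wav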

DeepTemp Models

To use one of the DeepTemp models from [3] (see also repo directional_cnns), run

tempo -m deeptemp --jams -i my_audio.wav

or,

tempo -m deeptemp_k24 --jams -i my_audio.wav

if you want to use a higher-capacity model (several k-values are supported). The deepsquare and shallowtemp models may also be used.

Note that some models may be downloaded (and cached) at execution time.

Mazurka Models

To use DT-Maz models from [4], run

tempo -m mazurka -i my_audio.wav

This defaults to the model named dt_maz_v_fold0. You may choose another fold [0-4] or another split [v|m]. For example, to use fold 3 from the M-split, run

tempo -m dt_maz_m_fold3 -i my_audio.wav

Note that Mazurka models may be used to estimate a global tempo, but were actually trained to create tempograms for Chopin Mazurkas [4].

While it's cumbersome to list the split definitions for the Version folds, the Mazurka folds are easily defined:

  • fold0 was tested on Chopin_Op068No3 and validated on Chopin_Op017No4
  • fold1 was tested on Chopin_Op017No4 and validated on Chopin_Op024No2
  • fold2 was tested on Chopin_Op024No2 and validated on Chopin_Op030No2
  • fold3 was tested on Chopin_Op030No2 and validated on Chopin_Op063No3
  • fold4 was tested on Chopin_Op063No3 and validated on Chopin_Op068No3

The networks were trained on recordings of the three remaining Mazurkas. In essence this means: do not estimate the local tempo of Chopin_Op024No2 using dt_maz_m_fold0, because Chopin_Op024No2 was used during training.
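
To avoid such leakage programmatically, the fold definitions above can be encoded directly. Below is a minimal Python sketch; the FOLDS table and the safe_folds helper are illustrative and not part of the tempocnn API:

# test/validation pieces per fold, copied from the list above;
# each fold was trained on the three remaining Mazurkas
FOLDS = {
    0: {'test': 'Chopin_Op068No3', 'validation': 'Chopin_Op017No4'},
    1: {'test': 'Chopin_Op017No4', 'validation': 'Chopin_Op024No2'},
    2: {'test': 'Chopin_Op024No2', 'validation': 'Chopin_Op030No2'},
    3: {'test': 'Chopin_Op030No2', 'validation': 'Chopin_Op063No3'},
    4: {'test': 'Chopin_Op063No3', 'validation': 'Chopin_Op068No3'},
}

def safe_folds(piece):
    """Return the folds that did not train on the given Mazurka."""
    return [fold for fold, split in FOLDS.items()
            if piece in (split['test'], split['validation'])]

# Chopin_Op024No2 was held out only as validation (fold 1) and test (fold 2)
print(safe_folds('Chopin_Op024No2'))  # -> [1, 2]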

Batch Processing

For batch processing, you may want to run tempo like this:

find /your_audio_dir/ -name '*.wav' -print0 | xargs -0 tempo -d /output_dir/ -i

This will recursively search for all .wav files in /your_audio_dir/, analyze them, and write the results to individual files in /output_dir/. Because the model is only loaded once, this method of processing is much faster than starting the program once per file.
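
The same model reuse is straightforward from Python. Below is a minimal sketch using the API documented in the Programmatic Usage section; the directory path is a placeholder:

from pathlib import Path

from tempocnn.classifier import TempoClassifier
from tempocnn.feature import read_features

# load the model once and reuse it for every file
classifier = TempoClassifier('cnn')

for wav in sorted(Path('/your_audio_dir/').rglob('*.wav')):
    features = read_features(str(wav))
    tempo = classifier.estimate_tempo(features, interpolate=False)
    print(f"{wav}\t{tempo}")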

Interpolation

To increase accuracy beyond integer precision, you may want to enable quadratic interpolation by setting the --interpolate flag. Obviously, this only makes sense for tracks with a very stable tempo:

tempo -m ismir2018 --interpolate -i my_audio.wav
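
The Python API exposes the same option via the interpolate argument of estimate_tempo (see the Programmatic Usage section below):

from tempocnn.classifier import TempoClassifier
from tempocnn.feature import read_features

classifier = TempoClassifier('cnn')
features = read_features('my_audio.wav')
# quadratic interpolation around the most likely tempo class
tempo = classifier.estimate_tempo(features, interpolate=True)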

Tempogram

Instead of estimating a global tempo, Tempo-CNN can also estimate local tempi in the form of a tempogram. This can be useful for identifying tempo drift.

To create such a tempogram, run

tempogram -p my_audio.wav

As output, tempogram will create a .png file. Additional options to select different models and output formats are available.

You may use the --csv option to export local tempo estimates in a parseable format and the --hop-length option to change the temporal resolution. The parameters --sharpen and --norm-frame let you post-process the image.
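
For example, to export local tempi as CSV with an explicitly chosen temporal resolution (the hop length value of 32 mirrors the Python example below and is illustrative):

tempogram --csv --hop-length 32 my_audio.wav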

Greek Folk

Tempo-CNN provides experimental support for temporal property estimation of Greek folk music [2]. The corresponding models are named fma2018 (for tempo) and fma2018-meter (for meter). To estimate the meter's numerator, run

meter -m fma2018-meter -i my_audio.wav
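
Programmatic meter estimation is not covered by the examples below. Assuming the package mirrors the tempo API with a MeterClassifier class and an estimate_meter method (both names are assumptions; verify them against the source), usage might look like this:

from tempocnn.classifier import MeterClassifier  # assumed class name
from tempocnn.feature import read_features

meter_classifier = MeterClassifier('fma2018-meter')
features = read_features('my_audio.wav')
meter = meter_classifier.estimate_meter(features)  # assumed method name
print(f"Estimated meter numerator: {meter}")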

Programmatic Usage

After installation, you may use the package programmatically.

Example for global tempo estimation:

from tempocnn.classifier import TempoClassifier
from tempocnn.feature import read_features

model_name = 'cnn'
input_file = 'some_audio_file.mp3'

# initialize the model (may be re-used for multiple files)
classifier = TempoClassifier(model_name)

# read the file's features
features = read_features(input_file)

# estimate the global tempo
tempo = classifier.estimate_tempo(features, interpolate=False)
print(f"Estimated global tempo: {tempo}")

Example for local tempo estimation:

import numpy as np

from tempocnn.classifier import TempoClassifier
from tempocnn.feature import read_features

model_name = 'cnn'
input_file = 'some_audio_file.mp3'

# initialize the model (may be re-used for multiple files)
classifier = TempoClassifier(model_name)

# read the file's features, specify hop_length for temporal resolution
features = read_features(input_file, frames=256, hop_length=32)

# estimate local tempi, this returns tempo classes, i.e., a distribution
local_tempo_classes = classifier.estimate(features)

# find argmax per frame and convert class index to BPM value
max_predictions = np.argmax(local_tempo_classes, axis=1)
local_tempi = classifier.to_bpm(max_predictions)
print(f"Estimated local tempo classes: {local_tempi}")

License

Source code and models can be licensed under the GNU AFFERO GENERAL PUBLIC LICENSE v3. For details, please see the LICENSE file.

Citation

If you use Tempo-CNN in your work, please consider citing it.

Original publication:

@inproceedings{SchreiberM18_TempoCNN_ISMIR,
   Title = {A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network},
   Author = {Schreiber, Hendrik and M{\"u}ller, Meinard},
   Booktitle = {Proceedings of the 19th International Society for Music Information Retrieval Conference ({ISMIR})},
   Pages = {98--105},
   Month = {9},
   Year = {2018},
   Address = {Paris, France},
   doi = {10.5281/zenodo.1492353},
   url = {https://doi.org/10.5281/zenodo.1492353}
}

ShallowTemp, DeepTemp, and DeepSquare models:

@inproceedings{SchreiberM19_CNNKeyTempo_SMC,
   Title = {Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters},
   Author = {Hendrik Schreiber and Meinard M{\"u}ller},
   Booktitle = {Proceedings of the Sound and Music Computing Conference ({SMC})},
   Pages = {47--54},
   Year = {2019},
   Address = {M{\'a}laga, Spain},
   doi = {10.5281/zenodo.3249250},
   url = {https://doi.org/10.5281/zenodo.3249250}
}

Mazurka models:

@inproceedings{SchreiberZM20_LocalTempo_ISMIR,
   Title = {Modeling and Estimating Local Tempo: A Case Study on Chopin’s Mazurkas},
   Author = {Hendrik Schreiber and Frank Zalkow and Meinard M{\"u}ller},
   Booktitle = {Proceedings of the 21st International Society for Music Information Retrieval Conference ({ISMIR})},
   Pages = {773--779},
   Year = {2020},
   Address = {Montreal, QC, Canada},
   doi = {10.5281/zenodo.4245546},
   url = {https://doi.org/10.5281/zenodo.4245546}
}

References

[1] Hendrik Schreiber, Meinard Müller, A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network, Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
[2] Hendrik Schreiber, Technical Report: Tempo and Meter Estimation for Greek Folk Music Using Convolutional Neural Networks and Transfer Learning, 8th International Workshop on Folk Music Analysis (FMA), Thessaloniki, Greece, June 2018.
[3] Hendrik Schreiber, Meinard Müller, Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters, Proceedings of the Sound and Music Computing Conference (SMC), Málaga, Spain, 2019.
[4] Hendrik Schreiber, Frank Zalkow, Meinard Müller, Modeling and Estimating Local Tempo: A Case Study on Chopin’s Mazurkas, Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), Montréal, QC, Canada, Oct. 2020.


tempo-cnn's Issues

Real time tempo detection

Hi, can I apply your library to real-time tempo detection?
The signal would be recorded in frames and fed to the model, something like in the local tempo estimation example...
Do you have any recommendations, e.g., for the frame length? Is this possible at all, or does the complete audio file have to be loaded?
This is my code, which does not work:
from tempocnn.classifier import TempoClassifier
from tempocnn.feature import read_features
import pyaudio
import numpy as np
from librosa import util

model_name = 'cnn'
classifier = TempoClassifier(model_name)

chunk = 2048
sample_format = pyaudio.paInt16
channels = 1
fs = 11025  # frames per channel
seconds = 50
p = pyaudio.PyAudio()
print("Recording ...")
stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)

for i in range(0, int(fs / chunk * seconds)):
    data = stream.read(chunk)
    data_pcm = util.buf_to_float(data, dtype=np.float32)
    features = read_features(data_pcm, frames=256, hop_length=32)
    # estimate local tempi, this returns tempo classes, i.e., a distribution
    local_tempo_classes = classifier.estimate(features)
    # find argmax per frame and convert class index to BPM value
    max_predictions = np.argmax(local_tempo_classes, axis=1)
    local_tempi = classifier.to_bpm(max_predictions)
    print(f"Estimated local tempo classes: {local_tempi}")

stream.stop_stream()
stream.close()
p.terminate()
print("... Ending Recording")

Handling Octave Errors

I am using tempo-cnn, and in some cases the estimated global tempo comes out to be twice the ground-truth value, also known as an octave error.
Can you please help me eliminate these errors?

Python 3.8+ (tensorflow 2) version

Hi!
Currently this package requires tensorflow 1.15, which only has wheels on PyPI for Python 3.7. What are our options for running on more modern versions of Python? I don't do much deep learning, but my understanding is that tensorflow 2 has a different API. Is that right?

installation error with pip

pip install tempocnn
Collecting tempocnn
Using cached tempocnn-0.0.6-py3-none-any.whl (70.0 MB)
Collecting h5py<3.0.0,>=2.7.0
Using cached h5py-2.10.0.tar.gz (301 kB)
Preparing metadata (setup.py) ... done
Collecting librosa>=0.6.2
Using cached librosa-0.10.0.post2-py3-none-any.whl (253 kB)
Requirement already satisfied: setuptools>=41.0.0 in ./env/lib/python3.10/site-packages (from tempocnn) (59.6.0)
Collecting tempocnn
Using cached tempocnn-0.0.5-py3-none-any.whl (70.0 MB)
ERROR: Cannot install tempocnn==0.0.5 and tempocnn==0.0.6 because these package versions have conflicting dependencies.

The conflict is caused by:
tempocnn 0.0.6 depends on tensorflow==1.15.4
tempocnn 0.0.5 depends on tensorflow==1.15.4

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict
