- 105,057 midi songs, with over a million instrument stems (a subset of the Lakh MIDI Dataset)
- 15,654 instrument presets (synths & sampled instruments)
- a classification / rendering process to combine the two into over 3,500 hours of distinct audio
it can be used to train models for pretty much any music task
because the midi is effectively the 'source code' of the audio, it's simple to augment / modify the data, and the format is dramatically smaller than even the most compressed audio
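as a rough sketch of what augmenting at the midi level could look like, the snippet below transposes and time-stretches a list of notes. the `Note` dataclass is a hypothetical stand-in that mirrors the `pitch` / `start` / `end` / `velocity` fields of a pretty-midi note so the example runs without dependencies, and `transpose` / `stretch` are made-up helper names, not part of this dataset's code.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Note:
    # hypothetical stand-in for a midi note (same field names as pretty-midi's Note)
    pitch: int          # midi pitch, 0-127
    start: float        # onset time in seconds
    end: float          # offset time in seconds
    velocity: int = 100


def transpose(notes: List[Note], semitones: int) -> List[Note]:
    """Shift every pitch by `semitones`, dropping notes that leave the 0-127 range."""
    shifted = (replace(n, pitch=n.pitch + semitones) for n in notes)
    return [n for n in shifted if 0 <= n.pitch <= 127]


def stretch(notes: List[Note], factor: float) -> List[Note]:
    """Scale all note times by `factor` (factor > 1 slows the music down)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [replace(n, start=n.start * factor, end=n.end * factor) for n in notes]


# up a whole tone and twice as fast; the top note falls out of range and is dropped
melody = [Note(60, 0.0, 0.5), Note(64, 0.5, 1.0), Note(127, 1.0, 1.5)]
augmented = stretch(transpose(melody, 2), 0.5)
```

drum tracks would normally be left out of the transposition step, since General MIDI drum pitches select different drum sounds rather than different notes.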
each midi file contains:
- local tempo
- time signature
- key signature
- score ('piano roll') for each instrument
- 128 instrument classes (General MIDI)
- ~8 instruments per song on average
- many even have lyrics
- todo: render vocals with singing voice synthesis (SVS) using the lyrics / melody
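a minimal sketch of pulling the fields above out of one file, assuming the file has been loaded with pretty-midi. `summarize_midi` is a hypothetical helper written for illustration; it only relies on attributes that a `pretty_midi.PrettyMIDI` object exposes (`get_tempo_changes()`, `time_signature_changes`, `key_signature_changes`, `instruments`, `lyrics`).

```python
from typing import Any, Dict


def summarize_midi(pm: Any) -> Dict[str, list]:
    """Collect per-file metadata from a pretty_midi.PrettyMIDI-like object."""
    # get_tempo_changes() returns change times (seconds) and tempi (bpm)
    tempo_times, tempi = pm.get_tempo_changes()
    return {
        "tempo_changes": [(float(t), float(bpm)) for t, bpm in zip(tempo_times, tempi)],
        "time_signatures": [
            (ts.time, ts.numerator, ts.denominator) for ts in pm.time_signature_changes
        ],
        # key_number: 0-11 are major keys, 12-23 are minor keys
        "key_signatures": [(ks.time, ks.key_number) for ks in pm.key_signature_changes],
        # program is the General MIDI program number (0-127)
        "instruments": [
            {"program": inst.program, "is_drum": inst.is_drum, "n_notes": len(inst.notes)}
            for inst in pm.instruments
        ],
        "lyrics": [(ly.time, ly.text) for ly in pm.lyrics],
    }
```

with pretty-midi installed, `summarize_midi(pretty_midi.PrettyMIDI("song.mid"))` should return these fields for a real file (`song.mid` is a placeholder path). the piano roll for a single instrument is available via `inst.get_piano_roll(fs=100)`, which returns an array with 128 pitch rows.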
this makes it well-suited to tasks like:
- tempo / key classification
- beat / downbeat prediction
- stem separation
- multi-instrument transcription
- generative composition ('write a bass line for this piano melody')
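labels for several of these tasks can be derived from the midi rather than annotated by hand. as one example, beat / downbeat targets follow from the tempo and time signature. the sketch below assumes a constant tempo and a quarter-note beat (e.g. 4/4 or 3/4); `beat_grid` is a hypothetical helper, not part of the dataset code.

```python
from typing import List, Tuple


def beat_grid(bpm: float, beats_per_bar: int, duration: float) -> Tuple[List[float], List[float]]:
    """Return (beat_times, downbeat_times) in seconds for a constant tempo."""
    if bpm <= 0 or beats_per_bar <= 0:
        raise ValueError("bpm and beats_per_bar must be positive")
    period = 60.0 / bpm  # seconds per beat
    beats: List[float] = []
    downbeats: List[float] = []
    k = 0
    while k * period < duration:
        t = k * period
        beats.append(t)
        if k % beats_per_bar == 0:  # first beat of each bar
            downbeats.append(t)
        k += 1
    return beats, downbeats


# 4 seconds of 4/4 at 120 bpm: a beat every 0.5 s, a downbeat every 2 s
beats, downbeats = beat_grid(bpm=120, beats_per_bar=4, duration=4.0)
```

for real files with tempo or time signature changes, pretty-midi's `get_beats()` and `get_downbeats()` methods are the more appropriate route.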
Lakh MIDI Dataset - where all the midi is from (s/o Colin Raffel)
pretty-midi - how the midi files are handled & rendered (also by Colin Raffel)
Slakh2100 - where I got the idea