Deep Bass

Automatic, content-driven cross fading between consecutive songs using a WaveNet autoencoder (NSynth).

This currently works on two audio files at a time. The ending of the first song and the beginning of the second song are isolated.

Trim

For simple cross fading, each snippet is multiplied by a ramp function (fading out the first song and fading in the second), and the two results are summed to generate the mixed audio.
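As a minimal sketch of the ramp-and-sum idea, assuming two equal-length mono snippets held as NumPy arrays: the function name `linear_crossfade` and the sine-wave test signals are illustrative and not taken from this repository.

```python
import numpy as np

def linear_crossfade(end_snippet, start_snippet):
    """Fade out end_snippet while fading in start_snippet, then sum them."""
    if end_snippet.shape != start_snippet.shape:
        raise ValueError("snippets must have the same length")
    n = end_snippet.shape[0]
    fade_in = np.linspace(0.0, 1.0, n)   # ramp applied to the incoming song
    fade_out = 1.0 - fade_in             # complementary ramp for the outgoing song
    return end_snippet * fade_out + start_snippet * fade_in

sr = 16000                       # sampling rate used in the example config
t = np.arange(4 * sr) / sr       # 4 second snippets
song1_end = np.sin(2 * np.pi * 220.0 * t)    # stand-in for the end of song 1
song2_start = np.sin(2 * np.pi * 330.0 * t)  # stand-in for the start of song 2
mixed = linear_crossfade(song1_end, song2_start)
```

Because the two ramps sum to one at every sample, the mix starts as pure song 1 and ends as pure song 2.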

Ramped

Crossfaded

Play Audio

For cross fading with NSynth, the encoder creates lower-dimensional embeddings of both snippets. Cross fading is done on the embeddings instead of the raw audio, and the blended embedding is fed through the decoder to generate the resulting audio.
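The blending step on the embeddings could look roughly like the sketch below. It assumes each embedding is a (frames, channels) array; the (125, 16) shape is an assumption based on a 4 second snippet at 16 kHz, and the random arrays are placeholders for real encoder output. `crossfade_encodings` is a hypothetical name.

```python
import numpy as np

def crossfade_encodings(enc_end, enc_start):
    """Linearly blend two (frames, channels) embeddings along the time axis."""
    if enc_end.shape != enc_start.shape:
        raise ValueError("encodings must have the same shape")
    n_frames = enc_end.shape[0]
    # Column vector so the ramp broadcasts across all embedding channels.
    fade_in = np.linspace(0.0, 1.0, n_frames)[:, np.newaxis]
    return enc_end * (1.0 - fade_in) + enc_start * fade_in

rng = np.random.default_rng(0)
enc_song1 = rng.normal(size=(125, 16))   # placeholder for the encoding of song 1's ending
enc_song2 = rng.normal(size=(125, 16))   # placeholder for the encoding of song 2's beginning
mixed_encoding = crossfade_encodings(enc_song1, enc_song2)
```

The blended array keeps the shape of its inputs, so it can be passed to the decoder in place of a single song's embedding.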

Encodings

NSynth cross fade

Play Audio

See this presentation for further details.

Table of contents
Setup
Usage
Background

Setup

Clone the repository and update the Python path

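repo_name=DeepBass
username=SiftingSands
git clone https://github.com/$username/$repo_name
cd $repo_name
# Record the repository root, then prepend its src/ folder to PYTHONPATH.
# The escaped \$ defers expansion until ~/.bash_profile is sourced.
echo "export $repo_name=${PWD}" >> ~/.bash_profile
echo "export PYTHONPATH=\$$repo_name/src:\${PYTHONPATH}" >> ~/.bash_profile
source ~/.bash_profile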

Create a virtual environment (the commands below are for Anaconda; otherwise follow https://docs.python.org/3/tutorial/venv.html). Install the dependencies from requirements.txt. "timbral_models" must be installed from source.

conda create --name DeepBass
# Activate the environment so the installs below go into it
conda activate DeepBass
pip install -r /<path>/DeepBass/build/requirements.txt
git clone https://github.com/AudioCommons/timbral_models.git
cd timbral_models
pip install .

Required Packages

'numpy'
'librosa'
'streamlit'
'matplotlib'
'tensorflow-gpu'
'timbral_models'
'soundfile'
'scipy'
'scikit-learn' (imported as 'sklearn')
'essentia'
'joblib'

Usage

The typical workflow is as follows:

Set up the config file with the data inputs (songs) and cross fading parameters.
Run the cross fading method.
Evaluate the output audio.
An empirical 'roughness' measure is also calculated and written to a text file. See Section 1.5 of the timbral_models documentation for details on roughness.
Alternatively, a similar measure called Sethares dissonance can be calculated from the peak frequencies over time. Example Python script.
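For orientation, a minimal sketch of Sethares dissonance for one set of spectral peaks is given below. It is not the linked example script. The constants are the ones commonly quoted from Sethares' published reference code and should be treated as an assumption, as should the choice of weighting each pair by the smaller amplitude. `sethares_dissonance` is a hypothetical name.

```python
import numpy as np

def sethares_dissonance(freqs, amps):
    """Sum the pairwise dissonance over a set of partials (frequencies in Hz)."""
    d_star, s1, s2 = 0.24, 0.0207, 18.96       # locate the point of maximum dissonance
    c1, c2, a1, a2 = 5.0, -5.0, -3.51, -5.75   # curve-fit constants (assumed values)
    freqs = np.asarray(freqs, dtype=float)
    amps = np.asarray(amps, dtype=float)
    total = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            f_min = min(freqs[i], freqs[j])
            f_diff = abs(freqs[i] - freqs[j])
            s = d_star / (s1 * f_min + s2)     # scales the curve with register
            weight = min(amps[i], amps[j])
            total += weight * (c1 * np.exp(a1 * s * f_diff) + c2 * np.exp(a2 * s * f_diff))
    return total

semitone = sethares_dissonance([440.0, 466.16], [1.0, 1.0])
octave = sethares_dissonance([440.0, 880.0], [1.0, 1.0])
```

To track dissonance over time, the same function would be applied to the peak frequencies and amplitudes of each analysis frame of the mixed audio. For two pure tones, the semitone scores far higher than the octave.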

Example config file

[DEFAULT]
samplingrate = 16000

[NSynth XFade Settings]
Style = LinearFade
time = 4

[Simple XFade Settings]
Style = Linear
time = 4

[Preprocess]
SR window duration = 30

[IO]
firstsong = Song1.mp3
secondsong = Song2.mp3
load directory = /home/ubuntu/DeepBass/data/raw/Exp6
save directory = /home/ubuntu/DeepBass/data/processed/Exp6_Retrained
save name = Exp6
model weights = /home/ubuntu/nsynth_train/model.ckpt-320000

[Plot]
Flag = True
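
A file in this layout can be read with Python's standard `configparser` module. The sketch below parses a shortened copy of the example from a string; how the repository's own scripts load the file is not shown here, and the variable names are illustrative.

```python
import configparser

example = """
[DEFAULT]
samplingrate = 16000

[NSynth XFade Settings]
Style = LinearFade
time = 4

[Preprocess]
SR window duration = 30

[IO]
firstsong = Song1.mp3
secondsong = Song2.mp3

[Plot]
Flag = True
"""

config = configparser.ConfigParser()
config.read_string(example)   # config.read(path) does the same for a file on disk

sr = config["DEFAULT"].getint("samplingrate")
fade_time = config["NSynth XFade Settings"].getfloat("time")
window = config["Preprocess"].getfloat("SR window duration")   # option names are case-insensitive
plot = config["Plot"].getboolean("Flag")
first_song = config["IO"]["firstsong"]
n_samples = int(fade_time * sr)   # length of each trimmed snippet in samples
```

Values in `[DEFAULT]` are inherited by every other section, so `config["IO"]["samplingrate"]` also resolves to 16000.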

Run Simple Cross Fading

Create a 'config.ini' either manually (place it in /configs/) or by editing and running '/configs/CreateConfig.py'. The script then performs the following steps:

Loads the first and second songs per the 'load directory', 'firstsong', and 'secondsong' in the config.ini
Detects whether the ending of the first song and the beginning of the second contain silence within the 'SR window duration' (seconds)
Trims the audio to snippets of 'time' length (seconds), as set under 'Simple XFade Settings'
Applies a linear crossfade between the ending and beginning snippets (different crossfade ramping functions are available)
Saves the crossfaded audio file per 'save name' and 'save directory'

cd ~/DeepBass/src
python main_simple.py -config_fname=<your-config-file-name>
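
The silence detection and trimming steps above can be sketched as follows. This is an illustrative approximation using a plain amplitude threshold on NumPy arrays, not the repository's implementation; the function names and the `threshold` parameter are assumptions.

```python
import numpy as np

def first_loud_sample(audio, sr, window_s, threshold=1e-3):
    """Index of the first sample within window_s seconds that exceeds threshold.

    Returns 0 (no trimming) if the whole window is below the threshold.
    """
    window = audio[: int(window_s * sr)]
    loud = np.flatnonzero(np.abs(window) > threshold)
    return int(loud[0]) if loud.size else 0

def trim_for_crossfade(song1, song2, sr, fade_s, window_s):
    """Return the last fade_s seconds of song1 and the first fade_s seconds of song2,
    skipping trailing silence in song1 and leading silence in song2."""
    n = int(fade_s * sr)
    lead = first_loud_sample(song2, sr, window_s)         # leading silence in song 2
    tail = first_loud_sample(song1[::-1], sr, window_s)   # trailing silence in song 1
    end1 = len(song1) - tail
    start1 = max(end1 - n, 0)                             # guard against very short songs
    return song1[start1:end1], song2[lead:lead + n]

sr = 16000
song1 = np.concatenate([np.ones(10 * sr), np.zeros(sr)])        # 1 s of trailing silence
song2 = np.concatenate([np.zeros(sr // 2), np.ones(10 * sr)])   # 0.5 s of leading silence
end_snippet, start_snippet = trim_for_crossfade(song1, song2, sr, fade_s=4, window_s=30)
```

Both returned snippets are 4 seconds long and contain no silent padding, so they can be passed directly to the crossfade step.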

Run NSynth Cross Fading

Create a 'config.ini' either manually (place it in /configs/) or by editing and running '/configs/CreateConfig.py'. The script then performs the following steps:

Loads the first and second songs per the 'load directory', 'firstsong', and 'secondsong' in the config.ini
Detects whether the ending of the first song and the beginning of the second contain silence within the 'SR window duration' (seconds)
Trims the audio to snippets of 'time' length (seconds), as set under 'NSynth XFade Settings'
Loads the weights for the neural network from 'model weights'
Calculates the encoding of each audio snippet
Combines both encodings
Synthesizes the mixed audio from the combined encoding
Saves the crossfaded audio file per 'save name' and 'save directory'

cd ~/DeepBass/src
python main_NSynth.py -config_fname=<your-config-file-name>
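
The encode, combine, and synthesize steps above can be sketched as a small pipeline. To keep the example runnable without model weights, the encoder and decoder are passed in as functions and replaced here by a toy block-averaging codec; in practice they would wrap the NSynth encoder and decoder (for example the `encode` and `synthesize` functions in Magenta's `fastgen` module, which is an assumption about how this repository calls the model). All names below are hypothetical.

```python
import numpy as np

def nsynth_crossfade(end_snippet, start_snippet, encode, decode):
    """Encode both snippets, blend the encodings linearly over time, decode the blend."""
    enc_end = encode(end_snippet)        # shape (frames, channels)
    enc_start = encode(start_snippet)
    fade_in = np.linspace(0.0, 1.0, enc_end.shape[0])[:, np.newaxis]
    mixed = enc_end * (1.0 - fade_in) + enc_start * fade_in
    return decode(mixed)

# Toy stand-in codec: one channel holding the mean of each block of 512 samples.
def toy_encode(audio, hop=512):
    usable = len(audio) // hop * hop
    return audio[:usable].reshape(-1, hop).mean(axis=1, keepdims=True)

def toy_decode(encoding, hop=512):
    return np.repeat(encoding[:, 0], hop)

out = nsynth_crossfade(np.ones(64000), np.zeros(64000), toy_encode, toy_decode)
```

With the toy codec, the output steps down from 1 to 0 in 125 blocks, which makes the time-varying blend of the two encodings easy to inspect.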

Train the model from a checkpoint

Create a folder with all of the audio examples that you want to train on
Download a previous checkpoint for the model (for example http://download.magenta.tensorflow.org/models/nsynth/wavenet-ckpt.tar)
Trim the beginnings and endings of the songs and convert the data to the 'tfrecords' format
Run the training script with the path to the training data and model checkpoint. Multi-GPU training is currently not supported; see magenta/magenta#625

cd ~/DeepBass/src
python DataPrep.py -load_dir=<path-to-audio-files> -time=4 -save_dir=<path-typically-in-/DeepBass/Data/preprocessed> -savename=<name-for-tfrecords-file> -n_cpu=<number-of-cpu-threads>
python train.py --train_path=<path-to-tfrecords> --total_batch_size=6 --logdir=<path-to-checkpoint>

Model Background

The NSynth model from Google's Magenta team was used to create the audio mixes (https://magenta.tensorflow.org/nsynth). Starting from their published checkpoint, the model was further trained on 4 second snippets from the beginnings and endings of over 500 songs in the EDM genre. Because the architecture is unchanged, the 16 kHz downsampling and 8-bit mu-law encoding are still present. Thanks to the Magenta team for making their work open source.
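
To illustrate what 8-bit mu-law encoding does to the audio, here is a minimal NumPy sketch of the standard companding formula with mu = 255. It is a generic illustration, not code from NSynth or this repository, and the function names are hypothetical.

```python
import numpy as np

MU = 255  # 8-bit mu-law: 256 quantization levels

def mu_law_encode(x, mu=MU):
    """Compress samples in [-1, 1] and quantize them to integers in [0, mu]."""
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.floor((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int64)

def mu_law_decode(q, mu=MU):
    """Map integers in [0, mu] back to approximate samples in [-1, 1]."""
    compressed = 2.0 * (q.astype(np.float64) / mu) - 1.0
    # expm1(|y| * ln(1 + mu)) equals (1 + mu) ** |y| - 1
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu

x = np.linspace(-1.0, 1.0, 1001)
codes = mu_law_encode(x)
restored = mu_law_decode(codes)
```

The round trip is lossy: quiet samples are reconstructed more precisely than loud ones, which is one source of the audible noise in the synthesized output.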
