dmitryulyanov / neural-style-audio-torch

Torch implementation for audio neural style.

Python 11.60% Lua 88.40%
audio torch style-transfer neural-style

neural-style-audio-torch's Introduction

Audio texture synthesis (and a bit of stylization)

This is an extension of the texture synthesis and style transfer method of Leon Gatys et al., based on Justin Johnson's code for neural style transfer.

To listen to examples, go to the blog post. An almost identical Lasagne implementation by Vadim Lebedev can be found here. Also check out the TensorFlow implementation.

Examples

As there is no way to embed an audio player in GitHub markdown, please follow this link for examples of texture synthesis and style transfer.

Prerequisites

wget "https://www.dropbox.com/s/xpyoehayuhxvibq/net.t7?dl=1" -O data/net.t7
wget "https://www.dropbox.com/s/dwsq33r5bsgy9cd/mean.t7?dl=1" -O data/mean.t7

Usage

1. First, convert raw audio to a spectrogram
python get_spectrogram.py --in_audio <path to audio file> --out_npy <where to save spectrogram>

Additional arguments:

  • --offset and --duration control the boundaries of the segment that is converted to a spectrogram [by default, the first 10 seconds are used].
  • --sr: sample rate in Hz [default = 44100].
  • --n_fft: FFT window size in samples [default = 1024].
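To make the effect of the window size concrete, here is a minimal NumPy sketch of a log-magnitude spectrogram. It is not the code of get_spectrogram.py: the function name `log_spectrogram`, the Hann window, the hop of `n_fft // 4`, and the `log1p` compression are all assumptions made for illustration.

```python
import numpy as np

def log_spectrogram(samples, n_fft=1024, hop=None):
    """Return a log-magnitude spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    hop = hop or n_fft // 4          # assumed hop length, not taken from the README
    window = np.hanning(n_fft)       # assumed window type
    n_frames = 1 + (len(samples) - n_fft) // hop
    # Slice the signal into overlapping windowed frames, one frame per column.
    frames = np.stack(
        [samples[i * hop : i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=0))
    return np.log1p(magnitude)       # assumed compression of the dynamic range

# One second of a 440 Hz tone at the README's default sample rate.
sr = 44100
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t), n_fft=1024)
print(spec.shape)  # (frequency bins, time frames)
```

With n_fft = 1024 each column holds 513 frequency bins, so a larger window gives finer frequency resolution at the cost of coarser time resolution.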
2. Run texture synthesis or style transfer

Texture synthesis and style transfer use the same command; only the arguments differ:

th neural_style_audio.lua -content <path to content.npy> -style <path to style.npy> -content_layers <indices of content layers> -style_layers <indices of style layers> -style_weight <number> -content_weight <number>

Parameters:

  • -content, -style: paths to the .npy spectrogram files generated in step 1.
  • -content_layers, -style_layers: comma-separated layer indices (the provided net has about 50 layers and is of encoder-decoder type).
  • -style_weight, -content_weight: loss weights; set -content_weight to zero for texture synthesis.
  • -lowres: the larger the receptive field, the larger the textures the network captures. The easiest way to increase the receptive field is to drop every other column from the input spectrogram (a kind of downscaling). Use this flag to process the spectrogram in both original and low resolution at the same time.
  • -how_div, -normalize_gradients, -loss: change some details of the loss calculation.
  • -save: folder in which to save intermediate results.
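The downscaling described for -lowres can be sketched in a few lines of NumPy. This assumes the spectrogram is a 2-D array with time frames as columns; the helper name `drop_every_other_column` is hypothetical, and how neural_style_audio.lua combines the two resolutions is not shown here.

```python
import numpy as np

def drop_every_other_column(spectrogram):
    """Keep columns 0, 2, 4, ... so each remaining column spans twice the time."""
    return spectrogram[:, ::2]

# Toy spectrogram: 3 frequency bins x 4 time frames (layout is an assumption).
spec = np.arange(12, dtype=np.float32).reshape(3, 4)
low = drop_every_other_column(spec)
print(low.shape)  # half as many columns, same number of rows
```

A filter of fixed width applied to `low` covers twice as many original frames as the same filter applied to `spec`, which is why this increases the effective receptive field.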
3. Convert the spectrogram back to an audio file
python invert_spectrogram.py --spectrogram_t7 <path to synthesized spectrogram> --out_audio keyboard2_texture.wav
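A magnitude spectrogram carries no phase, so inversion has to estimate it. The README does not say how invert_spectrogram.py does this; the --n_iter flag in the style-transfer commands suggests an iterative method. One common choice is the Griffin-Lim algorithm, sketched below as an assumption rather than as the script's actual implementation. The function names, the Hann window, and the hop of 256 are illustrative.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Centered STFT, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    padded = np.pad(x, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop
    frames = np.stack(
        [padded[i * hop : i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=0)

def istft(spec, n_fft=1024, hop=256):
    """Inverse of stft() by window-weighted overlap-add."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=0)
    n_frames = spec.shape[1]
    length = n_fft + hop * (n_frames - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i in range(n_frames):
        sl = slice(i * hop, i * hop + n_fft)
        signal[sl] += frames[:, i] * window
        norm[sl] += window ** 2
    signal /= np.maximum(norm, 1e-8)  # guard against zero window energy at the edges
    return signal[n_fft // 2 : length - n_fft // 2]  # remove the centering padding

def griffin_lim(magnitude, n_iter=50, n_fft=1024, hop=256, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        # Keep the target magnitude, adopt the phase of the re-analysed signal.
        phase = np.exp(1j * np.angle(stft(signal, n_fft, hop)))
    return istft(magnitude * phase, n_fft, hop)

# Usage: recover a waveform from the magnitude of a test tone.
tone = np.sin(2 * np.pi * 440 * np.arange(8192) / 22050)
audio = griffin_lim(np.abs(stft(tone)), n_iter=20)
```

`griffin_lim` expects a linear magnitude. If the spectrogram was compressed with `log1p`, it would first need to be expanded with `np.expm1`.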

Parameters:

Command-line used for texture examples

python get_spectrogram.py --out_npy data/inputs/keyboard2.npy --in_audio data/inputs/keyboard2.mp3
th neural_style_audio.lua -style data/inputs/keyboard2.npy -content_layers '' -style_layers 1,5,10,15,20,25 -style_weight 10000000 -optimizer lbfgs -learning_rate 1e-1 -num_iterations 5000 -lowres
python invert_spectrogram.py --spectrogram_t7 data/out/out.png.t7 --out_audio data/outputs/keyboard2_texture.wav

Command-line used for style transfer

We have also implemented a minimalistic script identical to the TensorFlow and Lasagne scripts. Here is an example of how to use it:

python get_spectrogram.py --out_npy data/inputs/usa.npy --in_audio data/inputs/usa.mp3 --n_fft 2048 --sr 22050
python get_spectrogram.py --out_npy data/inputs/imperial.npy --in_audio data/inputs/imperial.mp3 --n_fft 2048 --sr 22050
th neural_style_audio_random.lua -alpha 1e-2
python invert_spectrogram.py --spectrogram_t7 data/out/out.t7 --out_audio out.wav --n_iter 500 --n_fft 2048 --sr 22050

neural-style-audio-torch's Issues

Porting to PyTorch

I tried porting the code to PyTorch, specifically the TensorFlow version. The latter works perfectly.
But with PyTorch, when I use optim.LBFGS, I run into exploding gradients or no updates on the target.
The error is in the update:
151
152 #update scale of initial Hessian approximation
--> 153 H_diag = ys / y.dot(y) # (y*y)
154
155 # compute the approximate (L-BFGS) inverse Hessian

ZeroDivisionError: float division by zero

I am running this on CPU only for the time being - macOS 10.13.1 64 bit.
Any insights on why this may be happening?

"unknown object" in function 'readObject' when loading the means data file

When attempting to use the texture synthesis example, I get the following error (line 424 is the code line that is loading the parameter means file):

/Users/mz2/torch/install/bin/luajit: /Users/mz2/torch/install/share/lua/5.1/torch/File.lua:375: unknown object
stack traceback:
	[C]: in function 'error'
	/Users/mz2/torch/install/share/lua/5.1/torch/File.lua:375: in function 'readObject'
	/Users/mz2/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
	neural_style_audio.lua:424: in function 'load_data'
	neural_style_audio.lua:81: in function 'main'
	neural_style_audio.lua:435: in main chunk
	[C]: in function 'dofile'
	.../mz2/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x010d4081c0

I'm executing this on macOS 10.11.6 - is the network perhaps serialised in some form not compatible with executing on macOS (64bit)? Any workarounds in mind?
