riffusion / riffusion-app

Stable diffusion for real-time music generation (web app)

Home Page: http://riffusion.com

License: MIT License

JavaScript 3.31% CSS 0.18% TypeScript 96.51%
ai audio diffusion music nextjs stable-diffusion threejs

riffusion-app's Introduction

🎸 Riffusion


Riffusion is a library for real-time music and audio generation with stable diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

This is the core repository for riffusion image and audio processing code.

  • Diffusion pipeline that performs prompt interpolation combined with image conditioning
  • Conversions between spectrogram images and audio clips
  • Command-line interface for common tasks
  • Interactive app using streamlit
  • Flask server to provide model inference via API
  • Various third party integrations

Related repositories:

Citation

If you build on this work, please cite it as follows:

@article{Forsgren_Martiros_2022,
  author = {Forsgren, Seth* and Martiros, Hayk*},
  title = {{Riffusion - Stable diffusion for real-time music generation}},
  url = {https://riffusion.com/about},
  year = {2022}
}

Install

Tested in CI with Python 3.9 and 3.10.

It's highly recommended to set up a virtual Python environment with conda or virtualenv:

conda create --name riffusion python=3.9
conda activate riffusion

Install Python dependencies:

python -m pip install -r requirements.txt

In order to use audio formats other than WAV, ffmpeg is required.

sudo apt-get install ffmpeg          # linux
brew install ffmpeg                  # mac
conda install -c conda-forge ffmpeg  # conda

If torchaudio has no backend, you may need to install libsndfile. See this issue.
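For example, on Debian/Ubuntu the package is typically libsndfile1, and it is also available via Homebrew and conda-forge (package names are assumptions and may differ on your system):

sudo apt-get install libsndfile1             # linux (Debian/Ubuntu)
brew install libsndfile                      # mac
conda install -c conda-forge libsndfile      # conda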

If you have an issue, try upgrading diffusers. Tested with 0.9 - 0.11.

Guides:

Backends

CPU

cpu is supported but is quite slow.

CUDA

cuda is the recommended and most performant backend.

To use with CUDA, make sure you have torch and torchaudio installed with CUDA support. See the install guide or stable wheels.

To generate audio in real-time, you need a GPU that can run stable diffusion with approximately 50 steps in under five seconds, such as a 3090 or A10G.

Test availability with:

import torch
torch.cuda.is_available()

MPS

The mps backend on Apple Silicon is supported for inference but some operations fall back to CPU, particularly for audio processing. You may need to set PYTORCH_ENABLE_MPS_FALLBACK=1.
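For example, a minimal way to set the fallback flag in a shell session before running any of the commands below:

export PYTORCH_ENABLE_MPS_FALLBACK=1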

In addition, this backend is not deterministic.

Test availability with:

import torch
torch.backends.mps.is_available()

Command-line interface

Riffusion comes with a command line interface for performing common tasks.

See available commands:

python -m riffusion.cli -h

Get help for a specific command:

python -m riffusion.cli image-to-audio -h

Execute:

python -m riffusion.cli image-to-audio --image spectrogram_image.png --audio clip.wav

Riffusion Playground

Riffusion contains a streamlit app for interactive use and exploration.

Run with:

python -m riffusion.streamlit.playground

And access at http://127.0.0.1:8501/


Run the model server

Riffusion can be run as a Flask server that provides inference via API. This server enables the web app to run locally.

Run with:

python -m riffusion.server --host 127.0.0.1 --port 3013

You can specify --checkpoint with your own directory or huggingface ID in diffusers format.

Use the --device argument to specify the torch device to use.
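For example, a sketch of a full invocation (the checkpoint ID and device shown here are illustrative; substitute your own):

python -m riffusion.server \
  --host 127.0.0.1 \
  --port 3013 \
  --device cuda \
  --checkpoint riffusion/riffusion-model-v1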

The model endpoint is now available at http://127.0.0.1:3013/run_inference via POST request.

Example input (see InferenceInput for the API):

{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",

  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },

  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}

Example output (see InferenceOutput for the API):

{
  "image": "< base64 encoded JPEG image >",
  "audio": "< base64 encoded MP3 clip >"
}
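As a rough sketch, the endpoint can be called from Python with the requests library. The payload mirrors the example input above; the handling of the base64-encoded response fields (including stripping a possible data URI prefix) is an assumption:

import base64
import requests

# Build the request payload described above
payload = {
    "alpha": 0.75,
    "num_inference_steps": 50,
    "seed_image_id": "og_beat",
    "start": {"prompt": "church bells on sunday", "seed": 42, "denoising": 0.75, "guidance": 7.0},
    "end": {"prompt": "jazz with piano", "seed": 123, "denoising": 0.75, "guidance": 7.0},
}

# POST to the locally running model server
response = requests.post("http://127.0.0.1:3013/run_inference", json=payload)
response.raise_for_status()
output = response.json()

# The image and audio come back base64 encoded (see the example output above);
# strip a possible data URI prefix before decoding (an assumption about the format)
def decode_b64_field(value: str) -> bytes:
    return base64.b64decode(value.split(",")[-1])

with open("output.jpg", "wb") as f:
    f.write(decode_b64_field(output["image"]))
with open("output.mp3", "wb") as f:
    f.write(decode_b64_field(output["audio"]))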

Tests

Tests live in the test/ directory and are implemented with unittest.

To run all tests:

python -m unittest test/*_test.py

To run a single test:

python -m unittest test.audio_to_image_test

To preserve temporary outputs for debugging, set RIFFUSION_TEST_DEBUG:

RIFFUSION_TEST_DEBUG=1 python -m unittest test.audio_to_image_test

To run a single test case within a test:

python -m unittest test.audio_to_image_test -k AudioToImageTest.test_stereo

To run tests using a specific torch device, set RIFFUSION_TEST_DEVICE. Tests should pass with cpu, cuda, and mps backends.
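For example (cuda shown; substitute cpu or mps):

RIFFUSION_TEST_DEVICE=cuda python -m unittest test/*_test.py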

Development Guide

Install additional packages for dev with python -m pip install -r requirements_dev.txt.

  • Linter: ruff
  • Formatter: black
  • Type checker: mypy

These are configured in pyproject.toml.

The results of mypy ., black ., and ruff . must be clean to accept a PR.
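For example, a typical local check before opening a PR, using exactly the commands named above:

mypy .
black .
ruff .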

CI is run through GitHub Actions from .github/workflows/ci.yml.

Contributions are welcome through pull requests.

riffusion-app's People

Contributors

ak391, hmartiro, sethforsgren, tuhins


riffusion-app's Issues

main fails to build out of the box

Hi, fresh clone just now: 58097b7, main branch, Windows 10 (unfortunately). No problems running npm install etc., but when first connecting to localhost:3000 the compilation fails with:

error - (api)\pages\api\server.js (11:12) @ handler
error - ReferenceError: AbortSignal is not defined
    at handler (webpack-internal:///(api)/./pages/api/server.js:14:17)
    at Object.apiResolver (C:\Users\x\Dev\riffusion\riffusion-app\node_modules\next\dist\server\api-utils\node.js:367:15)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async DevServer.runApi (C:\Users\x\Dev\riffusion\riffusion-app\node_modules\next\dist\server\next-server.js:474:9)
    at async Object.fn (C:\Users\x\Dev\riffusion\riffusion-app\node_modules\next\dist\server\next-server.js:736:37)
    at async Router.execute (C:\Users\x\Dev\riffusion\riffusion-app\node_modules\next\dist\server\router.js:252:36)
    at async DevServer.run (C:\Users\x\Dev\riffusion\riffusion-app\node_modules\next\dist\server\base-server.js:384:29)
    at async DevServer.run (C:\Users\x\Dev\riffusion\riffusion-app\node_modules\next\dist\server\dev\next-dev-server.js:732:20)
    at async DevServer.handleRequest (C:\Users\x\Dev\riffusion\riffusion-app\node_modules\next\dist\server\base-server.js:322:20) {
  page: '/api/server'
}
   9 |     headers: headers,
  10 |     body: req.body,
> 11 |     signal: AbortSignal.timeout(15000),
     |            ^
  12 |   });
  13 |
  14 |   const data = await response.json();
event - compiled client and server successfully in 324 ms (2057 modules)

AbortSignal is not defined

I'm getting a 500 error when I try to run any prompt. The console error log shows that AbortSignal is not defined:

error - (api)\pages\api\server.js (11:12) @ handler
error - ReferenceError: AbortSignal is not defined
    at handler (webpack-internal:///(api)/./pages/api/server.js:14:17)
    at Object.apiResolver (C:\Users\andrea\Desktop\Riffusion\web-app\node_modules\next\dist\server\api-utils\node.js:367:15)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async DevServer.runApi (C:\Users\andrea\Desktop\Riffusion\web-app\node_modules\next\dist\server\next-server.js:474:9)
    at async Object.fn (C:\Users\andrea\Desktop\Riffusion\web-app\node_modules\next\dist\server\next-server.js:736:37)
    at async Router.execute (C:\Users\andrea\Desktop\Riffusion\web-app\node_modules\next\dist\server\router.js:252:36)
    at async DevServer.run (C:\Users\andrea\Desktop\Riffusion\web-app\node_modules\next\dist\server\base-server.js:384:29)
    at async DevServer.run (C:\Users\andrea\Desktop\Riffusion\web-app\node_modules\next\dist\server\dev\next-dev-server.js:732:20)
    at async DevServer.handleRequest (C:\Users\andrea\Desktop\Riffusion\web-app\node_modules\next\dist\server\base-server.js:322:20) {
  page: '/api/server'
}
   9 |     headers: headers,
  10 |     body: req.body,
> 11 |     signal: AbortSignal.timeout(15000),
     |            ^
  12 |   });
  13 |
  14 |   const data = await response.json();

Default sound list

Instead of displaying only 3 example prompts, redesign the website to list many of them (e.g. in a card format), including the current default ones and, if possible, also the most common user prompts.

Generation doesn't work

I couldn't generate anything on the website with my prompts (Migos feat Gucci Mane drill, American platinum certified trap). It worked with a standard Eminem aggressive rap prompt, but other standard prompts (e.g. post-teen pop talent show winner) don't work either. I've waited for 30 minutes and got no result.

Autoplay

No need to require the user to manually click play.

False "server falling behind" messages

There seems to be a bug where after it's running for a long time, it continuously starts complaining about the server falling behind even though it's actively receiving new data every 5 seconds. The image doesn't update, either.

Feature Request: button to download the tile the user just heard

First, this is really amazing. Mind: blown.

Quick thought: It would be nice UX if there were a fast way to download the tile for the thing you "just" heard. In other words, whatever corresponds to what was playing ~2 seconds ago. I know this is lurking in the "Share" modal. And I can see you've prioritized super clean UI. But it would be cool if just a bit more was exposed at the top level. Otherwise there is a whole lot of "Wow, that was awesome... but how do I get it back?"

Also, please name the downloaded file with the name of the prompt, instead of just "download.mp3" :) It's nice to have that record-keeping done for you.

Error: connect ECONNREFUSED ::1:3013

When I created a .env file with:
RIFFUSION_FLASK_URL=http://localhost:3013/run_inference/
I got the error ECONNREFUSED ::1:3013,

which was solvable by changing that line to:
RIFFUSION_FLASK_URL=http://127.0.0.1:3013/run_inference/

Does this app work in interpolation mode or simple (text to audio) mode?

Hi @hmartiro.

Could you please confirm whether this app works in interpolation mode, or in simple (text to audio) mode?

The simple (text to audio) mode generates 5.12 seconds of music (for width=512) in one step. So how do you keep playing long-duration music without interruption? By looping over many steps and changing the seed number for the next step (keeping the same prompt)? Or some other mechanism?

Occasional tab freezes

Occasionally, the app causes the whole tab to freeze for a number of seconds (possibly only after it's been running for a few hours?).

Public URL: invalid project directory provided

I can run the app fine locally (after some messing around!), but not on a public URL. If I specify a URL, I get the following error:

[riffusion-app]$ npm run dev --hostname MY.IP.ADDRESS.HERE

[email protected] dev
next dev MY.IP.ADDRESS.HERE

error - Invalid project directory provided, no such directory: /path/to/riffusion-app/MY.IP.ADDRESS.HERE

Rather weird - it wants a directory with the name of my IP address? Containing what?

Research paper?

Hi @hmartiro,

This looks like a hell of a lot of work! Why don't you publish a research paper as well, in addition to your already published website docs? Would love to read your in-depth analysis!

Regards
Rahul Bhalley

Idle detection is too heavy

There are constant popup boxes asking whether you're idle or still listening; it's difficult to have a seamless experience by leaving it in the background and enjoying the music.

While I understand the need to save on server costs, there could be alternate solutions.

For instance, the popups could appear much less often if the user has clicked "yes" to the last few.

Baseten instructions?

Hey, sorry but I'm failing to understand how to set up Baseten for use with riffusion. Is it possible to add some steps?

Error on npm run dev

Project/riffusion-app/node_modules/next/dist/cli/next-dev.js:315
showAll: args["--show-all"] ?? false,
^

SyntaxError: Unexpected token '?'
    at wrapSafe (internal/modules/cjs/loader.js:915:16)
    at Module._compile (internal/modules/cjs/loader.js:963:27)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
    at Module.load (internal/modules/cjs/loader.js:863:32)
    at Function.Module._load (internal/modules/cjs/loader.js:708:14)
    at Module.require (internal/modules/cjs/loader.js:887:19)
    at require (internal/modules/cjs/helpers.js:74:18)
    at Object.dev (/home/ray/Project/riffusion-app/node_modules/next/dist/lib/commands.js:10:30)
    at Object.<anonymous> (/home/ray/Project/riffusion-app/node_modules/next/dist/bin/next:141:28)
    at Module._compile (internal/modules/cjs/loader.js:999:30)

I changed it to float16 and I could create music files even with 6GB of VRAM.

Thanks for a very fantastic Project.
I tried it right away, but my GPU only has 6GB of VRAM, so I couldn't generate the audio file due to a CUDA error that said I was missing almost 3GB of VRAM.
So I changed the following line in the import cell of the riffusion.ipynb file:
pipe = DiffusionPipeline.from_pretrained("riffusion/riffusion-model-v1")
to:
pipe = DiffusionPipeline.from_pretrained("riffusion/riffusion-model-v1", torch_dtype=torch.float16)
and further changed
width=768,
to
width=656,
in the "#@title Define a predict function" cell.
With this, I was able to create a music file of 6 s 550 ms on my GTX 1060 with 6GB of VRAM.
The sound quality of the file I created sounds the same as the sample from the official site.
By the way, no matter how small I made width=, VRAM was not enough, so changing the floating point precision was definitely what made it work.
Is there any problem with changing the floating point precision?
If not, I'd suggest defaulting to float16 to save VRAM and allow more people to experience this wonderful project.
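For reference, a minimal sketch of the float16 setup described in this issue (the checkpoint ID is the one mentioned above; moving the pipeline to cuda is an assumption):

import torch
from diffusers import DiffusionPipeline

# Load the riffusion checkpoint in half precision to roughly halve VRAM usage
pipe = DiffusionPipeline.from_pretrained(
    "riffusion/riffusion-model-v1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU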

Memory leak?

Riffusion jumped to several GB of RAM in Firefox after using it for a few hours.

Network error

After leaving a session on for a while, I stopped getting new 5-second sounds due to CORS/SOP errors ("Status code: 500"). Refreshing the page fixed it.

Feature: Add sound libraries to dataset

The intention is for the model to learn things like engine roaring or sink draining and use the characteristics of these in a song.

Sound libraries like The General Series 6000 Sound Effects Library have very descriptive names for their sounds that are like captions.

Perhaps by prefixing the non-song elements in the dataset with "sfx" (or something else if that's not a single token), it won't impact the creation of rhythm.

error - DOMException [TimeoutError]: The operation was aborted due to timeout

Hi! Very cool project - attempting to run the inference server and the web app locally. The application seems to time out when it's generating the audio file. I've attached screenshots of my logs below; I'm running Node 19.2 as recommended in other issue threads.

(Screenshots attached of the riffusion-app logs, the riffusion-inference logs, and localhost:3000.)

dataset info?

Hello! Thanks for developing riffusion, it's a great step towards synesthetic creation. I was looking for information about the custom dataset that makes riffusion possible, but I have only found the authors' information and the repeated note that it was created as a hobby.

Could you share more info on the dataset? Using it, I have seen a strong tendency towards organic sounds, the "song" format and more orthodox music, and I would love to create my own version with my own, more noise-focused music. Could you share pointers towards that?

Thanks!

Sound issues after leaving on for long

After leaving Riffusion playing for a few hours, the sound sometimes seems to glitch or not work perfectly, as if it's receiving bad data from the server or timing out, especially around the times the alert box asking whether you're idle shows up.

Issue with web app

I have set up the web app and the inference server and I can see the GUI at localhost:3000, but I am getting this error...

Unhandled Runtime Error

SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data

Anyone know what this could be? Thanks!

FYI - for further clarification, I'm getting "INVALID_URL" in the web app CMD window.

FYI2 - I believe the issue is that I didn't add the .env.local file:

"To configure these backends, add a .env.local file:"

I made the file and copied the code in, pointing to the inference server. Can someone just point me to where I need to place this file in the file structure? And is it .env or .env.local? In the server file it says process.env, so do I name it that?

Also, does everything from the model repo go inside the riffusion-inference-main/riffusion directory? Thanks!

Cannot find module 'semver'

Looking forward to using this to make music and it looks fantastic.

Fresh Kubuntu 22.04 install. I installed riffusion-app according to the instructions and then changed the version of Node.js with:

sudo npm cache clean -f
sudo npm install -g n
sudo n v19.2.0

I did this because I was getting the same issue as #8.

Now I get this error:

node:internal/modules/cjs/loader:1039
const err = new Error(message);
^

Error: Cannot find module 'semver'
Require stack:

  • /usr/share/nodejs/npm/lib/utils/unsupported.js
  • /usr/share/nodejs/npm/lib/cli.js
  • /usr/share/nodejs/npm/bin/npm-cli.js
    at Module._resolveFilename (node:internal/modules/cjs/loader:1039:15)
    at Module._load (node:internal/modules/cjs/loader:885:27)
    at Module.require (node:internal/modules/cjs/loader:1105:19)
    at require (node:internal/modules/cjs/helpers:103:18)
    at Object.<anonymous> (/usr/share/nodejs/npm/lib/utils/unsupported.js:2:16)
    at Module._compile (node:internal/modules/cjs/loader:1218:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1272:10)
    at Module.load (node:internal/modules/cjs/loader:1081:32)
    at Module._load (node:internal/modules/cjs/loader:922:12)
    at Module.require (node:internal/modules/cjs/loader:1105:19) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/usr/share/nodejs/npm/lib/utils/unsupported.js',
    '/usr/share/nodejs/npm/lib/cli.js',
    '/usr/share/nodejs/npm/bin/npm-cli.js'
  ]
}

Node.js v19.2.0

can't generate music

I keep getting the error:

error - TypeError: Failed to parse URL from undefined
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async handler (webpack-internal:///(api)/./pages/api/server.js:10:22)
    at async Object.apiResolver (C:\riffusion\riffusion-app\node_modules\next\dist\server\api-utils\node.js:367:9)
    at async DevServer.runApi (C:\riffusion\riffusion-app\node_modules\next\dist\server\next-server.js:474:9)
    at async Object.fn (C:\riffusion\riffusion-app\node_modules\next\dist\server\next-server.js:736:37)
    at async Router.execute (C:\riffusion\riffusion-app\node_modules\next\dist\server\router.js:252:36)
    at async DevServer.run (C:\riffusion\riffusion-app\node_modules\next\dist\server\base-server.js:384:29)
    at async DevServer.run (C:\riffusion\riffusion-app\node_modules\next\dist\server\dev\next-dev-server.js:732:20)
    at async DevServer.handleRequest (C:\riffusion\riffusion-app\node_modules\next\dist\server\base-server.js:322:20) {
  page: '/api/server',
  [cause]: TypeError [ERR_INVALID_URL]: Invalid URL
      at new NodeError (node:internal/errors:405:5)
      at new URL (node:internal/url:637:13)
      at new Request (node:internal/deps/undici/undici:7132:25)
      at fetch2 (node:internal/deps/undici/undici:10715:25)
      at Object.fetch (node:internal/deps/undici/undici:11574:18)
      at fetch (node:internal/process/pre_execution:242:25)
      at handler (webpack-internal:///(api)/./pages/api/server.js:10:28)
      at Object.apiResolver (C:\riffusion\riffusion-app\node_modules\next\dist\server\api-utils\node.js:367:15)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async DevServer.runApi (C:\riffusion\riffusion-app\node_modules\next\dist\server\next-server.js:474:9) {
    input: 'undefined',
    code: 'ERR_INVALID_URL'
  }
}


BPMs of the seed images

A prompt for a genre with a much faster BPM than the seed image will result in poor, generic audio.

What are the BPMs (and other characteristics) of the seed images? Knowing what the seeds are leads to more accurate prompt pairs.
