riffusion / riffusion

Stable diffusion for real-time music generation

Home Page: http://riffusion.com/about

License: MIT License

Topics: diffusion, ai, audio, music, diffusers, stable-diffusion

riffusion's Introduction

🎸 Riffusion


Riffusion is a library for real-time music and audio generation with stable diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

This is the core repository for riffusion image and audio processing code. It includes:

  • Diffusion pipeline that performs prompt interpolation combined with image conditioning
  • Conversions between spectrogram images and audio clips
  • Command-line interface for common tasks
  • Interactive app using streamlit
  • Flask server to provide model inference via API
  • Various third party integrations

Related repositories:

  • riffusion-app — the interactive web app, which this inference server can power locally
  • riffusion/riffusion-model-v1 — the model checkpoint, hosted on Hugging Face

Citation

If you build on this work, please cite it as follows:

@article{Forsgren_Martiros_2022,
  author = {Forsgren, Seth* and Martiros, Hayk*},
  title = {{Riffusion - Stable diffusion for real-time music generation}},
  url = {https://riffusion.com/about},
  year = {2022}
}

Install

Tested in CI with Python 3.9 and 3.10.

It's highly recommended to set up a virtual Python environment with conda or virtualenv:

conda create --name riffusion python=3.9
conda activate riffusion

Install Python dependencies:

python -m pip install -r requirements.txt

To use audio formats other than WAV, ffmpeg is required:

sudo apt-get install ffmpeg          # linux
brew install ffmpeg                  # mac
conda install -c conda-forge ffmpeg  # conda

If torchaudio has no backend, you may need to install libsndfile. See this issue.

If you have an issue, try upgrading diffusers. Tested with 0.9 - 0.11.

Guides:

Backends

CPU

cpu is supported but is quite slow.

CUDA

cuda is the recommended and most performant backend.

To use with CUDA, make sure you have torch and torchaudio installed with CUDA support. See the install guide or stable wheels.

To generate audio in real-time, you need a GPU that can run stable diffusion with approximately 50 steps in under five seconds, such as a 3090 or A10G.

Test availability with:

import torch
torch.cuda.is_available()
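A slightly fuller check, as a sketch, that also reports which GPU was detected:

import torch

if torch.cuda.is_available():
    # Name of the first visible CUDA device, e.g. a 3090 or A10G
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("CUDA not available; generation will fall back to the slow CPU path")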

MPS

The mps backend on Apple Silicon is supported for inference but some operations fall back to CPU, particularly for audio processing. You may need to set PYTORCH_ENABLE_MPS_FALLBACK=1.

In addition, this backend is not deterministic.
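For example, to launch the playground (described below) with the CPU fallback enabled:

PYTORCH_ENABLE_MPS_FALLBACK=1 python -m riffusion.streamlit.playground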

Test availability with:

import torch
torch.backends.mps.is_available()

Command-line interface

Riffusion comes with a command line interface for performing common tasks.

See available commands:

python -m riffusion.cli -h

Get help for a specific command:

python -m riffusion.cli image-to-audio -h

Execute:

python -m riffusion.cli image-to-audio --image spectrogram_image.png --audio clip.wav
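The same conversion can also be scripted in Python. A minimal sketch, assuming the class names that appear in this repo's source (SpectrogramImageConverter shows up in the tracebacks below; the SpectrogramParams import path is an assumption, so verify it against the code):

from PIL import Image

from riffusion.spectrogram_image_converter import SpectrogramImageConverter
from riffusion.spectrogram_params import SpectrogramParams  # assumed import path

# Build a converter with default spectrogram parameters on the CUDA backend
params = SpectrogramParams()
converter = SpectrogramImageConverter(params=params, device="cuda")

# Convert a spectrogram image to a pydub AudioSegment and write a WAV file
image = Image.open("spectrogram_image.png").convert("RGB")
segment = converter.audio_from_spectrogram_image(image)
segment.export("clip.wav", format="wav")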

Riffusion Playground

Riffusion contains a streamlit app for interactive use and exploration.

Run with:

python -m riffusion.streamlit.playground

And access at http://127.0.0.1:8501/


Run the model server

Riffusion can be run as a flask server that provides inference via API. This server enables the web app to run locally.

Run with:

python -m riffusion.server --host 127.0.0.1 --port 3013

You can specify --checkpoint with your own directory or huggingface ID in diffusers format.

Use the --device argument to specify the torch device to use.

The model endpoint is now available at http://127.0.0.1:3013/run_inference via POST request.

Example input (see InferenceInput for the API):

{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",

  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },

  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}

Example output (see InferenceOutput for the API):

{
  "image": "< base64 encoded JPEG image >",
  "audio": "< base64 encoded MP3 clip >"
}
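A minimal Python client for this endpoint might look like the following sketch, which posts the example input above and decodes the base64 media (output file names are illustrative):

import base64

import requests

payload = {
    "alpha": 0.75,
    "num_inference_steps": 50,
    "seed_image_id": "og_beat",
    "start": {"prompt": "church bells on sunday", "seed": 42, "denoising": 0.75, "guidance": 7.0},
    "end": {"prompt": "jazz with piano", "seed": 123, "denoising": 0.75, "guidance": 7.0},
}

response = requests.post("http://127.0.0.1:3013/run_inference", json=payload)
response.raise_for_status()
output = response.json()

# Decode the base64 media; split(",") also tolerates a data-URI prefix if present
with open("output.mp3", "wb") as f:
    f.write(base64.b64decode(output["audio"].split(",")[-1]))
with open("output.jpg", "wb") as f:
    f.write(base64.b64decode(output["image"].split(",")[-1]))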

Tests

Tests live in the test/ directory and are implemented with unittest.

To run all tests:

python -m unittest test/*_test.py

To run a single test:

python -m unittest test.audio_to_image_test

To preserve temporary outputs for debugging, set RIFFUSION_TEST_DEBUG:

RIFFUSION_TEST_DEBUG=1 python -m unittest test.audio_to_image_test

To run a single test case within a test:

python -m unittest test.audio_to_image_test -k AudioToImageTest.test_stereo

To run tests using a specific torch device, set RIFFUSION_TEST_DEVICE. Tests should pass with cpu, cuda, and mps backends.
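For example, to force the CPU backend for a single test module:

RIFFUSION_TEST_DEVICE=cpu python -m unittest test.audio_to_image_test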

Development Guide

Install additional packages for dev with python -m pip install -r requirements_dev.txt.

  • Linter: ruff
  • Formatter: black
  • Type checker: mypy

These are configured in pyproject.toml.

The results of mypy ., black ., and ruff . must be clean to accept a PR.

CI is run through GitHub Actions from .github/workflows/ci.yml.

Contributions are welcome through pull requests.


riffusion's Issues

Converting generated spectrograms to audio

I can't figure out how to build and run this with the given instructions, but I am able to generate spectrograms with the Stable Diffusion web UI. However, I have no idea how to then convert them to audio with the Python script included here.

[request] Remove checkpoints and LFS from the repositories

It would be much faster, and more reliable, if the checkpoints were uploaded to a file-sharing site instead. The LFS files seem to always fail to download, and for some reason they bluescreen my computer. I have seen others with similar download problems, so I think this would help fix some of the troubleshooting issues people have.

assert torch.cuda.is_available() AssertionError

I'm having trouble running this. I followed the install steps and entered the following and got the resulting assertion error. I have no trouble running stable diffusion locally, so I'm not sure what to do next here. Install a newer version of CUDA toolkit? My GPU is a 3090.

python -m riffusion.server --port 3013 --host 127.0.0.1 --checkpoint ./

Traceback (most recent call last):
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\username\Desktop\SD-GUI-1.3.1\riffusion\riffusion-inference-main\riffusion\server.py", line 215, in <module>
    argh.dispatch_command(run_app)
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 306, in dispatch_command
    dispatch(parser, *args, **kwargs)
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 174, in dispatch
    for line in lines:
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 277, in _execute_command
    for line in result:
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "C:\Users\username\Desktop\SD-GUI-1.3.1\riffusion\riffusion-inference-main\riffusion\server.py", line 59, in run_app
    MODEL = load_model(checkpoint=checkpoint)
  File "C:\Users\username\Desktop\SD-GUI-1.3.1\riffusion\riffusion-inference-main\riffusion\server.py", line 79, in load_model
    assert torch.cuda.is_available()
AssertionError

RAM requirements?

I have attempted to get this project running on an Ubuntu machine with an M40, as well as a Windows machine with a 3090, and on both, when I try to run the inference server, it fills up system RAM and then crashes. Both machines have 32 GB of system RAM.

It does not seem to be loading into GPU memory at all.

(riffusion) PS C:\incoming\ml\riffusion-inference> python -m riffusion.server --port 3013 --host 127.0.0.1 --checkpoint .\riffusion-model-v1.ckpt
Traceback (most recent call last):
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\diffusers\configuration_utils.py", line 380, in load_config
    config_dict = cls._dict_from_json_file(config_file)
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\diffusers\configuration_utils.py", line 480, in _dict_from_json_file
    text = reader.read()
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\incoming\ml\riffusion-inference\riffusion\server.py", line 215, in <module>
    argh.dispatch_command(run_app)
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\argh\dispatching.py", line 306, in dispatch_command
    dispatch(parser, *args, **kwargs)
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\argh\dispatching.py", line 174, in dispatch
    for line in lines:
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\argh\dispatching.py", line 277, in _execute_command
    for line in result:
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\argh\dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "C:\incoming\ml\riffusion-inference\riffusion\server.py", line 59, in run_app
    MODEL = load_model(checkpoint=checkpoint)
  File "C:\incoming\ml\riffusion-inference\riffusion\server.py", line 81, in load_model
    model = RiffusionPipeline.from_pretrained(
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\diffusers\pipeline_utils.py", line 454, in from_pretrained
    config_dict = cls.load_config(
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\diffusers\configuration_utils.py", line 382, in load_config
    raise EnvironmentError(f"It looks like the config file at '{config_file}' is not a valid JSON file.")
OSError: It looks like the config file at '.\riffusion-model-v1.ckpt' is not a valid JSON file.

Can't use riffusion on my new system

Using an RTX 4090, I'm getting the error "nvrtc: error: invalid value for --gpu-architecture (-arch)" at the end of a long listing of what looks like source code.

I can submit other system details if need be, but the nvrtc message at the end leads me to believe that my video card is not supported (yet)?

Website: Generation doesn't work

I couldn't generate anything on the website with my prompts (Migos feat Gucci Mane drill, American platinum certified trap). It worked with a standard "Eminem aggressive rap" prompt, but it doesn't work with other standard prompts either (e.g. post-teen pop talent show winner). I've waited for 30 minutes and got no result.

File not found error

I seem awfully close to getting this to run, but while trying the sample request:

import requests

req = '{"alpha": 0.75,"num_inference_steps": 50,"seed_image_id": "og_beat","start": {"prompt": "church bells on sunday","seed": 42,"denoising": 0.75,"guidance": 7.0},"end": {"prompt": "jazz with piano","seed": 123,"denoising": 0.75,"guidance": 7.0}}'
response = requests.post('http://127.0.0.1:3013/run_inference', data=req)

I get the following error:

tokens: [[2735, 12811, 525, 1706]]
weights: [[1.0, 1.0, 1.0, 1.0]]
tokens: [[4528, 593, 7894]]
weights: [[1.0, 1.0, 1.0]]
100%|████████████████████████████████████████| 38/38 [00:08<00:00, 4.69it/s]
ERROR:server:Exception on /run_inference/ [POST]
Traceback (most recent call last):
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\flask\app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\flask\app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\flask\app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\flask\app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\Users\Blake\Code\Music\riffusion-inference\riffusion\server.py", line 146, in run_inference
    response = compute(inputs)
  File "C:\Users\Blake\Code\Music\riffusion-inference\riffusion\server.py", line 180, in compute
    mp3_bytes = mp3_bytes_from_wav_bytes(wav_bytes)
  File "C:\Users\Blake\Code\Music\riffusion-inference\riffusion\audio.py", line 183, in mp3_bytes_from_wav_bytes
    sound.export(mp3_bytes, format="mp3")
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\pydub\audio_segment.py", line 963, in export
    p = subprocess.Popen(conversion_command, stdin=devnull, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Given that the progress bar completes, I have a feeling it is something easy. Any ideas?

It can't find the checkpoint file

I put this in the command line, with the env activated:

A:\riffusion-inference>python -m riffusion.server --port 3013 --host 127.0.0.1 --checkpoint "A:\riffusion-inference\Checkpoints\riffusion-model-v1.ckpt"

It then gets to this:

[screenshot]

I'm sure there's something simple that I am missing; can anyone clue me in?

Could use guidance on how to train

I have a highly curated data set with hundreds of thousands of wav files that I was using to experiment with generating audio with a GAN. Recently I created a technique where I translate machine-generated data into natural-language text for a diffusion model.

I've been reading on the Discord server, but I'm not 100% sure I've got things figured out.

Here's what I'm thinking so far; please correct me where I'm making mistakes.

  • Use the CLI to convert the wav files to spectrograms
  • Generate my text prompts using my machine-data-to-natural-language-text script; save to JSON for Dreambooth training
  • Use the normal Dreambooth training process to fine-tune on my audio data
  • Use the diffusion model to generate a spectrogram
  • Use the CLI to convert the spectrogram to a wav file

Things I'm unclear on:

  • I'm not sure if this model can be imported into Riffusion
  • If the model can't be imported into Riffusion, I should be able to do inference in a notebook like Automatic1111
  • Unclear if img2img and in/outpainting can be used
  • I have hundreds of thousands of wav files that I can label automatically; can/should I scale it this high, and what is reasonable vs. unreasonable?

Finetuning guide and training code

Having a bit of trouble finding this, but I have a lot of ideas about how to finetune this and was looking for instructions on training. I think we can get to songs of any length pretty easily by converting the model to the modified one for outpainting and retraining on 512x1024.

Try more advanced schedulers

For example, this with 25 steps:

from diffusers import DPMSolverMultistepScheduler

scheduler = DPMSolverMultistepScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
    trained_betas=None,
    predict_epsilon=True,
    thresholding=False,
    algorithm_type="dpmsolver++",
    solver_type="midpoint",
    lower_order_final=True,
)
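A sketch of how such a scheduler could be swapped onto a loaded diffusers pipeline before inference (the pipe variable and prompt are illustrative; the riffusion pipeline's call signature may differ):

# Hypothetical usage: assumes `pipe` is an already-loaded diffusers pipeline
pipe.scheduler = scheduler

# DPM-Solver++ typically needs far fewer steps than the default scheduler
result = pipe(prompt="jazz with piano", num_inference_steps=25)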

Baseten instructions

Hey, sorry but I'm failing to understand how to setup baseten for use with riffusion. Is it possible to add some steps?

Data preparing process

Hi, thanks for your wonderful work!
I tried to use audio.py to create 512×512 Mel spectrograms, but they come out badly. Could you please share more parameter information, like n_fft, hop_size, win_size, etc.?

I tried n_fft = 1024, hop_length = 256, win_length = 1024 to extract the mel spectrogram with n_mels = 512, but the picture looks like this:

[attached spectrogram image]

What data was used to train riffusion?

Super cool work! Could you provide more information about the data source used to train riffusion?

For example:

  • How much audio data did you use for finetuning?
  • What's the source of the audio data?
  • How did you get the paired text with the audio? (e.g. is it extracted from the tags or music description?)

Thanks so much!

Undefined name 'prompt' in riffusion_pipeline.py

% flake8 . --count --select=E9,F63,F7,F82,Y --show-source --statistics

./riffusion/riffusion_pipeline.py:201:23: F821 undefined name 'prompt'
            elif type(prompt) is not type(negative_prompt):
                      ^
./riffusion/riffusion_pipeline.py:204:30: F821 undefined name 'prompt'
                    f" {type(prompt)}."
                             ^
2     F821 undefined name 'prompt'
2

Support for Apple Silicon / MPS

I'm not sure if current-gen Apple Silicon GPUs are capable of doing the computation fast enough (probably not, tbh), but it would be great to get it working so folks can at least try it out. I tried changing all the mentions of cuda in the project to mps, but I'm getting an error from TorchScript which suggests some changes need to be made to the model to not assume CUDA. Is there a way to fix/patch this?

NotImplementedError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/diffusers/models/unet_2d_condition/___torch_mangle_4939.py", line 44, in forward
    _4 = ops.prim.NumToTensor(torch.size(sample, 0))
    timesteps = torch.expand(timestep, [int(_4)])
    input0 = torch.to((time_proj).forward(timesteps, ), 5)
                       ~~~~~~~~~~~~~~~~~~ <--- HERE
    _5 = (time_embedding).forward(input0, )
    _6 = (conv_in).forward(sample, )
  File "code/__torch__/diffusers/models/embeddings/___torch_mangle_4232.py", line 8, in forward
  def forward(self: __torch__.diffusers.models.embeddings.___torch_mangle_4232.Timesteps,
    timesteps: Tensor) -> Tensor:
    _0 = torch.arange(0, 160, dtype=6, layout=None, device=torch.device("cuda:0"), pin_memory=False)
         ~~~~~~~~~~~~ <--- HERE
    exponent = torch.mul(_0, CONSTANTS.c0)
    exponent0 = torch.div(exponent, CONSTANTS.c1)

Out of memory with 4 GiB of VRAM when running the server

Following the Installation Guide for Riffusion App & Inference Server on Windows. After the command python -m riffusion.server --port 3013 --host 127.0.0.1:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\TheAiWork\Riffusion\riffusion-inference\riffusion\server.py", line 189, in <module>
    argh.dispatch_command(run_app)
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 306, in dispatch_command
    dispatch(parser, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 174, in dispatch
    for line in lines:
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 277, in _execute_command
    for line in result:
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "C:\TheAiWork\Riffusion\riffusion-inference\riffusion\server.py", line 55, in run_app
    PIPELINE = RiffusionPipeline.load_checkpoint(
  File "C:\TheAiWork\Riffusion\riffusion-inference\riffusion\riffusion_pipeline.py", line 109, in load_checkpoint
    traced_unet = cls.load_traced_unet(
  File "C:\TheAiWork\Riffusion\riffusion-inference\riffusion\riffusion_pipeline.py", line 153, in load_traced_unet
    unet_traced = torch.jit.load(unet_file)
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\site-packages\torch\jit\_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.39 GiB already
allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting
max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

[Playground] RuntimeError: "cos_vml_cpu" not implemented for 'Half'

2023-01-07 01:34:56.643 Uncaught app exception
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 352, in get_or_create_cached_value
    result = cache.read_result(value_key)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 461, in read_result
    raise e
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 454, in read_result
    pickled_entry = self._read_from_mem_cache(key)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 552, in _read_from_mem_cache
    raise CacheKeyNotFoundError("Key not found in mem cache")
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError: Key not found in mem cache

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 352, in get_or_create_cached_value
    result = cache.read_result(value_key)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 461, in read_result
    raise e
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 454, in read_result
    pickled_entry = self._read_from_mem_cache(key)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 552, in _read_from_mem_cache
    raise CacheKeyNotFoundError("Key not found in mem cache")
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError: Key not found in mem cache

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 352, in get_or_create_cached_value
    result = cache.read_result(value_key)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/singleton_decorator.py", line 313, in read_result
    raise CacheKeyNotFoundError()
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 564, in _run_script
    exec(code, module.__dict__)
  File "/home/user/app/app.py", line 36, in 
    render_main()
  File "/home/user/app/app.py", line 33, in render_main
    render_func()
  File "/home/user/app/riffusion/riffusion/streamlit/pages/text_to_audio.py", line 65, in render_text_to_audio
    audio_bytes = streamlit_util.audio_bytes_from_spectrogram_image(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 400, in wrapper
    return get_or_create_cached_value()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 373, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/user/app/riffusion/riffusion/streamlit/util.py", line 150, in audio_bytes_from_spectrogram_image
    segment = audio_segment_from_spectrogram_image(image=image, params=params, device=device)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 400, in wrapper
    return get_or_create_cached_value()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 373, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/user/app/riffusion/riffusion/streamlit/util.py", line 139, in audio_segment_from_spectrogram_image
    converter = spectrogram_image_converter(params=params, device=device)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 400, in wrapper
    return get_or_create_cached_value()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 373, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/user/app/riffusion/riffusion/streamlit/util.py", line 120, in spectrogram_image_converter
    return SpectrogramImageConverter(params=params, device=device)
  File "/home/user/app/riffusion/riffusion/spectrogram_image_converter.py", line 21, in __init__
    self.converter = SpectrogramConverter(params=params, device=device)
  File "/home/user/app/riffusion/riffusion/spectrogram_converter.py", line 47, in __init__
    self.spectrogram_func = torchaudio.transforms.Spectrogram(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/torchaudio/transforms/_transforms.py", line 83, in __init__
    window = window_fn(self.win_length) if wkwargs is None else window_fn(self.win_length, **wkwargs)
RuntimeError: "cos_vml_cpu" not implemented for 'Half'

[Playground] Batch `audio2audio` page

I've got some WIP code for a Streamlit page batching audio2audio using a random seed each time, just wanted to check if you'd be interested in a PR to add this as a page to the Playground?

riffusion.cli image-to-audio fails with ValueError: axes don't match array

Hello,

Thanks for this great work!

I am trying to run a basic script, taking a seed image provided in the repository:
python -m riffusion.cli image-to-audio --image ./seed_images/og_beat.png --audio clip.wav

however it fails with the following stack (I added some prints to further investigate):

<PIL.PngImagePlugin.PngImageFile image mode=P size=512x512 at 0x7FC22A2CC610>
WARNING: Could not find spectrogram parameters in exif data. Using defaults.
<PIL.Image.Image image mode=P size=512x512 at 0x7FC236F01C10>
np.array(image):
[[59 64 61 ... 59 64 61]
 [55 64 64 ... 55 64 64]
 [35 49 61 ... 35 49 61]
 ...
 [45 52 52 ... 45 52 52]
 [62  2 65 ... 62  2 65]
 [ 0  1  2 ...  0  1  2]]
Traceback (most recent call last):
  File "/opt/conda/envs/riffusion/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/riffusion/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/Sefi/riffusion/riffusion/cli.py", line 133, in <module>
    argh.dispatch_commands(
  File "/opt/conda/envs/riffusion/lib/python3.9/site-packages/argh/dispatching.py", line 328, in dispatch_commands
    dispatch(parser, *args, **kwargs)
  File "/opt/conda/envs/riffusion/lib/python3.9/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/opt/conda/envs/riffusion/lib/python3.9/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/opt/conda/envs/riffusion/lib/python3.9/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/home/ubuntu/Sefi/riffusion/riffusion/cli.py", line 88, in image_to_audio
    segment = converter.audio_from_spectrogram_image(pil_image)
  File "/home/ubuntu/Sefi/riffusion/riffusion/spectrogram_image_converter.py", line 79, in audio_from_spectrogram_image
    spectrogram = image_util.spectrogram_from_image(
  File "/home/ubuntu/Sefi/riffusion/riffusion/util/image_util.py", line 88, in spectrogram_from_image
    data = np.array(image).transpose(2, 0, 1)
ValueError: axes don't match array

What am I missing?

[AudioSplitter] Throws `WinError 2` when run

Not sure if this is a missing dependency or a Windows-specific issue, but when I attempt to run the splitter, my local machine throws:

Traceback (most recent call last):
  File "C:\Users\***\anaconda3\envs\riffusion\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "D:\StableDiffusion\riffusion\riffusion\streamlit\pages\split_audio.py", line 80, in <module>
    render_split_audio()
  File "D:\StableDiffusion\riffusion\riffusion\streamlit\pages\split_audio.py", line 57, in render_split_audio
    stems = split_audio(segment, device=device)
  File "D:\StableDiffusion\riffusion\riffusion\audio_splitter.py", line 51, in split_audio
    subprocess.run(
  File "C:\Users\***\anaconda3\envs\riffusion\lib\subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Users\***\anaconda3\envs\riffusion\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\***\anaconda3\envs\riffusion\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
