riffusion / riffusion

Stable diffusion for real-time music generation

Home Page: http://riffusion.com/about

License: MIT License

Topics: diffusion, ai, audio, music, diffusers, stable-diffusion

riffusion's Introduction

🎸 Riffusion


Riffusion is a library for real-time music and audio generation with stable diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

This is the core repository for riffusion image and audio processing code. It includes:

  • Diffusion pipeline that performs prompt interpolation combined with image conditioning
  • Conversions between spectrogram images and audio clips
  • Command-line interface for common tasks
  • Interactive app using streamlit
  • Flask server to provide model inference via API
  • Various third party integrations

Related repositories:

  • riffusion-app — the interactive web app, which this inference server can power locally
  • riffusion/riffusion-model-v1 — the model checkpoint, hosted on Hugging Face

Citation

If you build on this work, please cite it as follows:

@article{Forsgren_Martiros_2022,
  author = {Forsgren, Seth* and Martiros, Hayk*},
  title = {{Riffusion - Stable diffusion for real-time music generation}},
  url = {https://riffusion.com/about},
  year = {2022}
}

Install

Tested in CI with Python 3.9 and 3.10.

It's highly recommended to set up a virtual Python environment with conda or virtualenv:

conda create --name riffusion python=3.9
conda activate riffusion

Install Python dependencies:

python -m pip install -r requirements.txt

To use audio formats other than WAV, ffmpeg is required:

sudo apt-get install ffmpeg          # linux
brew install ffmpeg                  # mac
conda install -c conda-forge ffmpeg  # conda

If torchaudio has no backend, you may need to install libsndfile. See this issue.

If you have an issue, try upgrading diffusers. Tested with 0.9 - 0.11.

Guides:

Backends

CPU

cpu is supported but is quite slow.

CUDA

cuda is the recommended and most performant backend.

To use with CUDA, make sure you have torch and torchaudio installed with CUDA support. See the install guide or stable wheels.

To generate audio in real-time, you need a GPU that can run stable diffusion with approximately 50 steps in under five seconds, such as a 3090 or A10G.

Test availability with:

import torch
torch.cuda.is_available()
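A slightly fuller check, as a sketch, that also reports which GPU was detected:

import torch

if torch.cuda.is_available():
    # Name of the first visible CUDA device, e.g. a 3090 or A10G
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("CUDA not available; generation will fall back to the slow CPU path")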

MPS

The mps backend on Apple Silicon is supported for inference but some operations fall back to CPU, particularly for audio processing. You may need to set PYTORCH_ENABLE_MPS_FALLBACK=1.

In addition, this backend is not deterministic.
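For example, to launch the playground (described below) with the CPU fallback enabled:

PYTORCH_ENABLE_MPS_FALLBACK=1 python -m riffusion.streamlit.playground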

Test availability with:

import torch
torch.backends.mps.is_available()

Command-line interface

Riffusion comes with a command line interface for performing common tasks.

See available commands:

python -m riffusion.cli -h

Get help for a specific command:

python -m riffusion.cli image-to-audio -h

Execute:

python -m riffusion.cli image-to-audio --image spectrogram_image.png --audio clip.wav
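The same conversion can also be scripted in Python. A minimal sketch, assuming the class names that appear in this repo's source (SpectrogramImageConverter shows up in the tracebacks below; the SpectrogramParams import path is an assumption, so verify it against the code):

from PIL import Image

from riffusion.spectrogram_image_converter import SpectrogramImageConverter
from riffusion.spectrogram_params import SpectrogramParams  # assumed import path

# Build a converter with default spectrogram parameters on the CUDA backend
params = SpectrogramParams()
converter = SpectrogramImageConverter(params=params, device="cuda")

# Convert a spectrogram image to a pydub AudioSegment and write a WAV file
image = Image.open("spectrogram_image.png").convert("RGB")
segment = converter.audio_from_spectrogram_image(image)
segment.export("clip.wav", format="wav")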

Riffusion Playground

Riffusion contains a streamlit app for interactive use and exploration.

Run with:

python -m riffusion.streamlit.playground

And access at http://127.0.0.1:8501/


Run the model server

Riffusion can be run as a flask server that provides inference via API. This server enables the web app to run locally.

Run with:

python -m riffusion.server --host 127.0.0.1 --port 3013

You can specify --checkpoint with your own directory or huggingface ID in diffusers format.

Use the --device argument to specify the torch device to use.

The model endpoint is now available at http://127.0.0.1:3013/run_inference via POST request.

Example input (see InferenceInput for the API):

{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",

  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },

  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}

Example output (see InferenceOutput for the API):

{
  "image": "< base64 encoded JPEG image >",
  "audio": "< base64 encoded MP3 clip >"
}
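A minimal Python client for this endpoint might look like the following sketch, which posts the example input above and decodes the base64 media (output file names are illustrative):

import base64

import requests

payload = {
    "alpha": 0.75,
    "num_inference_steps": 50,
    "seed_image_id": "og_beat",
    "start": {"prompt": "church bells on sunday", "seed": 42, "denoising": 0.75, "guidance": 7.0},
    "end": {"prompt": "jazz with piano", "seed": 123, "denoising": 0.75, "guidance": 7.0},
}

response = requests.post("http://127.0.0.1:3013/run_inference", json=payload)
response.raise_for_status()
output = response.json()

# Decode the base64 media; split(",") also tolerates a data-URI prefix if present
with open("output.mp3", "wb") as f:
    f.write(base64.b64decode(output["audio"].split(",")[-1]))
with open("output.jpg", "wb") as f:
    f.write(base64.b64decode(output["image"].split(",")[-1]))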

Tests

Tests live in the test/ directory and are implemented with unittest.

To run all tests:

python -m unittest test/*_test.py

To run a single test:

python -m unittest test.audio_to_image_test

To preserve temporary outputs for debugging, set RIFFUSION_TEST_DEBUG:

RIFFUSION_TEST_DEBUG=1 python -m unittest test.audio_to_image_test

To run a single test case within a test:

python -m unittest test.audio_to_image_test -k AudioToImageTest.test_stereo

To run tests using a specific torch device, set RIFFUSION_TEST_DEVICE. Tests should pass with cpu, cuda, and mps backends.
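For example, to force the CPU backend for a single test module:

RIFFUSION_TEST_DEVICE=cpu python -m unittest test.audio_to_image_test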

Development Guide

Install additional packages for dev with python -m pip install -r requirements_dev.txt.

  • Linter: ruff
  • Formatter: black
  • Type checker: mypy

These are configured in pyproject.toml.

The results of mypy ., black ., and ruff . must be clean to accept a PR.

CI is run through GitHub Actions from .github/workflows/ci.yml.

Contributions are welcome through pull requests.


riffusion's Issues

Converting generated spectrograms to audio

I can't figure out how to build and run this with the given instructions, but I am able to generate spectrograms with the Stable Diffusion web UI. However, I have no idea how to then convert them to audio with the Python script included here.

[request] Remove checkpoints and LFS from the repositories

It would be much faster, and more reliable, if the checkpoints were uploaded to a file-sharing site instead. The LFS files seem to always fail to download, and for some reason they bluescreen my computer. I have seen others with similar download problems, so I think this would help fix some of the troubleshooting issues people have.

assert torch.cuda.is_available() AssertionError

I'm having trouble running this. I followed the install steps and entered the following and got the resulting assertion error. I have no trouble running stable diffusion locally, so I'm not sure what to do next here. Install a newer version of CUDA toolkit? My GPU is a 3090.

python -m riffusion.server --port 3013 --host 127.0.0.1 --checkpoint ./

Traceback (most recent call last):
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\username\Desktop\SD-GUI-1.3.1\riffusion\riffusion-inference-main\riffusion\server.py", line 215, in <module>
    argh.dispatch_command(run_app)
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 306, in dispatch_command
    dispatch(parser, *args, **kwargs)
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 174, in dispatch
    for line in lines:
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 277, in _execute_command
    for line in result:
  File "C:\Users\username\anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "C:\Users\username\Desktop\SD-GUI-1.3.1\riffusion\riffusion-inference-main\riffusion\server.py", line 59, in run_app
    MODEL = load_model(checkpoint=checkpoint)
  File "C:\Users\username\Desktop\SD-GUI-1.3.1\riffusion\riffusion-inference-main\riffusion\server.py", line 79, in load_model
    assert torch.cuda.is_available()
AssertionError

RAM requirements?

I have attempted to get this project running on an Ubuntu machine with an M40, as well as a Windows machine with a 3090, and on both, when I try to run the inference server, it fills up system RAM and then crashes. Both machines have 32 GB of system RAM.

It does not seem to be loading into GPU memory at all.

(riffusion) PS C:\incoming\ml\riffusion-inference> python -m riffusion.server --port 3013 --host 127.0.0.1 --checkpoint .\riffusion-model-v1.ckpt
Traceback (most recent call last):
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\diffusers\configuration_utils.py", line 380, in load_config
    config_dict = cls._dict_from_json_file(config_file)
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\diffusers\configuration_utils.py", line 480, in _dict_from_json_file
    text = reader.read()
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\incoming\ml\riffusion-inference\riffusion\server.py", line 215, in <module>
    argh.dispatch_command(run_app)
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\argh\dispatching.py", line 306, in dispatch_command
    dispatch(parser, *args, **kwargs)
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\argh\dispatching.py", line 174, in dispatch
    for line in lines:
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\argh\dispatching.py", line 277, in _execute_command
    for line in result:
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\argh\dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "C:\incoming\ml\riffusion-inference\riffusion\server.py", line 59, in run_app
    MODEL = load_model(checkpoint=checkpoint)
  File "C:\incoming\ml\riffusion-inference\riffusion\server.py", line 81, in load_model
    model = RiffusionPipeline.from_pretrained(
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\diffusers\pipeline_utils.py", line 454, in from_pretrained
    config_dict = cls.load_config(
  File "C:\Users\Meatfucker\anaconda3\envs\riffusion\lib\site-packages\diffusers\configuration_utils.py", line 382, in load_config
    raise EnvironmentError(f"It looks like the config file at '{config_file}' is not a valid JSON file.")
OSError: It looks like the config file at '.\riffusion-model-v1.ckpt' is not a valid JSON file.

Can't use riffusion on my new system

Using an RTX 4090, I'm getting the error "nvrtc: error: invalid value for --gpu-architecture (-arch)" at the end of a long listing of what looks like source code.

I can submit other system details if need be, but the nvrtc message at the end leads me to believe that my video card is not supported (yet)?

Website: Generation doesn't work

I couldn't generate anything on the website with my prompts (Migos feat Gucci Mane drill, American platinum certified trap). It worked with a standard "Eminem aggressive rap" prompt, but it doesn't work with other standard prompts either (e.g. post-teen pop talent show winner). I've waited for 30 minutes and got no result.

File not found error

I seem awfully close to getting this to run, but while trying the sample request:

import requests

req = '{"alpha": 0.75,"num_inference_steps": 50,"seed_image_id": "og_beat","start": {"prompt": "church bells on sunday","seed": 42,"denoising": 0.75,"guidance": 7.0},"end": {"prompt": "jazz with piano","seed": 123,"denoising": 0.75,"guidance": 7.0}}'
response = requests.post('http://127.0.0.1:3013/run_inference', data=req)

I get the following error:

tokens: [[2735, 12811, 525, 1706]]
weights: [[1.0, 1.0, 1.0, 1.0]]
tokens: [[4528, 593, 7894]]
weights: [[1.0, 1.0, 1.0]]
100%|████████████████████████████████████████| 38/38 [00:08<00:00, 4.69it/s]
ERROR:server:Exception on /run_inference/ [POST]
Traceback (most recent call last):
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\flask\app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\flask\app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\flask_cors\extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\flask\app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\flask\app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\Users\Blake\Code\Music\riffusion-inference\riffusion\server.py", line 146, in run_inference
    response = compute(inputs)
  File "C:\Users\Blake\Code\Music\riffusion-inference\riffusion\server.py", line 180, in compute
    mp3_bytes = mp3_bytes_from_wav_bytes(wav_bytes)
  File "C:\Users\Blake\Code\Music\riffusion-inference\riffusion\audio.py", line 183, in mp3_bytes_from_wav_bytes
    sound.export(mp3_bytes, format="mp3")
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\site-packages\pydub\audio_segment.py", line 963, in export
    p = subprocess.Popen(conversion_command, stdin=devnull, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\Blake\anaconda3\envs\riffusion-inference\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Given that the progress bar completes, I have a feeling it is something easy. Any ideas?

It can't find the checkpoint file

I put this in the command line, with the env activated:

A:\riffusion-inference>python -m riffusion.server --port 3013 --host 127.0.0.1 --checkpoint "A:\riffusion-inference\Checkpoints\riffusion-model-v1.ckpt"

It then gets to this:

[screenshot]

I'm sure there's something simple that I am missing; can anyone clue me in?

Could use guidance on how to train

I have a highly curated data set with hundreds of thousands of wav files that I was using to experiment with generating audio with a GAN. Recently I created a technique where I translate machine-generated data into natural-language text for a diffusion model.

I've been reading on the Discord server, but I'm not 100% sure I've got things figured out.

Here's what I'm thinking so far; please correct me where I'm making mistakes.

  • Use the CLI to convert the wav files to spectrograms
  • Generate my text prompts using my machine-data-to-natural-language-text script; save to JSON for Dreambooth training
  • Use the normal Dreambooth training process to fine-tune on my audio data
  • Use the diffusion model to generate a spectrogram
  • Use the CLI to convert the spectrogram to a wav file

Things I'm unclear on:

  • I'm not sure if this model can be imported into Riffusion
  • If the model can't be imported into Riffusion, I should be able to do inference in a notebook like Automatic1111
  • Unclear if img2img and in/outpainting can be used
  • I have hundreds of thousands of wav files that I can label automatically; can/should I scale it this high, and what is reasonable vs. unreasonable?

Finetuning guide and training code

Having a bit of trouble finding this, but I have a lot of ideas about how to finetune this and was looking for instructions on training. I think we can get to songs of any length pretty easily by converting the model to the modified one for outpainting and retraining on 512x1024.

Try more advanced schedulers

For example, this with 25 steps:

from diffusers import DPMSolverMultistepScheduler

scheduler = DPMSolverMultistepScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
    trained_betas=None,
    predict_epsilon=True,
    thresholding=False,
    algorithm_type="dpmsolver++",
    solver_type="midpoint",
    lower_order_final=True,
)
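A sketch of how such a scheduler could be swapped onto a loaded diffusers pipeline before inference (the pipe variable and prompt are illustrative; the riffusion pipeline's call signature may differ):

# Hypothetical usage: assumes `pipe` is an already-loaded diffusers pipeline
pipe.scheduler = scheduler

# DPM-Solver++ typically needs far fewer steps than the default scheduler
result = pipe(prompt="jazz with piano", num_inference_steps=25)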

Baseten instructions

Hey, sorry but I'm failing to understand how to setup baseten for use with riffusion. Is it possible to add some steps?

Data preparing process

Hi, thanks for your wonderful work!
I tried to use audio.py to create 512×512 Mel spectrograms, but they come out badly. Could you please share more parameter information, like n_fft, hop_size, win_size, etc.?

I tried n_fft = 1024, hop_length = 256, win_length = 1024 to extract the mel spectrogram with n_mels = 512, but the picture looks like this:

[attached spectrogram image]

What data was used to train riffusion?

Super cool work! Could you provide more information about the data source used to train riffusion?

For example:

  • How much audio data did you use for finetuning?
  • What's the source of the audio data?
  • How did you get the paired text with the audio? (e.g. is it extracted from the tags or music description?)

Thanks so much!

Undefined name 'prompt' in riffusion_pipeline.py

% flake8 . --count --select=E9,F63,F7,F82,Y --show-source --statistics

./riffusion/riffusion_pipeline.py:201:23: F821 undefined name 'prompt'
            elif type(prompt) is not type(negative_prompt):
                      ^
./riffusion/riffusion_pipeline.py:204:30: F821 undefined name 'prompt'
                    f" {type(prompt)}."
                             ^
2     F821 undefined name 'prompt'
2

Support for Apple Silicon / MPS

I'm not sure if current-gen Apple Silicon GPUs are capable of doing the computation fast enough (probably not, tbh), but it would be great to get it working so folks can at least try it out. I tried changing all the mentions of cuda in the project to mps, but I'm getting an error from TorchScript which suggests some changes need to be made to the model to not assume CUDA. Is there a way to fix/patch this?

NotImplementedError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/diffusers/models/unet_2d_condition/___torch_mangle_4939.py", line 44, in forward
    _4 = ops.prim.NumToTensor(torch.size(sample, 0))
    timesteps = torch.expand(timestep, [int(_4)])
    input0 = torch.to((time_proj).forward(timesteps, ), 5)
                       ~~~~~~~~~~~~~~~~~~ <--- HERE
    _5 = (time_embedding).forward(input0, )
    _6 = (conv_in).forward(sample, )
  File "code/__torch__/diffusers/models/embeddings/___torch_mangle_4232.py", line 8, in forward
  def forward(self: __torch__.diffusers.models.embeddings.___torch_mangle_4232.Timesteps,
    timesteps: Tensor) -> Tensor:
    _0 = torch.arange(0, 160, dtype=6, layout=None, device=torch.device("cuda:0"), pin_memory=False)
         ~~~~~~~~~~~~ <--- HERE
    exponent = torch.mul(_0, CONSTANTS.c0)
    exponent0 = torch.div(exponent, CONSTANTS.c1)

Out of memory with 4 GiB of VRAM when running the server

Following the Installation Guide for Riffusion App & Inference Server on Windows. After the command python -m riffusion.server --port 3013 --host 127.0.0.1:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\TheAiWork\Riffusion\riffusion-inference\riffusion\server.py", line 189, in <module>
    argh.dispatch_command(run_app)
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 306, in dispatch_command
    dispatch(parser, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 174, in dispatch
    for line in lines:
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 277, in _execute_command
    for line in result:
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\site-packages\argh\dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "C:\TheAiWork\Riffusion\riffusion-inference\riffusion\server.py", line 55, in run_app
    PIPELINE = RiffusionPipeline.load_checkpoint(
  File "C:\TheAiWork\Riffusion\riffusion-inference\riffusion\riffusion_pipeline.py", line 109, in load_checkpoint
    traced_unet = cls.load_traced_unet(
  File "C:\TheAiWork\Riffusion\riffusion-inference\riffusion\riffusion_pipeline.py", line 153, in load_traced_unet
    unet_traced = torch.jit.load(unet_file)
  File "C:\ProgramData\Anaconda3\envs\riffusion-inference\lib\site-packages\torch\jit\_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.39 GiB already
allocated; 0 bytes free; 3.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting
max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

[Playground] RuntimeError: "cos_vml_cpu" not implemented for 'Half'

2023-01-07 01:34:56.643 Uncaught app exception
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 352, in get_or_create_cached_value
    result = cache.read_result(value_key)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 461, in read_result
    raise e
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 454, in read_result
    pickled_entry = self._read_from_mem_cache(key)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 552, in _read_from_mem_cache
    raise CacheKeyNotFoundError("Key not found in mem cache")
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError: Key not found in mem cache

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 352, in get_or_create_cached_value
    result = cache.read_result(value_key)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 461, in read_result
    raise e
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 454, in read_result
    pickled_entry = self._read_from_mem_cache(key)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/memo_decorator.py", line 552, in _read_from_mem_cache
    raise CacheKeyNotFoundError("Key not found in mem cache")
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError: Key not found in mem cache

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 352, in get_or_create_cached_value
    result = cache.read_result(value_key)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/singleton_decorator.py", line 313, in read_result
    raise CacheKeyNotFoundError()
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 564, in _run_script
    exec(code, module.__dict__)
  File "/home/user/app/app.py", line 36, in 
    render_main()
  File "/home/user/app/app.py", line 33, in render_main
    render_func()
  File "/home/user/app/riffusion/riffusion/streamlit/pages/text_to_audio.py", line 65, in render_text_to_audio
    audio_bytes = streamlit_util.audio_bytes_from_spectrogram_image(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 400, in wrapper
    return get_or_create_cached_value()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 373, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/user/app/riffusion/riffusion/streamlit/util.py", line 150, in audio_bytes_from_spectrogram_image
    segment = audio_segment_from_spectrogram_image(image=image, params=params, device=device)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 400, in wrapper
    return get_or_create_cached_value()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 373, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/user/app/riffusion/riffusion/streamlit/util.py", line 139, in audio_segment_from_spectrogram_image
    converter = spectrogram_image_converter(params=params, device=device)
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 400, in wrapper
    return get_or_create_cached_value()
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/streamlit/runtime/caching/cache_utils.py", line 373, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/user/app/riffusion/riffusion/streamlit/util.py", line 120, in spectrogram_image_converter
    return SpectrogramImageConverter(params=params, device=device)
  File "/home/user/app/riffusion/riffusion/spectrogram_image_converter.py", line 21, in __init__
    self.converter = SpectrogramConverter(params=params, device=device)
  File "/home/user/app/riffusion/riffusion/spectrogram_converter.py", line 47, in __init__
    self.spectrogram_func = torchaudio.transforms.Spectrogram(
  File "/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/torchaudio/transforms/_transforms.py", line 83, in __init__
    window = window_fn(self.win_length) if wkwargs is None else window_fn(self.win_length, **wkwargs)
RuntimeError: "cos_vml_cpu" not implemented for 'Half'

[Playground] Batch `audio2audio` page

I've got some WIP code for a Streamlit page batching audio2audio using a random seed each time, just wanted to check if you'd be interested in a PR to add this as a page to the Playground?

riffusion.cli image-to-audio fails with ValueError: axes don't match array

Hello,

Thanks for this great work!

I am trying to run a basic script, taking a seed image provided in the repository:
python -m riffusion.cli image-to-audio --image ./seed_images/og_beat.png --audio clip.wav

however it fails with the following stack (I added some prints to further investigate):

<PIL.PngImagePlugin.PngImageFile image mode=P size=512x512 at 0x7FC22A2CC610>
WARNING: Could not find spectrogram parameters in exif data. Using defaults.
<PIL.Image.Image image mode=P size=512x512 at 0x7FC236F01C10>
np.array(image):
[[59 64 61 ... 59 64 61]
 [55 64 64 ... 55 64 64]
 [35 49 61 ... 35 49 61]
 ...
 [45 52 52 ... 45 52 52]
 [62  2 65 ... 62  2 65]
 [ 0  1  2 ...  0  1  2]]
Traceback (most recent call last):
  File "/opt/conda/envs/riffusion/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/riffusion/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/Sefi/riffusion/riffusion/cli.py", line 133, in <module>
    argh.dispatch_commands(
  File "/opt/conda/envs/riffusion/lib/python3.9/site-packages/argh/dispatching.py", line 328, in dispatch_commands
    dispatch(parser, *args, **kwargs)
  File "/opt/conda/envs/riffusion/lib/python3.9/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/opt/conda/envs/riffusion/lib/python3.9/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/opt/conda/envs/riffusion/lib/python3.9/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/home/ubuntu/Sefi/riffusion/riffusion/cli.py", line 88, in image_to_audio
    segment = converter.audio_from_spectrogram_image(pil_image)
  File "/home/ubuntu/Sefi/riffusion/riffusion/spectrogram_image_converter.py", line 79, in audio_from_spectrogram_image
    spectrogram = image_util.spectrogram_from_image(
  File "/home/ubuntu/Sefi/riffusion/riffusion/util/image_util.py", line 88, in spectrogram_from_image
    data = np.array(image).transpose(2, 0, 1)
ValueError: axes don't match array

What am I missing?

[AudioSplitter] Throws `WinError 2` when run

Not sure if this is a missing dependency or a Windows-specific issue, but when I attempt to run the splitter, my local machine throws:

Traceback (most recent call last):
  File "C:\Users\***\anaconda3\envs\riffusion\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "D:\StableDiffusion\riffusion\riffusion\streamlit\pages\split_audio.py", line 80, in <module>
    render_split_audio()
  File "D:\StableDiffusion\riffusion\riffusion\streamlit\pages\split_audio.py", line 57, in render_split_audio
    stems = split_audio(segment, device=device)
  File "D:\StableDiffusion\riffusion\riffusion\audio_splitter.py", line 51, in split_audio
    subprocess.run(
  File "C:\Users\***\anaconda3\envs\riffusion\lib\subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Users\***\anaconda3\envs\riffusion\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\***\anaconda3\envs\riffusion\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
