
stable-diffusion's Introduction

Development repository. Please see CompVis/stable-diffusion for the Stable Diffusion release.


Latent Diffusion Models

arXiv | BibTeX

High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach*, Andreas Blattmann*, Dominik Lorenz, Patrick Esser, Björn Ommer
* equal contribution

News

April 2022

Requirements

A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm
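
Once the environment is active, a quick sanity check (not part of the repository; it simply assumes you intend to run the scripts on a GPU) is to confirm that PyTorch can see your device:

import torch

# print the installed torch version and whether a CUDA device is visible,
# before launching any of the sampling or training scripts below
print(torch.__version__, torch.cuda.is_available())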

Pretrained Models

A general list of all available checkpoints is available via our model zoo. If you use any of these models in your work, we are always happy to receive a citation.

Text-to-Image

text2img-figure

Download the pre-trained weights (5.7GB)

mkdir -p models/ldm/text2img-large/
wget -O models/ldm/text2img-large/model.ckpt https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt

and sample with

python scripts/txt2img.py --prompt "a virus monster is playing guitar, oil on canvas" --ddim_eta 0.0 --n_samples 4 --n_iter 4 --scale 5.0  --ddim_steps 50

This will save each sample individually as well as a grid of size n_iter x n_samples at the specified output location (default: outputs/txt2img-samples). Quality, sampling speed and diversity are best controlled via the scale, ddim_steps and ddim_eta arguments. As a rule of thumb, higher values of scale produce better samples at the cost of a reduced output diversity.
Furthermore, increasing ddim_steps generally also gives higher quality samples, but returns are diminishing for values > 250. Fast sampling (i.e. low values of ddim_steps) while retaining good quality can be achieved by using --ddim_eta 0.0.
Faster sampling (i.e. even lower values of ddim_steps) while retaining good quality can be achieved by using --ddim_eta 0.0 and --plms (see Pseudo Numerical Methods for Diffusion Models on Manifolds).
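
To compare these trade-offs systematically, a small driver script can sweep scale and ddim_steps for a fixed prompt. This is only a sketch (it is not part of the repository) and assumes the scripts/txt2img.py command line shown above, including an --outdir argument for choosing the output location:

import subprocess

# hypothetical sweep over guidance scale and DDIM steps for one prompt;
# each combination writes to its own output directory for side-by-side comparison
prompt = "a virus monster is playing guitar, oil on canvas"
for scale in (3.0, 5.0, 10.0):
    for steps in (50, 100, 250):
        subprocess.run(
            ["python", "scripts/txt2img.py",
             "--prompt", prompt,
             "--scale", str(scale),
             "--ddim_steps", str(steps),
             "--ddim_eta", "0.0",
             "--n_samples", "2",
             "--n_iter", "1",
             "--outdir", f"outputs/sweep/scale{scale}_steps{steps}"],
            check=True,
        )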

Beyond 256²

For certain inputs, simply running the model in a convolutional fashion on larger features than it was trained on can sometimes produce interesting results. To try it out, tune the H and W arguments (which will be integer-divided by 8 in order to calculate the corresponding latent size), e.g. run

python scripts/txt2img.py --prompt "a sunset behind a mountain range, vector image" --ddim_eta 1.0 --n_samples 1 --n_iter 1 --H 384 --W 1024 --scale 5.0  

to create a sample of size 384x1024. Note, however, that controllability is reduced compared to the 256x256 setting.

The example below was generated using the above command. text2img-figure-conv
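
For reference, the relationship between pixel resolution and latent resolution is just an integer division by the downsampling factor, so a tiny helper (an illustration only, not code from the repository) can be used to check that the chosen H and W are valid:

def latent_hw(H, W, f=8):
    # scripts/txt2img.py integer-divides H and W by the first-stage
    # downsampling factor (8 for this model) to get the latent size,
    # so both should be divisible by it
    assert H % f == 0 and W % f == 0, "choose H and W divisible by the downsampling factor"
    return H // f, W // f

print(latent_hw(384, 1024))  # -> (48, 128)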

Inpainting

inpainting

Download the pre-trained weights

wget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1

and sample with

python scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results

indir should contain images *.png and masks <image_fname>_mask.png like the examples provided in data/inpainting_examples.
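
To inpaint your own images, it is enough to add a mask file next to each image following that naming convention. The snippet below is not part of the repository and assumes, as in the provided examples, that white pixels mark the region to be filled in:

from pathlib import Path
from PIL import Image, ImageDraw

img_path = Path("data/my_inpainting/photo.png")   # hypothetical input image (the provided examples are 512x512)
img = Image.open(img_path)

mask = Image.new("L", img.size, 0)                 # black = keep
draw = ImageDraw.Draw(mask)
draw.rectangle([100, 100, 300, 300], fill=255)     # white = region to inpaint
mask.save(img_path.with_name(img_path.stem + "_mask.png"))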

Class-Conditional ImageNet

Available via a notebook. class-conditional

Unconditional Models

We also provide a script for sampling from unconditional LDMs (e.g. LSUN, FFHQ, ...). Start it via

CUDA_VISIBLE_DEVICES=<GPU_ID> python scripts/sample_diffusion.py -r models/ldm/<model_spec>/model.ckpt -l <logdir> -n <#samples> --batch_size <batch_size> -c <#ddim steps> -e <#eta>

Train your own LDMs

Data preparation

Faces

For downloading the CelebA-HQ and FFHQ datasets, proceed as described in the taming-transformers repository.

LSUN

The LSUN datasets can be conveniently downloaded via the script available here. We performed a custom split into training and validation images, and provide the corresponding filenames at https://ommer-lab.com/files/lsun.zip. After downloading, extract them to ./data/lsun. The beds/cats/churches subsets should also be placed/symlinked at ./data/lsun/bedrooms, ./data/lsun/cats and ./data/lsun/churches, respectively.
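
If the extracted subsets live elsewhere on disk, symlinking them into the expected layout is enough. A minimal sketch, where the source paths are placeholders for wherever your extracted LSUN subsets actually are:

from pathlib import Path

links = {
    "data/lsun/bedrooms": "/path/to/extracted/lsun/bedrooms",   # hypothetical source paths
    "data/lsun/cats": "/path/to/extracted/lsun/cats",
    "data/lsun/churches": "/path/to/extracted/lsun/churches",
}
for link, target in links.items():
    link_path = Path(link)
    link_path.parent.mkdir(parents=True, exist_ok=True)
    if not link_path.exists():
        link_path.symlink_to(Path(target).resolve(), target_is_directory=True)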

ImageNet

The code will try to download (through Academic Torrents) and prepare ImageNet the first time it is used. However, since ImageNet is quite large, this requires a lot of disk space and time. If you already have ImageNet on your disk, you can speed things up by putting the data into ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/ (which defaults to ~/.cache/autoencoders/data/ILSVRC2012_{split}/data/), where {split} is one of train/validation. It should have the following structure:

${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/
β”œβ”€β”€ n01440764
β”‚   β”œβ”€β”€ n01440764_10026.JPEG
β”‚   β”œβ”€β”€ n01440764_10027.JPEG
β”‚   β”œβ”€β”€ ...
β”œβ”€β”€ n01443537
β”‚   β”œβ”€β”€ n01443537_10007.JPEG
β”‚   β”œβ”€β”€ n01443537_10014.JPEG
β”‚   β”œβ”€β”€ ...
β”œβ”€β”€ ...

If you haven't extracted the data, you can also place ILSVRC2012_img_train.tar / ILSVRC2012_img_val.tar (or symlinks to them) into ${XDG_CACHE}/autoencoders/data/ILSVRC2012_train/ / ${XDG_CACHE}/autoencoders/data/ILSVRC2012_validation/, which will then be extracted into the above structure without downloading it again. Note that this will only happen if neither a folder ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/data/ nor a file ${XDG_CACHE}/autoencoders/data/ILSVRC2012_{split}/.ready exists. Remove them if you want to force running the dataset preparation again.
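
To check where the preparation code will look on your machine, you can resolve the cache directory yourself. The sketch below is only an illustration and assumes that ${XDG_CACHE} refers to the XDG_CACHE_HOME environment variable, with ~/.cache as the fallback:

import os
from pathlib import Path

def ilsvrc_root(split):  # split is "train" or "validation"
    cache = Path(os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache")))
    return cache / "autoencoders" / "data" / f"ILSVRC2012_{split}"

for split in ("train", "validation"):
    root = ilsvrc_root(split)
    print(split, "| data dir present:", (root / "data").is_dir(),
          "| .ready flag:", (root / ".ready").exists())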

Model Training

Logs and checkpoints for trained models are saved to logs/<START_DATE_AND_TIME>_<config_spec>.

Training autoencoder models

Configs for training a KL-regularized autoencoder on ImageNet are provided at configs/autoencoder. Training can be started by running

CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/autoencoder/<config_spec>.yaml -t --gpus 0,    

where config_spec is one of {autoencoder_kl_8x8x64(f=32, d=64), autoencoder_kl_16x16x16(f=16, d=16), autoencoder_kl_32x32x4(f=8, d=4), autoencoder_kl_64x64x3(f=4, d=3)}.

For training VQ-regularized models, see the taming-transformers repository.

Training LDMs

In configs/latent-diffusion/ we provide configs for training LDMs on the LSUN-, CelebA-HQ, FFHQ and ImageNet datasets. Training can be started by running

CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/latent-diffusion/<config_spec>.yaml -t --gpus 0,

where <config_spec> is one of {celebahq-ldm-vq-4(f=4, VQ-reg. autoencoder, spatial size 64x64x3),ffhq-ldm-vq-4(f=4, VQ-reg. autoencoder, spatial size 64x64x3), lsun_bedrooms-ldm-vq-4(f=4, VQ-reg. autoencoder, spatial size 64x64x3), lsun_churches-ldm-vq-4(f=8, KL-reg. autoencoder, spatial size 32x32x4),cin-ldm-vq-8(f=8, VQ-reg. autoencoder, spatial size 32x32x4)}.

Model Zoo

Pretrained Autoencoding Models

rec2

All models were trained until convergence (no further substantial improvement in rFID).

Model | rFID vs val | train steps | PSNR | PSIM | Link | Comments
f=4, VQ (Z=8192, d=3) | 0.58 | 533066 | 27.43 +/- 4.26 | 0.53 +/- 0.21 | https://ommer-lab.com/files/latent-diffusion/vq-f4.zip |
f=4, VQ (Z=8192, d=3) | 1.06 | 658131 | 25.21 +/- 4.17 | 0.72 +/- 0.26 | https://heibox.uni-heidelberg.de/f/9c6681f64bb94338a069/?dl=1 | no attention
f=8, VQ (Z=16384, d=4) | 1.14 | 971043 | 23.07 +/- 3.99 | 1.17 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/vq-f8.zip |
f=8, VQ (Z=256, d=4) | 1.49 | 1608649 | 22.35 +/- 3.81 | 1.26 +/- 0.37 | https://ommer-lab.com/files/latent-diffusion/vq-f8-n256.zip |
f=16, VQ (Z=16384, d=8) | 5.15 | 1101166 | 20.83 +/- 3.61 | 1.73 +/- 0.43 | https://heibox.uni-heidelberg.de/f/0e42b04e2e904890a9b6/?dl=1 |
f=4, KL | 0.27 | 176991 | 27.53 +/- 4.54 | 0.55 +/- 0.24 | https://ommer-lab.com/files/latent-diffusion/kl-f4.zip |
f=8, KL | 0.90 | 246803 | 24.19 +/- 4.19 | 1.02 +/- 0.35 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip |
f=16, KL (d=16) | 0.87 | 442998 | 24.08 +/- 4.22 | 1.07 +/- 0.36 | https://ommer-lab.com/files/latent-diffusion/kl-f16.zip |
f=32, KL (d=64) | 2.04 | 406763 | 22.27 +/- 3.93 | 1.41 +/- 0.40 | https://ommer-lab.com/files/latent-diffusion/kl-f32.zip |

Get the models

Running the following script downloads and extracts all available pretrained autoencoding models.

bash scripts/download_first_stages.sh

The first stage models can then be found in models/first_stage_models/<model_spec>
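
As a rough illustration (not code from the repository) of how such a first-stage model can be used on its own, the sketch below loads one of the KL-regularized autoencoders and reconstructs an image through its latent space. It assumes the kl-f8 folder name produced by the download script and the AutoencoderKL interface from ldm/models/autoencoder.py:

import numpy as np
import torch
from omegaconf import OmegaConf
from PIL import Image
from ldm.util import instantiate_from_config

# load config and weights of the downloaded first-stage model (assumed location)
config = OmegaConf.load("models/first_stage_models/kl-f8/config.yaml")
model = instantiate_from_config(config.model)
sd = torch.load("models/first_stage_models/kl-f8/model.ckpt", map_location="cpu")["state_dict"]
model.load_state_dict(sd, strict=False)
model.eval()

# load an image (hypothetical path) and scale it to [-1, 1], shape (1, 3, H, W)
img = Image.open("sample.png").convert("RGB").resize((256, 256))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0
x = x.permute(2, 0, 1).unsqueeze(0)

with torch.no_grad():
    posterior = model.encode(x)   # KL models return a diagonal Gaussian posterior
    z = posterior.sample()
    x_rec = model.decode(z)       # reconstruction, again in [-1, 1]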

Pretrained LDMs

Dataset | Task | Model | FID | IS | Prec | Recall | Link | Comments
CelebA-HQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 5.11 (5.11) | 3.29 | 0.72 | 0.49 | https://ommer-lab.com/files/latent-diffusion/celeba.zip |
FFHQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 4.98 (4.98) | 4.50 (4.50) | 0.73 | 0.50 | https://ommer-lab.com/files/latent-diffusion/ffhq.zip |
LSUN-Churches | Unconditional Image Synthesis | LDM-KL-8 (400 DDIM steps, eta=0) | 4.02 (4.02) | 2.72 | 0.64 | 0.52 | https://ommer-lab.com/files/latent-diffusion/lsun_churches.zip |
LSUN-Bedrooms | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 2.95 (3.0) | 2.22 (2.23) | 0.66 | 0.48 | https://ommer-lab.com/files/latent-diffusion/lsun_bedrooms.zip |
ImageNet | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77 (7.76)* / 15.82** | 201.56 (209.52)* / 78.82** | 0.84* / 0.65** | 0.35* / 0.63** | https://ommer-lab.com/files/latent-diffusion/cin.zip | *: w/ guiding, classifier_scale 10; **: w/o guiding; scores in brackets calculated with the evaluation script provided by ADM
Conceptual Captions | Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79 | 13.89 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/text2img.zip | finetuned from LAION
OpenImages | Super-resolution | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip | BSR image degradation
OpenImages | Layout-to-Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02 | 15.92 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/layout2img_model.zip |
Landscapes | Semantic Image Synthesis | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis256.zip |
Landscapes | Semantic Image Synthesis | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/semantic_synthesis.zip | finetuned on resolution 512x512

Get the models

The LDMs listed above can jointly be downloaded and extracted via

bash scripts/download_models.sh

The models can then be found in models/ldm/<model_spec>.
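
For a rough idea of how one of these checkpoints can be driven directly from Python (essentially what scripts/sample_diffusion.py does for the unconditional models), the sketch below samples from the FFHQ model. It is an illustration only and assumes the models/ldm/ffhq256/ layout with a config.yaml next to model.ckpt:

import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config
from ldm.models.diffusion.ddim import DDIMSampler

config = OmegaConf.load("models/ldm/ffhq256/config.yaml")           # assumed location
model = instantiate_from_config(config.model)
sd = torch.load("models/ldm/ffhq256/model.ckpt", map_location="cpu")["state_dict"]
model.load_state_dict(sd, strict=False)
model = model.cuda().eval()

sampler = DDIMSampler(model)
with torch.no_grad():
    # FFHQ uses an f=4 VQ autoencoder, i.e. a 3x64x64 latent for 256x256 images (see the table above)
    samples, _ = sampler.sample(S=200, batch_size=4, shape=[3, 64, 64], eta=1.0, verbose=False)
    imgs = model.decode_first_stage(samples)                        # images in [-1, 1]
    imgs = torch.clamp((imgs + 1.0) / 2.0, min=0.0, max=1.0)        # map to [0, 1] for saving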

Coming Soon...

Comments

BibTeX

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

stable-diffusion's People

Contributors

ak391, crowsonkb, pesser, rromb


stable-diffusion's Issues

CUDA runs out of memory with lots of memory reserved

I'm trying to run the text-to-image model with the example but CUDA keeps running out of memory, despite it barely trying to allocate anything. It's trying to allocate 20MB when there's 7.3GB reserved. Is there any way to fix this? I've searched all over but I couldn't find a clear answer.

Problems when text2img sampling

I am using this pretrained model for text2img sampling:
image

and I get a result like this:
image

When I use this pretrained model:
image

the result is normal, like this:
image

Is there any suggestion?

Broken dependency(?) on a fresh installation

Trying to set the repo up and get it working but got the below error
Win 11

(ldm) D:\github\stable-diffusion>python scripts/txt2img.py --prompt "a virus monster is playing guitar, oil on canvas" --ddim_eta 0.0 --n_samples 4 --n_iter 4 --scale 5.0 --ddim_steps 50
Traceback (most recent call last):
File "scripts/txt2img.py", line 11, in <module>
from pytorch_lightning import seed_everything
File "C:\Users\nikol\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\__init__.py", line 20, in <module>
from pytorch_lightning import metrics # noqa: E402
File "C:\Users\nikol\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\metrics\__init__.py", line 15, in <module>
from pytorch_lightning.metrics.classification import ( # noqa: F401
File "C:\Users\nikol\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\metrics\classification\__init__.py", line 14, in <module>
from pytorch_lightning.metrics.classification.accuracy import Accuracy # noqa: F401
File "C:\Users\nikol\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\metrics\classification\accuracy.py", line 18, in <module>
from pytorch_lightning.metrics.utils import deprecated_metrics, void
File "C:\Users\nikol\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\metrics\utils.py", line 22, in <module>
from torchmetrics.utilities.data import get_num_classes as _get_num_classes
ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (C:\Users\nikol\anaconda3\envs\ldm\lib\site-packages\torchmetrics\utilities\data.py)

Stable Diffusion Master Tutorials List - Including SDXL 0.9 - 43 Tutorials - Not An Issue Thread

Hello dear Patrick Esser, I hope you let this thread stay to help newcomers. This is not an issue thread. Thank you.


Expert-Level Tutorials on Stable Diffusion: Master Advanced Techniques and Strategies

Greetings everyone. I am Dr. Furkan Gözükara. I am an Assistant Professor in the Software Engineering department of a private university (I have a PhD in Computer Engineering). My professional programming skill is unfortunately C#, not Python :)

My linkedin : https://www.linkedin.com/in/furkangozukara

Our channel address if you like to subscribe : https://www.youtube.com/@SECourses

Our discord to get more help : https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

I am keeping this list up-to-date. I got upcoming new awesome video ideas. Trying to find time to do that.

I am open to any criticism you have. I am constantly trying to improve the quality of my tutorial guide videos. Please leave comments with both your suggestions and what you would like to see in future videos.

All videos have manually fixed subtitles and properly prepared video chapters. You can watch with these perfect subtitles or look for the chapters you are interested in.

Since my profession is teaching, I usually do not skip any of the important parts. Therefore, you may find my videos a little bit longer.

Playlist link on YouTube: Stable Diffusion Tutorials, Automatic1111 Web UI & Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Video to Anime

1.) Automatic1111 Web UI - PC - Free

How To Install Python, Setup Virtual Environment VENV, Set Default Python System Path & Install Git

image

2.) Automatic1111 Web UI - PC - Free

Easiest Way to Install & Run Stable Diffusion Web UI on PC by Using Open Source Automatic Installer

image

3.) Automatic1111 Web UI - PC - Free

How to use Stable Diffusion V2.1 and Different Models in the Web UI - SD 1.5 vs 2.1 vs Anything V3

image

4.) Automatic1111 Web UI - PC - Free

Zero To Hero Stable Diffusion DreamBooth Tutorial By Using Automatic1111 Web UI - Ultra Detailed

image

5.) Automatic1111 Web UI - PC - Free

DreamBooth Got Buffed - 22 January Update - Much Better Success Train Stable Diffusion Models Web UI

image

6.) Automatic1111 Web UI - PC - Free

How to Inject Your Trained Subject e.g. Your Face Into Any Custom Stable Diffusion Model By Web UI

image

7.) Automatic1111 Web UI - PC - Free

How To Do Stable Diffusion LORA Training By Using Web UI On Different Models - Tested SD 1.5, SD 2.1

image

8.) Automatic1111 Web UI - PC - Free

8 GB LoRA Training - Fix CUDA & xformers For DreamBooth and Textual Inversion in Automatic1111 SD UI

image

9.) Automatic1111 Web UI - PC - Free

How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial

image

10.) Automatic1111 Web UI - PC - Free

How To Generate Stunning Epic Text By Stable Diffusion AI - No Photoshop - For Free - Depth-To-Image

image

11.) Python Code - Hugging Face Diffusers Script - PC - Free

How to Run and Convert Stable Diffusion Diffusers (.bin Weights) & Dreambooth Models to CKPT File

image

12.) NMKD Stable Diffusion GUI - Open Source - PC - Free

Forget Photoshop - How To Transform Images With Text Prompts using InstructPix2Pix Model in NMKD GUI

image

13.) Google Colab Free - Cloud - No PC Is Required

Transform Your Selfie into a Stunning AI Avatar with Stable Diffusion - Better than Lensa for Free

image

14.) Google Colab Free - Cloud - No PC Is Required

Stable Diffusion Google Colab, Continue, Directory, Transfer, Clone, Custom Models, CKPT SafeTensors

image

15.) Automatic1111 Web UI - PC - Free

Become A Stable Diffusion Prompt Master By Using DAAM - Attention Heatmap For Each Used Token - Word

image

16.) Python Script - Gradio Based - ControlNet - PC - Free

Transform Your Sketches into Masterpieces with Stable Diffusion ControlNet AI - How To Use Tutorial

image

17.) Automatic1111 Web UI - PC - Free

Sketches into Epic Art with 1 Click: A Guide to Stable Diffusion ControlNet in Automatic1111 Web UI

image

18.) RunPod - Automatic1111 Web UI - Cloud - Paid - No PC Is Required

Ultimate RunPod Tutorial For Stable Diffusion - Automatic1111 - Data Transfers, Extensions, CivitAI

image

19.) RunPod - Automatic1111 Web UI - Cloud - Paid - No PC Is Required

How To Install DreamBooth & Automatic1111 On RunPod & Latest Libraries - 2x Speed Up - cudDNN - CUDA

image

20.) Automatic1111 Web UI - PC - Free

Fantastic New ControlNet OpenPose Editor Extension & Image Mixing - Stable Diffusion Web UI Tutorial

image

21.) Automatic1111 Web UI - PC - Free

Automatic1111 Stable Diffusion DreamBooth Guide: Optimal Classification Images Count Comparison Test

image

22.) Automatic1111 Web UI - PC - Free

Epic Web UI DreamBooth Update - New Best Settings - 10 Stable Diffusion Training Compared on RunPods

image

23.) Automatic1111 Web UI - PC - Free

New Style Transfer Extension, ControlNet of Automatic1111 Stable Diffusion T2I-Adapter Color Control

image

24.) Automatic1111 Web UI - PC - Free

Generate Text Arts & Fantastic Logos By Using ControlNet Stable Diffusion Web UI For Free Tutorial

image

25.) Automatic1111 Web UI - PC - Free

How To Install New DREAMBOOTH & Torch 2 On Automatic1111 Web UI PC For Epic Performance Gains Guide

image

26.) Automatic1111 Web UI - PC - Free

Training Midjourney Level Style And Yourself Into The SD 1.5 Model via DreamBooth Stable Diffusion

image

27.) Automatic1111 Web UI - PC - Free

Video To Anime - Generate An EPIC Animation From Your Phone Recording By Using Stable Diffusion AI

image

28.) Python Script - Jupyter Based - PC - Free

Midjourney Level NEW Open Source Kandinsky 2.1 Beats Stable Diffusion - Installation And Usage Guide

image

29.) Automatic1111 Web UI - PC - Free

RTX 3090 vs RTX 3060 Ultimate Showdown for Stable Diffusion, ML, AI & Video Rendering Performance

image

30.) Kohya Web UI - Automatic1111 Web UI - PC - Free

Generate Studio Quality Realistic Photos By Kohya LoRA Stable Diffusion Training - Full Tutorial

image

31.) Kaggle NoteBook - Free

DeepFloyd IF By Stability AI - Is It Stable Diffusion XL or Version 3? We Review and Show How To Use

image

32.) Python Script - Automatic1111 Web UI - PC - Free

How To Find Best Stable Diffusion Generated Images By Using DeepFace AI - DreamBooth / LoRA Training

image

33.) Kohya Web UI - RunPod - Paid

How To Install And Use Kohya LoRA GUI / Web UI on RunPod IO With Stable Diffusion & Automatic1111

image

34.) PC - Google Colab - Free

Mind-Blowing Deepfake Tutorial: Turn Anyone into Your Favorite Movie Star! PC & Google Colab - roop

image

35.) Automatic1111 Web UI - PC - Free

Stable Diffusion Now Has The Photoshop Generative Fill Feature With ControlNet Extension - Tutorial

image

36.) Automatic1111 Web UI - PC - Free

Human Cropping Script & 4K+ Resolution Class / Reg Images For Stable Diffusion DreamBooth / LoRA

image

37.) Automatic1111 Web UI - PC - Free

Stable Diffusion 2 NEW Image Post Processing Scripts And Best Class / Regularization Images Datasets

image

38.) Automatic1111 Web UI - PC - Free

How To Use Roop DeepFake On RunPod Step By Step Tutorial With Custom Made Auto Installer Script

image

39.) RunPod - Automatic1111 Web UI - Cloud - Paid - No PC Is Required

How To Install DreamBooth & Automatic1111 On RunPod & Latest Libraries - 2x Speed Up - cudDNN - CUDA

image

40.) Automatic1111 Web UI - PC - Free + RunPod

Zero to Hero ControlNet Tutorial: Stable Diffusion Web UI Extension | Complete Feature Guide

image

41.) Automatic1111 Web UI - PC - Free + RunPod

The END of Photography - Use AI to Make Your Own Studio Photos, FREE Via DreamBooth Training

image

42.) Google Colab - Gradio - Free

How To Use Stable Diffusion XL (SDXL 0.9) On Google Colab For Free

image

43.) Local - PC - Free - Gradio

Stable Diffusion XL (SDXL) Locally On Your PC - 8GB VRAM - Easy Tutorial With Automatic Installer

image

KeyError: 'image' in stable-diffusion/ldm/models/diffusion/ddpm.py"

Trying to run inpainting with the downloaded inpainting_big model. I changed the checkpoint and config path in the inpainting demo,
but this error appears:
(I think ddpm.py wants a 512x512 RGBA image and streamlit gives two 512x512 RGB images: one image and one mask. But I have no clue.)

2022-08-05 11:42:22.357 Uncaught app exception
Traceback (most recent call last):
  File "/usr/local/envs/ldm/lib/python3.8/site-packages/streamlit/scriptrunner/script_runner.py", line 557, in _run_script
    exec(code, module.__dict__)
  File "/content/stable-diffusion/scripts/demo/inpainting.py", line 194, in <module>
    samples = sample(
  File "/content/stable-diffusion/scripts/demo/inpainting.py", line 38, in sample
    z, c, x, xrec, xc = self.get_input(batch, self.first_stage_key, bs=N, return_first_stage_outputs=True)
  File "/usr/local/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/content/stable-diffusion/ldm/models/diffusion/ddpm.py", line 718, in get_input
    x = super().get_input(batch, k)
  File "/content/stable-diffusion/ldm/models/diffusion/ddpm.py", line 383, in get_input
    x = batch[k]
KeyError: 'image'
###
(512, 512, 3)
(512, 512, 3)
###

this is the colab notebook:
https://colab.research.google.com/drive/1iglh0P7CxYtJEf4N5K68RhNr9CJMzYa_?usp=sharing

ModuleNotFoundError: No module named 'ldm'

After following the directions, the txt2img.py script itself doesn't seem to recognize the ldm environment we created with the .yaml, though it exists in .\anaconda3\envs. Do I perhaps have the wrong version of Python or something? (I'm using 3.10.) I don't see anything specified.

txt2img.py", line 15, in <module>
    from ldm.util import instantiate_from_config
ModuleNotFoundError: No module named 'ldm'

Also, I found that manually running 'pip install ldm' installs the wrong package, which then asks for ldm.utils if I go that route. Ref: CompVis/latent-diffusion#71, but that looks like it was using an online notebook.

Missing Parenthesis?

Got this error:
Traceback (most recent call last):
File "/content/latent-diffusion/scripts/txt2img.py", line 10, in
from ldm.util import instantiate_from_config
File "/usr/local/lib/python3.7/dist-packages/ldm.py", line 20
print self.face_rec_model_path
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(self.face_rec_model_path)?

Dockerfile?

Hi,
just asking ... are you planning to make a Dockerfile? Looks like people having problems making the stuff run

Upgrade to Lightning 1.7

Hey! Needless to say incredible work with Stable Diffusion and latent diffusion in general.

I saw Stable Diffusion is using an old-ish version of PyTorch Lightning (1.4.2). I'm wondering if you'd like help upgrading to Lightning 1.7; happy to provide it. The idea would be to create a test, ensure there's (at least) parity on results, and upgrade.

Here's a breakdown of what was released since 1.4.x just in case:

Extra note: Lightning 1.7 supports PyTorch 1.9+

About fine-tuning for inpainting

Huge thanks for your code contribution first!

I used your config file "v1-finetune-for-inpainting-laion-iaesthe.yaml" to fine-tune the model for text-conditioned inpainting. The dataset I used is this subset of the Laion dataset.

It turns out the results eventually degenerate to naive inpainting (simply filling in the missing region) and are no longer controlled by the text conditioning as training proceeds (as shown below; the text prompt is "a cat on the bench", but no cat appears). Maybe I am missing some tricks; I wonder, did you run into the same issue when you trained the model?

image

Thank you in advance :)

When I finetune the SD model and set trainer(precision=16), an error occurs

Traceback (most recent call last):
File "main.py", line 851, in
trainer.fit(model, data)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in fit
self._run(model)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
self._dispatch()
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _dispatch
self.accelerator.start_training(self)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
self.training_type_plugin.start_training(trainer)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
self._results = trainer.run_stage()
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 996, in run_stage
return self._run_train()
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1045, in _run_train
self.fit_loop.run()
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 200, in advance
epoch_output = self.epoch_loop.run(train_dataloader)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 130, in advance
batch_output = self.batch_loop.run(batch, self.iteration_count, self._dataloader_idx)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 101, in run
super().run(batch, batch_idx, dataloader_idx)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 148, in advance
result = self._run_optimization(batch_idx, split_batch, opt_idx, optimizer)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 194, in _run_optimization
closure()
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 236, in _training_step_and_backward_closure
result = self.training_step_and_backward(split_batch, batch_idx, opt_idx, optimizer, hiddens)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 549, in training_step_and_backward
self.backward(result, optimizer, opt_idx)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 590, in backward
result.closure_loss = self.trainer.accelerator.backward(result.closure_loss, optimizer, *args, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 276, in backward
self.precision_plugin.backward(self.lightning_module, closure_loss, *args, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 78, in backward
model.backward(closure_loss, optimizer, *args, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1481, in backward
loss.backward(*args, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/autograd/init.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/autograd/function.py", line 253, in apply
return user_fn(self, *args)
File "/root/data/juicefs_hz_cv_v3/11120102/project/generative-model/pesser-stable-diffusion/ldm/modules/diffusionmodules/util.py", line 138, in backward
output_tensors = ctx.run_function(*shallow_copies)
File "/root/data/juicefs_hz_cv_v3/11120102/project/generative-model/pesser-stable-diffusion/ldm/modules/attention.py", line 215, in _forward
x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward
return F.layer_norm(
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/functional.py", line 2486, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Half but found Float

inpaint_sd.py won't work out of the box

The instructions are pretty clear, yet it doesn't work out of the box:

  1. inpaint_sd.py has been placed in the root project folder to pick up the subfolders and structure
  2. inpaint_sd.py, line 109: default="models/ldm/inpainting_big/last.ckpt",
  3. inpaint_sd.py, line 122: config="configs/stable-diffusion/inpainting/v1-finetune-for-inpainting-laion-iaesthe.yaml"
  4. stable-diffusion\configs\stable-diffusion\inpainting\v1-finetune-for-inpainting-laion-iaesthe.yaml: line 18 changed to ckpt_path: "models/ldm/inpainting_big/last.ckpt"

So it looks like everything is hooked up right, yet when I run the script with:
python inpaint_sd.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results

it gives me these errors:
\stable-diffusion\inpaint_sd.py", line 124, in <module> model = instantiate_from_config(config.model)
\stable-diffusion\ldm\util.py", line 79, in instantiate_from_config return get_obj_from_str(config["target"])(**config.get("params", dict()))
\stable-diffusion\ldm\models\diffusion\ddpm.py", line 1627, in __init__ self.init_from_ckpt(ckpt_path, ignore_keys)
\stable-diffusion\ldm\models\diffusion\ddpm.py", line 1648, in init_from_ckpt new_entry[:, :self.keep_dims, ...] = sd[k]

RuntimeError: The expanded size of the tensor (4) must match the existing size (7) at non-singleton dimension 1. Target sizes: [320, 4, 3, 3]. Tensor sizes: [256, 7, 3, 3]

So, what can I do to fix the issue? And why is it even happening? 🤔

ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data'

Getting the following error following the instructions

python scripts/txt2img.py --prompt "a virus monster is playing guitar, oil on canvas" --ddim_eta 0.0 --n_samples 4 --n_iter 4 --scale 5.0 --ddim_steps 50
Traceback (most recent call last):
File "scripts/txt2img.py", line 11, in <module>
from pytorch_lightning import seed_everything
File "usr\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\__init__.py", line 20, in <module>
from pytorch_lightning import metrics # noqa: E402
File "usr\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\metrics\__init__.py", line 15, in <module>
from pytorch_lightning.metrics.classification import ( # noqa: F401
File "usr\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\metrics\classification\__init__.py", line 14, in <module>
from pytorch_lightning.metrics.classification.accuracy import Accuracy # noqa: F401
File "usr\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\metrics\classification\accuracy.py", line 18, in <module>
from pytorch_lightning.metrics.utils import deprecated_metrics, void
File "usr\anaconda3\envs\ldm\lib\site-packages\pytorch_lightning\metrics\utils.py", line 22, in <module>
from torchmetrics.utilities.data import get_num_classes as _get_num_classes
ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (usr\anaconda3\envs\ldm\lib\site-packages\torchmetrics\utilities\data.py)

AttributeError: partially initialized module 'torch' has no attribute 'Tensor' (most likely due to a circular import)

I tried running the test command and got this error. I wouldn't be surprised if I screwed something up. I uninstalled and reinstalled torch and tensor to no avail.

H:\stable>python scripts/txt2img.py --prompt "a virus monster is playing guitar, oil on canvas" --ddim_eta 0.0 --n_samples 4 --n_iter 4 --scale 5.0  --ddim_steps 50
Traceback (most recent call last):
  File "H:\stable\scripts\txt2img.py", line 2, in <module>
    import torch
  File "E:\anaconda3\lib\site-packages\torch\__init__.py", line 255, in <module>
    from .random import set_rng_state, get_rng_state, manual_seed, initial_seed, seed
  File "E:\anaconda3\lib\site-packages\torch\random.py", line 9, in <module>
    def set_rng_state(new_state: torch.Tensor) -> None:
AttributeError: partially initialized module 'torch' has no attribute 'Tensor' (most likely due to a circular import)

Repeated inpainting leads to saturated pixels

Repeated inpainting leads to saturated pixels. Quick and dirty example:

import subprocess
import os
import numpy as np
from PIL import Image, ImageDraw
import shutil

directory = lambda x: "./Diffusion/Diffusion_{}/".format(x)

for i in range(240):
	if i!=0:
		if os.path.exists(directory(i)):
			shutil.rmtree(directory(i))
for i in range(240):
	im = Image.new('RGB', (512, 512), (0, 0, 0))
	draw = ImageDraw.Draw(im)

	x = np.random.randint(512-128)
	y = np.random.randint(512-128)

	draw.rectangle([(x,y),(x+128,y+128)], fill=(255, 255, 255))
	im.save('{}Diffusion_mask.png'.format(directory(i)))

	os.mkdir(directory(i+1))
	subprocess.run('python scripts/inpaint.py --steps 20 --indir {} --outdir {}'.format(directory(i),directory(i+1)), shell=True)
	
	im = Image.open('{}Diffusion.png'.format(directory(i+1)))
	
	# pixels = 2
	# im = im.crop((pixels, pixels, 512-pixels, 512-pixels))
	# im = im.resize((512,512), resample=Image.BICUBIC, box=None, reducing_gap=None)
	# im.save('{}Diffusion.png'.format(directory(i+1)))
	im.save('./DiffusionOut/{0:06d}.png'.format(i))
	if i!=0:
		shutil.rmtree(directory(i))

Add folders/files:
./Diffusion/Diffusion_0/Diffusion.png
./DiffusionOut/

In scripts/inpaint.py, changing

inpainted = inpainted.cpu().numpy().transpose(0,2,3,1)[0]*255

To

inpainted = np.round(inpainted.cpu().numpy().transpose(0,2,3,1)[0]*255)

Fixes the issue I think

Not sure why I can't launch SD through the batch web UI

Hi everyone.
I recently came across this weird error:
Not sure why I can't load SD through the user batch web UI.

I am running 1.5


44
45

I can only run SD when I double-click on launch.py or the Python webui script directly, but then I get a long cmd message

66

Upscaling task

Hello,

Thanks for this great work.

I'm wondering if you could provide instructions on how to perform the Upscaling task?

Thanks!

ModuleNotFoundError: No module named 'torchtext.legacy'

Everything installed well except I get two errors: one upon install and another when I try to run it. The install error is:
ERROR: File "setup.py" or "setup.cfg" not found. Directory cannot be installed in editable mode: /content
I'm trying to run this on a Colab server in standalone mode from the command line.
Thanks for any help!
Traceback (most recent call last):
File "stable-diffusion/scripts/txt2img.py", line 11, in <module>
from pytorch_lightning import seed_everything
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
from pytorch_lightning import metrics # noqa: E402
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
from pytorch_lightning.metrics.classification import ( # noqa: F401
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
from pytorch_lightning.metrics.classification.accuracy import Accuracy # noqa: F401
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
from pytorch_lightning.metrics.utils import deprecated_metrics, void
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/utils.py", line 29, in <module>
from pytorch_lightning.utilities import rank_zero_deprecation
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
from pytorch_lightning.utilities.apply_func import move_data_to_device # noqa: F401
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 31, in <module>
from torchtext.legacy.data import Batch
ModuleNotFoundError: No module named 'torchtext.legacy'

cannot import name 'autocast' from 'torch'

Got the conda env, installs, downloads and everything all working smoothly now, no error messages but upon running this pops up:

Traceback (most recent call last):
File "stable-diffusion/scripts/txt2img.py", line 12, in <module>
from torch import autocast
ImportError: cannot import name 'autocast' from 'torch' (/usr/local/envs/ldm/lib/python3.8/site-packages/torch/__init__.py)

Text Conditioning Dropout

Thank you for this repo. It has more training-related stuff, so I can try it on my own.
Can you please point me to where the 10% text conditioning dropout is happening?
I'm afraid I will drop the conditioning twice if I also do it on my own.
Thank you again. LDM is really awesome.
