layerdiffusion / sd-forge-layerdiffuse Goto Github PK

View Code? Open in Web Editor NEW

3.8K 37.0 327.0 53 KB

[WIP] Layer Diffusion for WebUI (via Forge)

License: Apache License 2.0

Python 100.00%

sd-forge-layerdiffuse's Introduction

sd-forge-layerdiffuse

Transparent Image Layer Diffusion using Latent Transparency

This is a WIP extension for SD WebUI (via Forge) to generate transparent images and layers.

Updates

img2img is finished! See also here

Before You Start

Because many people may be curious about how the latent preview looks like during a transparent diffusion process, I recorded a video so that you can see it before you download the models and extensions:

screen_record.mp4

You can see that the native transparent diffusion can process transparent glass, semi-transparent glowing effects, etc, that are not possible with simple background removal methods. Native transparent diffusion also gives you detailed fur, hair, whiskers, and detailed structure like that skeleton.

Model Notes

Note that in this extension, all model downloads/selections are fully automatic. In fact most users can just skip this section.

Below models are released:

layer_xl_transparent_attn.safetensors This is a rank-256 LoRA to turn a SDXL into a transparent image generator. It will change the latent distribution of the model to a "transparent latent space" that can be decoded by the special VAE pipeline.
layer_xl_transparent_conv.safetensors This is an alternative model to turn your SDXL into a transparent image generator. This safetensors file includes an offset of all conv layers (and actually, all layers that are not q,k,v of any attention layers). These offsets can be merged to any XL model to change the latent distribution to transparent images. Because we excluded the offset training of any q,k,v layers, the prompt understanding of SDXL should be perfectly preserved. However, in practice, I find the layer_xl_transparent_attn.safetensors will lead to better results. This layer_xl_transparent_conv.safetensors is still included for some special use cases that needs special prompt understanding. Also, this model may introduce a strong style influence to the base model.
layer_xl_fg2ble.safetensors This is a safetensors file includes offsets to turn a SDXL into a layer generating model, that is conditioned on foregrounds, and generates blended compositions.
layer_xl_fgble2bg.safetensors This is a safetensors file includes offsets to turn a SDXL into a layer generating model, that is conditioned on foregrounds and blended compositions, and generates backgrounds.
layer_xl_bg2ble.safetensors This is a safetensors file includes offsets to turn a SDXL into a layer generating model, that is conditioned on backgrounds, and generates blended compositions.
layer_xl_bgble2fg.safetensors This is a safetensors file includes offsets to turn a SDXL into a layer generating model, that is conditioned on backgrounds and blended compositions, and generates foregrounds.
vae_transparent_encoder.safetensors This is an image encoder to extract a latent offset from pixel space. The offset can be added to latent images to help the diffusion of transparency. Note that in the paper we used a relatively heavy model with exactly same amount of parameters as the SD VAE. The released model is more light weighted, requires much less vram, and does not influence result quality in my tests.
vae_transparent_decoder.safetensors This is an image decoder that takes SD VAE outputs and latent image as inputs, and outputs a real PNG image. The model architecture is also more lightweight than the paper version to reduce VRAM requirement. I have made sure that the reduced parameters does not influence result quality.
layer_sd15_vae_transparent_encoder.safetensors Same as above VAE encoder, but fine-tuned for SD1.5.
layer_sd15_vae_transparent_decoder.safetensors Same as above VAE decoder, but fine-tuned for SD1.5.
layer_sd15_transparent_attn.safetensors This is a rank-256 LoRA to turn a SD1.5 into a transparent image generator. It will change the latent distribution of the model to a "transparent latent space" that can be decoded by the special VAE pipeline.
layer_sd15_joint.safetensors This model file allows for generating all layers together with SD1.5. It includes two rank-256 loras (foreground lora and background lora), and an attention sharing module to share attention between multiple diffusion processes on par. Note that different from paper, this model file includes an additional "blended lora", and it actually can generate three images together (fg, bg, and blended image). Generating blended images together with fg and bg is helpful for structural understanding in our very recent tests.
layer_sd15_fg2bg.safetensors This model file allows for generating background from foreground with SD1.5. It includes a rank-256 lora and an attention sharing module to share attention between multiple diffusion processes on par. This model file includes an additional "blended lora", and it actually can generate two images together (bg and blended image). Generating blended images together with bg is helpful for structural understanding in our very recent tests. Besides, to save VRAM, the fg is directly feed into all attention layers as control signal, rather than creating another diffusion pass.
layer_sd15_bg2fg.safetensors This model file allows for generating foreground from background with SD1.5. It includes a rank-256 lora and an attention sharing module to share attention between multiple diffusion processes on par. This model file includes an additional "blended lora", and it actually can generate two images together (fg and blended image). Generating blended images together with fg is helpful for structural understanding in our very recent tests. Besides, to save VRAM, the bg is directly feed into all attention layers as control signal, rather than creating another diffusion pass.

Below models may be released soon (if necessary):

SDXL models that can generate foreground and background together and SDXL's one step conditional model. (Note that all joint models for SD1.5 are already released) I put this model on hold because of these reasons: (1) the other released models can already achieve all functionalities and this model does not bring more functionalities. (2) the inference speed of this model is 3x slower than others and requires 4x more VRAM than other released model, and I am working on reducing the VRAM of this model and speed up the inference. (3) This model will involve more hyperparameters and if demanded, I will investigate the best practice for inference/training before release it.
The current background-conditioned foreground model for SDXL may be a bit too lightweight. I will probably release a heavier one with more parameters and different behaviors (see also the discussions later).
Because the difference between diffusers training and k-diffusion inference, I can observe some mystical problems like sometimes DPM++ will give artifacts but Euler A will fix it. I am looking into it and may provide some revised model that works better with all A1111 samplers.
Two-step foreground and background conditional models for SD1.5. (Note that one-step conditional/joint models are already released.)

Sanity Check

SDXL

We highly encourage you to go through the sanity check and get exactly same results (so that if any problem occurs, we will know if the problem is on our side).

The two used models are:

https://civitai.com/models/133005?modelVersionId=198530 Juggernaut XL V6 (note that the used one is V6, not v7 or v8 or V9)
https://civitai.com/models/261336?modelVersionId=295158 anima_pencil-XL 1.0.0 (note that the used one is 1.0.0, not 1.5.0)

We will first test transparent image generating. Set your extension to this:

an apple, high quality

Negative prompt: bad, ugly

Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 5, Seed: 12345, Size: 1024x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: Only Generate Transparent Image (Attention Injection), layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-269-gef35383b

Make sure that you get this apple

woman, messy hair, high quality

Negative prompt: bad, ugly

Make sure that you get the woman with hair as messy as this

a cup made of glass, high quality

Negative prompt: bad, ugly

Make sure that you get this cup

glowing effect, book of magic, high quality

Negative prompt: bad, ugly

Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 12345, Size: 1024x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: Only Generate Transparent Image (Attention Injection), layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: True, layerdiffusion_bg_image: False, layerdiffusion_blend_image: True, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-269-gef35383b

make sure that you get this glowing book

OK then lets move on to a bit longer prompt:

(this prompt is from https://civitai.com/images/3160575)

photograph close up portrait of Female boxer training, serious, stoic cinematic 4k epic detailed 4k epic detailed photograph shot on kodak detailed bokeh cinematic hbo dark moody

Negative prompt: (worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art:1.4), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name:1.2), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, amateur:1.3), (3D ,3D Game, 3D Game Scene, 3D Character:1.1), (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities:1.3)

Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 12345, Size: 896x1152, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: Only Generate Transparent Image (Attention Injection), layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-269-gef35383b

Anime model test:

girl in dress, high quality

Negative prompt: nsfw, bad, ugly, text, watermark

Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 12345, Size: 896x1152, Model hash: 7ed8da12d9, Model: animaPencilXL_v100, layerdiffusion_enabled: True, layerdiffusion_method: Only Generate Transparent Image (Attention Injection), layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-269-gef35383b

(I am not very good at writing prompts in the AnimagineXL format, and perhaps you can get better results with better prompts)

SD1.5

The tested model is realisticVisionV51_v51VAE. We highly encourage you to go through the sanity check and get exactly same results (so that if any problem occurs, we will know if the problem is on our side).

an apple, 4k, high quality

Negative prompt: bad, ugly

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x512, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) Only Generate Transparent Image (Attention Injection), layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

Generating Foregrounds and Backgrounds Together (SD1.5)

This will allow you to generate all layers together in one single diffusion process.

Very important: Because this will generate 3 images together (the foreground, background, and blended image), your batchsize MUST be divided by 3. For example, you can use batch size 3 or 6 or 9 or 12 ... If you do not use batchsize number divided by 3, you will only get noise.

man walking, 4k, high quality

Negative prompt: bad, ugly

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) Generate Everything Together, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

(Note that the third image is encoded/decoded by VAE and diffusion process so it may be different to the fg/bg. To get perfectly same fg/bg, you can blend the real bf and fg with any other software, or wait us to provide a simple UI for simple blending of some png elements.)

(this image is SD1.5 with very simple prompts and results can be much better with more prompt with SD15 quality tags, or with high-res fix coming soon)

Independent prompts for layers

In some cases, you may find that the background is corrupted by the global prompt. For example:

an apple on table, high quality, 4k

Negative prompt: nsfw, bad, ugly

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x512, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) Generate Everything Together, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

(We somewhat do not want the apples in the background and only want foreground apples)

Then you can first remove all content part in the prompt

and then write them for different layers, like this

Then you will get

high quality, 4k

Negative prompt: nsfw, bad, ugly

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x512, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) Generate Everything Together, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: apple, layerdiffusion_bg_additional_prompt: floor in room, layerdiffusion_blend_additional_prompt: apple on floor in room, Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

Some more examples

high quality, 4k Negative prompt: nsfw, bad, ugly Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) Generate Everything Together, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: dog running, layerdiffusion_bg_additional_prompt: street, layerdiffusion_blend_additional_prompt: dog running in street, Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

high quality, 4k Negative prompt: nsfw, bad, ugly Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) Generate Everything Together, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: a man sitting, layerdiffusion_bg_additional_prompt: chair, layerdiffusion_blend_additional_prompt: a man sitting on chair, Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

Background Condition (SD1.5, one step workflow)

First download this image:

In most cases, bg-to-fg does not need additional layer prompts. But you can add it if you wish

Very important: Because this will generate 2 images together (the foreground and blended image), your batchsize MUST be divided by 2. For example, you can use batch size 2 or 4 or 6 or 8 ... If you do not use batchsize number divided by 2, you will only get noise.

an old man sitting, high quality, 4k

Negative prompt: bad, ugly

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) From Background to Foreground, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: True, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

Note that the second image is a visualization that will have color differences. To get perfectly same fg/bg, you can blend the real bg and fg with any other software, or wait us to provide a simple UI for simple blending of some png elements.

For example this is a real blending using photopea

Another example

Input:

Foreground Condition (SD1.5, one step workflow)

We first generate a cat

a cat running, high quality, 4k

Negative prompt: nsfw, bad, ugly

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) Only Generate Transparent Image (Attention Injection), layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: True, layerdiffusion_bg_image: True, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

Then drag the real transparent foreground to UI

street, high quality, 4k

Negative prompt: nsfw, bad, ugly

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) From Foreground to Background, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: True, layerdiffusion_bg_image: True, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

Some More Complicated Examples for SD1.5

Lets travel a bit more.

First we get a man singing

a man singing, high quality, 4k

Negative prompt: bad, ugly

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) Only Generate Transparent Image (Attention Injection), layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: True, layerdiffusion_bg_image: True, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

(then get a concert stage)

concert stage, high quality, 4k

Negative prompt: bad, ugly

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) From Foreground to Background, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: True, layerdiffusion_bg_image: True, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

then drag to background

(Then get a portrait of michael)

michael jackson, portrait, high quality, 4k

Negative prompt: full body, nsfw, bad, ugly

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 12345, Size: 512x640, Model hash: 15012c538f, Model: realisticVisionV51_v51VAE, layerdiffusion_enabled: True, layerdiffusion_method: (SD1.5) From Background to Foreground, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: True, layerdiffusion_bg_image: True, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

Background Condition (SDXL, two steps workflow)

First download this image:

then set the interface with

then set the parameters with

old man sitting, high quality

Negative prompt: bad, ugly

Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 12345, Size: 896x1152, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: From Background to Blending, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: True, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-269-gef35383b

Then set the interface with (you first change the mode and then drag the image from result to interface)

Then change the sampler to Euler A or UniPC or some other sampler that is not dpm (This is probably because of some difference between diffusers training script and webui's k-diffusion. I am still looking into this and may revise my training script and model very soon so that this step will be removed.)

FAQ:

OK. But how can I get a background image like this?

You can use the Foreground Condition to get a background like this. We will describe it in the next section.

Or you can use old inpainting tech to perform foreground removal on any image to get a background like this.

Wait. Why you generate it with two steps? Can I generate it with one pass?

Two steps allows for more flexible editing. We will release the one-step model soon for SDXL. Also, note that the one-step model for SD1.5 is already released.

Also you can see that the current model is about 680MB and in particular I think it is a bit too lightweight and will soon release a relatively heavier model for potential stronger structure understanding (but that is still under experiments).

Foreground Condition (SDXL, two steps workflow)

First we generate a dog

a dog sitting, high quality

Negative prompt: bad, ugly

Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 12345, Size: 896x1152, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: Only Generate Transparent Image (Attention Injection), layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: True, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-269-gef35383b

then change to From Foreground to Blending and drag the transparent image to foreground input.

Note that you drag the real transparent image, not the visualization with checkboard background. Make sure tou see this

then do this

a dog sitting in room, high quality

Negative prompt: bad, ugly

Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 12345, Size: 896x1152, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: From Foreground to Blending, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: True, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-269-gef35383b

Then change mode, drag your image, so that

(Note that here I set stop at as 0.5 to get better results since I do not need the bg to be exactly same)

then do this

room, high quality

Negative prompt: bad, ugly

Steps: 20, Sampler: UniPC, CFG scale: 7, Seed: 12345, Size: 896x1152, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: From Foreground and Blending to Background, layerdiffusion_weight: 1, layerdiffusion_ending_step: 0.5, layerdiffusion_fg_image: True, layerdiffusion_bg_image: False, layerdiffusion_blend_image: True, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-269-gef35383b

Note that this is a two-step workflow. We will release the one-step model soon for SDXL. Also, note that the one-step model for SD1.5 is already released.

sd-forge-layerdiffuse's People

Contributors

Stargazers

Watchers

Forkers

iseeyo camenduru wailovet allinbsv xiaohu2015 dalizardwizard polyphron catboxanon keyvanhardani navezjt tuhinmallick huchenlei hawkit leonleica kustomzone eseedo chenbaiyujason brandnewx winscat sikkgit sotokisehiro hbcbh1999 yangemail neuroradiology jags111 modernbaby ferranespigares mllejezebel girlieinkshamp shoesrox-instantix cozyme-reejoy racergigaeyeecht godzillayellowshortga lawrt-serenesilly c-chmnigma sawduroor-d binderpost-b banearnthegaseen steenfrfinaltalk crescenturer7 q-thegaro audriannawhester billite-jiggyough mellowkeeper-gament facubingessents scorpions11 selfmoff46socialwil spoiledic24 eculdsayinertle rubenbernardezhernande133z dailyerneheardoom axwerbit-y linecode tonyonst56 yishujia80 astanhope690 hoanganhdqtd gmyusuf supakyle hmd7ai eltociear leevaleeth mbakpur123 adeliavale cuppar ararat02 bewazdi viewebelbanditie padre33 nortonmovex amohamedaakhil splintersp rezabehnoud pink-black-bear danst9900 funflirty cartentslogicdot 0x-lovenloves colerim-bloomet trameski-encented rohitdash08 havealex bathula7036 roclive clarysf techthiyanes ysfyf kotamadelin notenumber1captail renjith84 itsharex joshdayax keyman9848 alxsbr2411 octopusonetwo ysfadlaa minmin2411 hutansilon sungaiglasis gunungtravia

sd-forge-layerdiffuse's Issues

"Additional Prompt" gives TypeError: 'NoneType' object is not iterable

Filling "Foreground Additional Prompt" or "Blended Additional Prompt" always gives TypeError: 'NoneType' object is not iterable.
It works only when leaving them empty.

Edit: It looks like the problem only occurs with longer prompts, basic 2-3 word prompts (positive and negative) work.
Edit2: over 150 symbols prompts don't work.

Python crashes

I was using Macbook pro with M2 chip and got the following error and python also crashed:

I was using the following settings:

Can anyone help?

layerdiffusion doesn't show up

I just installed it but can't see any section for layerdiffusion on a1111, what's wrong?

Running on M1 crashes the UI

I run the sanity check but it crashes when loading the UNet1024.
Does it not yet work with Apple?

Total progress: 100%|████████████████████████████████████████| 20/20 [01:00<00:00, 3.01s/it]
[Layer Diffusion] LayerMethod.FG_ONLY_ATTN███████████████████| 20/20 [01:00<00:00, 2.41s/it]
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 31.84 seconds
100%|████████████████████████████████████████████████████████| 20/20 [00:58<00:00, 2.92s/it]
To load target model AutoencoderKL███████████████████████████| 20/20 [00:53<00:00, 2.71s/it]
Begin to load 1 model
Reuse 1 loaded models
Moving model(s) has taken 10.24 seconds
To load target model UNet1024
Begin to load 1 model
Moving model(s) has taken 0.54 seconds
0%| | 0/8 [00:00<?, ?it/s]/AppleInternal/Library/BuildRoots/9941690d-bcf7-11ed-a645-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:706: failed assertion `[MPSTemporaryNDArray initWithDevice:descriptor:] Error: NDArray dimension length > INT_MAX'
webui.sh: line 292: 9330 Abort trap: 6 "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"
Benjamins-MBP:stable-diffusion-webui-forge benjaminbertram$ /opt/homebrew/Cellar/[email protected]/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

No module named 'ldm_patched'

Hi layerdiffusion!!!
Have some troubles on Windows in A1111 1.8.0

Only the checkboard image is saved to output folder

How to automatically save the transparent image?

Question regarding foreground and background LoRA.

In Fig. 3 in your paper, you mentioned two LoRAs, foreground and background LoRA trained on top of the base model.

You also mentioned that when training the base diffusion model (a), all model weights are trainable.

However, it seems that the base diffusion model is a UNet with LoRA layers. If this is the foreground model, where is the "background LoRA"?

Higher batch size gives error

First of all - amazing job done with a fantastic extension! Appreciated very much!
Now to problem:

With Batch count 4 and batch size 1 everything is fine as desired:

Any value of Batch Size higher then 1 gives this error in console (Used 4 Batch Count and 2 Batch Size):

INFO:sd_dynamic_prompts.dynamic_prompting:Prompt matrix will create 8 images in a total of 4 batches.<00:00,  7.42it/s]
[Layer Diffusion] LayerMethod.FG_ONLY_ATTN
To load target model SDXL
Begin to load 1 model
Reuse 1 loaded models
[Memory Management] Current Free GPU Memory (MB) =  17564.9833984375
[Memory Management] Model Memory (MB) =  1821.875
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  14719.1083984375
Moving model(s) has taken 0.44 seconds
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "I:\webui_forge_cu121_torch21\webui\modules_forge\main_thread.py", line 37, in loop
    task.work()
  File "I:\webui_forge_cu121_torch21\webui\modules_forge\main_thread.py", line 26, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "I:\webui_forge_cu121_torch21\webui\modules\txt2img.py", line 111, in txt2img_function
    processed = processing.process_images(p)
  File "I:\webui_forge_cu121_torch21\webui\modules\processing.py", line 752, in process_images
    res = process_images_inner(p)
  File "I:\webui_forge_cu121_torch21\webui\modules\processing.py", line 921, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "I:\webui_forge_cu121_torch21\webui\modules\processing.py", line 1273, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "I:\webui_forge_cu121_torch21\webui\modules\sd_samplers_kdiffusion.py", line 251, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "I:\webui_forge_cu121_torch21\webui\modules\sd_samplers_common.py", line 263, in launch_sampling
    return func()
  File "I:\webui_forge_cu121_torch21\webui\modules\sd_samplers_kdiffusion.py", line 251, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "I:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "I:\webui_forge_cu121_torch21\webui\repositories\k-diffusion\k_diffusion\sampling.py", line 626, in sample_dpmpp_2m_sde
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "I:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "I:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "I:\webui_forge_cu121_torch21\webui\modules\sd_samplers_cfg_denoiser.py", line 182, in forward
    denoised = forge_sampler.forge_sample(self, denoiser_params=denoiser_params,
  File "I:\webui_forge_cu121_torch21\webui\modules_forge\forge_sampler.py", line 86, in forge_sample
    model, x, timestep, uncond, cond, cond_scale, model_options, seed = modifier(model, x, timestep, uncond, cond, cond_scale, model_options, seed)
  File "I:\webui_forge_cu121_torch21\webui\extensions\sd-forge-layerdiffusion\scripts\forge_layerdiffusion.py", line 233, in conditioning_modifier
    if timestep < sigma_end:
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
Boolean value of Tensor with more than one value is ambiguous
*** Error completing request
*** Arguments: ('task(i89qzcfywj8dfle)', <gradio.routes.Request object at 0x0000021F894715D0>, 'girl in dress, high quality', 'nsfw, bad, ugly, text, watermark', [], 20, 'DPM++ 2M SDE Karras', 4, 2, 7, 1152, 896, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], 0, False, '', 0.8, 12345, False, -1, 0, 0, 0, False, False, {'ad_model': 'face_yolov8n.pt', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_model_classes': '', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, True, False, 1, False, False, False, 1.1, 1.5, 100, 0.7, False, False, True, False, False, 0, 'Gustavosta/MagicPrompt-Stable-Diffusion', '', True, 'Only Generate Transparent Image (Attention Injection)', 1, 1, None, None, None, 'Crop and Resize', False, False, False, 'Matrix', 'Columns', 'Mask', 'Prompt', '1,1', '0.2', False, False, False, 'Attention', [False], '0', '0', '0.4', None, '0', '0', False, ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], generated_image=None, mask_image=None, hr_option='Both', enabled=False, module='None', model='None', weight=1, image=None, resize_mode='Crop and Resize', processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', save_detected_map=True), ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], generated_image=None, mask_image=None, hr_option='Both', enabled=False, module='None', model='None', weight=1, image=None, resize_mode='Crop and Resize', processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', save_detected_map=True), ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], generated_image=None, mask_image=None, hr_option='Both', enabled=False, module='None', model='None', weight=1, image=None, resize_mode='Crop and Resize', processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', save_detected_map=True), False, 7, 1, 'Constant', 0, 'Constant', 0, 1, 'enable', 'MEAN', 'AD', 1, False, 1.01, 1.02, 0.99, 0.95, False, 0.5, 2, False, 256, 2, 0, False, False, 3, 2, 0, 0.35, True, 'bicubic', 'bicubic', False, 0, 'anisotropic', 0, 'reinhard', 100, 0, 'subtract', 0, 0, 'gaussian', 'add', 0, 100, 127, 0, 'hard_clamp', 5, 0, 'None', 'None', False, 'MultiDiffusion', 768, 768, 64, 4, False, False, False, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False, 8, True, True, 16, 'PNN', True, [], 30, '', 4, [], 1, '', '', '', '') {}
    Traceback (most recent call last):
      File "I:\webui_forge_cu121_torch21\webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
    TypeError: 'NoneType' object is not iterable

---

Working on latest updated Forge, w11, RTX4090

Feature Suggestion: Save/Output the actually transparent image, instead of the one with grid background

Currently, the extension saves the preview file with a grid background, see below:

But this is not actually transparent, it's just an example, to show that it's doing the transparency correctly.

The file I actually want to output is the one WITH transparency:

But this has to manually be downloaded, and then it's just saved as image.png, so I don't get my naming done by A1111/Forge.

I suggest to switch which file is saved first, or saved at all. The grid preview is completely useless to me at least. An option would be best I guess in case someone prefers the grid version.

Great work on this tool!

Can layerdiffusion be added in Fooocus in the future?

just wondering if this feature is going to be added to Fooocus anytime soon? I can't wait for it!

TypeError: 'NoneType' object is not iterable

A1111 forge newest version.
When generate,cmd bord shows this:

token_merging_ratio = 0.2
[Layer Diffusion] LayerMethod.FG_ONLY_ATTN
To load target model SDXL
Begin to load 1 model
Reuse 1 loaded models
[Memory Management] Current Free GPU Memory (MB) = 20344.2802734375
[Memory Management] Model Memory (MB) = 4210.9375
[Memory Management] Minimal Inference Memory (MB) = 1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) = 15109.3427734375
Moving model(s) has taken 0.71 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 41/41 [00:07<00:00, 5.58it/s]
To load target model AutoencoderKL5it/s]
Begin to load 1 model
Reuse 1 loaded models
[Memory Management] Current Free GPU Memory (MB) = 16109.3330078125
[Memory Management] Model Memory (MB) = 0.0
[Memory Management] Minimal Inference Memory (MB) = 1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) = 15085.3330078125
Moving model(s) has taken 0.01 seconds
0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "H:\webui_forge_cu121_torch21\webui\modules_forge\main_thread.py", line 37, in loop
task.work()
File "H:\webui_forge_cu121_torch21\webui\modules_forge\main_thread.py", line 26, in work
self.result = self.func(*self.args, **self.kwargs)
File "H:\webui_forge_cu121_torch21\webui\modules\txt2img.py", line 111, in txt2img_function
processed = processing.process_images(p)
File "H:\webui_forge_cu121_torch21\webui\modules\processing.py", line 752, in process_images
res = process_images_inner(p)
File "H:\webui_forge_cu121_torch21\webui\modules\processing.py", line 936, in process_images_inner
x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
File "H:\webui_forge_cu121_torch21\webui\modules\processing.py", line 638, in decode_latent_batch
sample = decode_first_stage(model, batch[i:i + 1])[0]
File "H:\webui_forge_cu121_torch21\webui\modules\sd_samplers_common.py", line 74, in decode_first_stage
return samples_to_images_tensor(x, approx_index, model)
File "H:\webui_forge_cu121_torch21\webui\modules\sd_samplers_common.py", line 57, in samples_to_images_tensor
x_sample = model.decode_first_stage(sample)
File "H:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "H:\webui_forge_cu121_torch21\webui\modules_forge\forge_loader.py", line 239, in patched_decode_first_stage
sample = sd_model.forge_objects.vae.decode(sample).movedim(-1, 1) * 2.0 - 1.0
File "H:\webui_forge_cu121_torch21\webui\ldm_patched\modules\sd.py", line 288, in decode
return wrapper(self.decode_inner, samples_in)
File "H:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "H:\webui_forge_cu121_torch21\webui\extensions\sd-forge-layerdiffusion\lib_layerdiffusion\models.py", line 249, in wrapper
y = self.estimate_augmented(pixel, latent)
File "H:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "H:\webui_forge_cu121_torch21\webui\extensions\sd-forge-layerdiffusion\lib_layerdiffusion\models.py", line 224, in estimate_augmented
eps = self.estimate_single_pass(feed_pixel, feed_latent).clip(0, 1)
File "H:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "H:\webui_forge_cu121_torch21\webui\extensions\sd-forge-layerdiffusion\lib_layerdiffusion\models.py", line 202, in estimate_single_pass
y = self.model.model(pixel, latent)
File "H:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "H:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "H:\webui_forge_cu121_torch21\webui\extensions\sd-forge-layerdiffusion\lib_layerdiffusion\models.py", line 174, in forward
sample = upsample_block(sample, res_samples, emb)
File "H:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "H:\webui_forge_cu121_torch21\system\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "H:\webui_forge_cu121_torch21\system\python\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 2181, in forward
hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 32 but got size 31 for tensor number 1 in the list.
Sizes of tensors must match except in dimension 1. Expected size 32 but got size 31 for tensor number 1 in the list.
*** Error completing request
*** Arguments: ('task(3sjzin41rxzjll2)', <gradio.routes.Request object at 0x000001F098EE0850>, 'score_9,score_8_up,score_7_up,best quality,masterpiece,4k,uncensored,prefect lighting,anime BREAK\nlora:KakudateKarinPonyXL:1,kkba,halo,very long hair,gradient hair BREAK ', 'source_comic,source_furry,source_pony,sketch,painting,monochrome,jpeg artifacts,extra digit,fewer digits,unaestheticXL2v10,', [], 41, 'Euler a', 1, 1, 7, 984, 552, False, 0.26, 2, '4x-AnimeSharp', 6, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, False, False, 'base', False, False, {'ad_model': 'face_yolov8n.pt', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, {'ad_model': 'None', 'ad_prompt': '', 'ad_negative_prompt': '', 'ad_confidence': 0.3, 'ad_mask_k_largest': 0, 'ad_mask_min_ratio': 0, 'ad_mask_max_ratio': 1, 'ad_x_offset': 0, 'ad_y_offset': 0, 'ad_dilate_erode': 4, 'ad_mask_merge_invert': 'None', 'ad_mask_blur': 4, 'ad_denoising_strength': 0.4, 'ad_inpaint_only_masked': True, 'ad_inpaint_only_masked_padding': 32, 'ad_use_inpaint_width_height': False, 'ad_inpaint_width': 512, 'ad_inpaint_height': 512, 'ad_use_steps': False, 'ad_steps': 28, 'ad_use_cfg_scale': False, 'ad_cfg_scale': 7, 'ad_use_checkpoint': False, 'ad_checkpoint': 'Use same checkpoint', 'ad_use_vae': False, 'ad_vae': 'Use same VAE', 'ad_use_sampler': False, 'ad_sampler': 'DPM++ 2M Karras', 'ad_use_noise_multiplier': False, 'ad_noise_multiplier': 1, 'ad_use_clip_skip': False, 'ad_clip_skip': 1, 'ad_restore_face': False, 'ad_controlnet_model': 'None', 'ad_controlnet_module': 'None', 'ad_controlnet_weight': 1, 'ad_controlnet_guidance_start': 0, 'ad_controlnet_guidance_end': 1, 'is_api': ()}, True, False, 1, False, False, False, 1.1, 1.5, 100, 0.7, False, False, True, False, False, 0, 'FredZhang7/anime-anything-promptgen-v2', '', True, 'Only Generate Transparent Image (Attention Injection)', 1, 1, None, None, None, 'Crop and Resize', False, ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], generated_image=None, mask_image=None, hr_option='Both', enabled=False, module='None', model='None', weight=1, image=None, resize_mode='Crop and Resize', processor_res=512, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', save_detected_map=True), ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], generated_image=None, mask_image=None, hr_option='Both', enabled=False, module='None', model='None', weight=1, image=None, resize_mode='Crop and Resize', processor_res=512, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', save_detected_map=True), ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], generated_image=None, mask_image=None, hr_option='Both', enabled=False, module='None', model='None', weight=1, image=None, resize_mode='Crop and Resize', processor_res=512, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', save_detected_map=True), ControlNetUnit(input_mode=<InputMode.SIMPLE: 'simple'>, use_preview_as_input=False, batch_image_dir='', batch_mask_dir='', batch_input_gallery=[], batch_mask_gallery=[], generated_image=None, mask_image=None, hr_option='Both', enabled=False, module='None', model='None', weight=1, image=None, resize_mode='Crop and Resize', processor_res=512, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', save_detected_map=True), False, 7, 1, 'Constant', 0, 'Constant', 0, 1, 'enable', 'MEAN', 'AD', 1, False, 1.01, 1.02, 0.99, 0.95, False, 0.5, 2, False, 256, 2, 0, False, False, 3, 2, 0, 0.35, True, 'bicubic', 'bicubic', False, 0, 'anisotropic', 0, 'reinhard', 100, 0, 'subtract', 0, 0, 'gaussian', 'add', 0, 100, 127, 0, 'hard_clamp', 5, 0, 'None', 'None', False, 'MultiDiffusion', 768, 768, 64, 4, False, False, False, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
File "H:\webui_forge_cu121_torch21\webui\modules\call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
TypeError: 'NoneType' object is not iterable

In issue, i noticed the same error 'NoneType' about Higher batch size.
And that error can still generate images,mine just nothing but this.

Gray blurs still present in the contours.

What am I doing wrong? On light backgrounds the image looks good, but when placed on a dark background, white parts can be seen in some spots. Is this really the case, or am I doing something wrong?

woman, messy hair, high quality
Negative prompt: bad, ugly
Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 5, Seed: 391673051, Size: 1024x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: (SDXL) Only Generate Transparent Image (Attention Injection), layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-latest-273-gb9705c58

Original image:

Why are the Generation times slower for SD1.5 over SDXL? SDXL checkpoint seems almost twice as fast(3s vs 6s). Am i missing something?

How to adjust the size of the generated foreground?

As shown in the figure, the generated foreground size is usually large. How to adjust the size of the generated foreground?

From background to Blending problem

First, thanks for the great work but I may have the following problem.
I downloaded the background image you provided and configured it with the same parameters as follows:
old man sitting, high quality
Negative prompt: bad, ugly
Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 12345, Size: 896x1152, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: From Background to Blending, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: True, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

However, my inference results are as follows:

Also, there are some other failed cases:

(1)

parameters:
handlesome man sitting on the bench, high quality
Negative prompt: bad, ugly
Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 12345, Size: 896x1152, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: From Background to Blending, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: True, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

(2)

input background:

parameters:
plants on table, high quality
Negative prompt: bad, ugly
Steps: 20, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 12345, Size: 896x1152, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: From Background to Blending, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: True, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-276-g29be1da7

The result always involves the generation of new objects in certain areas, while the geometric structures between layers are not well aligned. I would like to know if this is a problem with layer_xl_bg2ble.safetensors or if it's due to some details I might have overlooked personally.
Thanks.

Only grey background picture generated, cant get transparent backgroud version at the same time.

there should be 2 pictures when I generated picture, 1 gray background, 1 transparent background. but I get only 1 gray backgrounded picture. no error reported.

Questions about the architecture of latent transparency decoder

Love your work, thanks for sharing the code.

I got some questions about the decoder part, could you kindly provide me with some hints or guidance on this aspect?

Q1: In Appendix B page 22, “Then the decoder goes through 64 × 64 × 512 →128 × 128 × 512 → 256 × 256 × 256 → 512 × 512 × 128 → 512 × 512 × 3”. should the last output be 512 × 512 × 4?

Q2: The input of decoder is $(x_a, \hat{I})$, how about just input a $x_a$, since it has all the decoded information. If it did not work, then why.

Q3: Is the U-net decoder a must? Is it because that it is too hard to reconstruct $\hat{I}_c$, where the backgrounds information in the premultiplied $I$ is discarded but also required to reconstruct.

AttributeError: 'VAE' object has no attribute 'clone'

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f0.0.14v1.8.0rc-latest-184-g43c9e3b5
Commit hash: 43c9e3b5ce1642073c7a9684e36b45489eeb4a49
Launching Web UI with arguments:
Total VRAM 24564 MB, total RAM 16174 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : native
VAE dtype: torch.bfloat16
Using pytorch cross attention
ControlNet preprocessor location: D:\BaiduNetdiskDownload\webui_forge_cu121_torch21\webui\models\ControlNetPreprocessor
Loading weights [f99f3dec38] from D:\BaiduNetdiskDownload\webui_forge_cu121_torch21\webui\models\Stable-diffusion\realisticStockPhoto_v20.safetensors
2024-03-03 13:36:14,199 - ControlNet - INFO - ControlNet UI callback registered.
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Startup time: 20.7s (prepare environment: 5.5s, import torch: 5.2s, import gradio: 1.6s, setup paths: 1.2s, initialize shared: 0.2s, other imports: 1.0s, load scripts: 3.1s, create ui: 0.8s, gradio launch: 1.8s).
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
loaded straight to GPU
To load target model SDXL
Begin to load 1 model
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 1.41 seconds
Model loaded in 16.3s (load weights from disk: 1.0s, forge instantiate config: 1.5s, forge load real models: 11.2s, calculate empty prompt: 2.5s).
[Layer Diffusion] LayerMethod.FG_ONLY_ATTN
*** Error running process_before_every_sampling: D:\BaiduNetdiskDownload\webui_forge_cu121_torch21\webui\extensions\sd-forge-layerdiffusion\scripts\forge_layerdiffusion.py
Traceback (most recent call last):
File "D:\BaiduNetdiskDownload\webui_forge_cu121_torch21\webui\modules\scripts.py", line 835, in process_before_every_sampling
script.process_before_every_sampling(p, *script_args, **kwargs)
File "D:\BaiduNetdiskDownload\webui_forge_cu121_torch21\webui\extensions\sd-forge-layerdiffusion\scripts\forge_layerdiffusion.py", line 130, in process_before_every_sampling
vae = p.sd_model.forge_objects.vae.clone()
AttributeError: 'VAE' object has no attribute 'clone'

安装后在WEBUI里看不到这个插件

WEBUI里看不到这个插件，终端提示错误代码为
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
*** Error loading script: forge_layerdiffusion.py
Traceback (most recent call last):
File "D:\sd-webui-aki-v4.4\modules\scripts.py", line 527, in load_scripts
script_module = script_loading.load_module(scriptfile.path)
File "D:\sd-webui-aki-v4.4\modules\script_loading.py", line 10, in load_module
module_spec.loader.exec_module(module)
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "D:\sd-webui-aki-v4.4\extensions\sd-forge-layerdiffuse\scripts\forge_layerdiffusion.py", line 14, in
from ldm_patched.modules.utils import load_torch_file
ModuleNotFoundError: No module named 'ldm_patched'

AttributeError: 'VAE' object has no attribute 'clone' in forge_layerdiffusion.py

When I enable layer diffusion in sd-forge, it has no effect. The resulting image is not transparent and is identical to the one generated with layer diffusion disabled. The error occurs within the process_before_every_sampling() function in the forge_layerdiffusion.py script. Here's the relevant information:

[Layer Diffusion] LayerMethod.FG_ONLY_ATTN
*** Error running process_before_every_sampling: E:\stable-diffusion-webui-forge\extensions\sd-forge-layerdiffusion\scripts\forge_layerdiffusion.py
    Traceback (most recent call last):
      File "E:\stable-diffusion-webui-forge\modules\scripts.py", line 835, in process_before_every_sampling
        script.process_before_every_sampling(p, *script_args, **kwargs)
      File "E:\stable-diffusion-webui-forge\extensions\sd-forge-layerdiffusion\scripts\forge_layerdiffusion.py", line 130, in process_before_every_sampling
        vae = p.sd_model.forge_objects.vae.clone()
    AttributeError: 'VAE' object has no attribute 'clone'

Operating system: Windows 11

The images as inputs and dimensions only work with 1024x1024?

When I want to change to 512x512 or any multiple of 64, it doesn't work

mac m1 pro, python Error

[Intel Arc] [Windows] Exceptions both with SD 1.5 and SDXL generation

Hello! As the title states, extension does not seem to work correctly on Intel Arc GPUs. Specifically, when I try to run inference with SDXL image and extension enabled, I get the following exception:

File "c:\Forge\modules_forge\main_thread.py", line 37, in loop
    task.work()
  File "c:\Forge\modules_forge\main_thread.py", line 26, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "C:\Forge\modules\txt2img.py", line 111, in txt2img_function
    processed = processing.process_images(p)
  File "C:\Forge\modules\processing.py", line 752, in process_images
    res = process_images_inner(p)
  File "C:\Forge\modules\processing.py", line 938, in process_images_inner
    x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
  File "C:\Forge\modules\processing.py", line 638, in decode_latent_batch
    sample = decode_first_stage(model, batch[i:i + 1])[0]
  File "C:\Forge\modules\sd_samplers_common.py", line 74, in decode_first_stage
    return samples_to_images_tensor(x, approx_index, model)
  File "C:\Forge\modules\sd_samplers_common.py", line 57, in samples_to_images_tensor
    x_sample = model.decode_first_stage(sample)
  File "c:\Forge\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Forge\modules_forge\forge_loader.py", line 239, in patched_decode_first_stage
    sample = sd_model.forge_objects.vae.decode(sample).movedim(-1, 1) * 2.0 - 1.0
  File "C:\Forge\ldm_patched\modules\sd.py", line 288, in decode
    return wrapper(self.decode_inner, samples_in)
  File "c:\Forge\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Forge\extensions\sd-forge-layerdiffuse\lib_layerdiffusion\models.py", line 256, in wrapper
    y = self.estimate_augmented(pixel[i:i+1], latent[i:i+1])
  File "c:\Forge\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Forge\extensions\sd-forge-layerdiffuse\lib_layerdiffusion\models.py", line 234, in estimate_augmented
    median = torch.median(result, dim=0).values
RuntimeError: Provided range is out of integer limits. Pass `-fno-sycl-id-queries-fit-in-int' to disable range check. -30 (PI_ERROR_INVALID_VALUE)

With SD 1.5, I run into the following (excerpt):

File "C:\Forge\ldm_patched\ldm\modules\attention.py", line 447, in forward
    return checkpoint(self._forward, (x, context, transformer_options), self.parameters(), self.checkpoint)
  File "C:\Forge\ldm_patched\ldm\modules\diffusionmodules\util.py", line 194, in checkpoint
    return func(*inputs)
  File "C:\Forge\ldm_patched\ldm\modules\attention.py", line 507, in _forward
    n = self.attn1(n, context=context_attn1, value=value_attn1)
  File "c:\Forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Forge\extensions\sd-forge-layerdiffuse\lib_layerdiffusion\attention_sharing.py", line 92, in forward
    framed_cond_mark = einops.rearrange(transformer_options['cond_mark'], '(b f) -> f b', f=self.frames).to(modified_hidden_states)
KeyError: 'cond_mark'

Forge runs on Windows natively, GPU in use is Intel Arc A770 16GB. Happy to provide any other details as necessary.

How to convert the generated img to real transparent img?

Thanks for the great work. It seems the generated img isnot real img with transparent background, instead a background with chessboard. I want to replace the corrsponding pixel by photoshop, but it failed. I want to know to how to implement to get a img with a transparent channel.

AttributeError: 'VAE' object has no attribute 'clone'

using the sanity check apple settings
when i run it i always get

extensions\sd-forge-layerdiffusion\scripts\forge_layerdiffusion.py", line 130, in process_before_every_sampling
        vae = p.sd_model.forge_objects.vae.clone()
   AttributeError: 'VAE' object has no attribute 'clone'

and a regular image is created

[m1 pro] How to fix the problem?

Moving model(s) has taken 0.91 seconds
100%|███████████████████████████████████████████| 20/20 [02:51<00:00, 8.59s/it]
To load target model AutoencoderKL██████████████| 20/20 [02:37<00:00, 8.22s/it]
Begin to load 1 model
Moving model(s) has taken 2.63 seconds
To load target model UNet1024
Begin to load 1 model
Moving model(s) has taken 0.24 seconds
100%|█████████████████████████████████████████████| 8/8 [00:14<00:00, 1.82s/it]
/AppleInternal/Library/BuildRoots/ce725a5f-c761-11ee-a4ec-b6ef2fd8d87b/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArraySort.mm:287: failed assertion `(null)" Axis = 4. This class only supports axis = 0, 1, 2, 3
'
./webui.sh: line 292: 2661 Abort trap: 6 "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"
(base) mbp@MBP forge % /opt/homebrew/Cellar/[email protected]/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

Why conv offsets can be applied to ANY SDXL models?

Hi, awesome work!
As you mentioned the safetensors you released, which are basically weight offsets (in my understanding), can be applied to any SDXL models.
I can understand that if the base model is the same(params and architecture), applying offsets can finally get the same model. However I tried layer_xl_transparent_conv.safetensors on my own fine-tuned model(unet model params are changed), and it still works pretty good.
Is there a theory behind this? Hope maybe you can share some insights.
Thanks!

Issues with image quality enhancing tags: Always draws background and makes the character semi-transparent

Everything works fine, but when I try to enhance the datails with these tags, a background is always drawn:

((( masterpiece, best quality, ultra-detailed, highres, 8k, cinematic style, (hyperdetailed)))), ((((1girl)))),

Prompt:

((( masterpiece, best quality, ultra-detailed, highres, 8k, cinematic style, (hyperdetailed)))), ((((1girl)))),
girl in dress, high quality

Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 5, Seed: 3462804341, Size: 1024x1024, Model hash: cf682fe9f7, Model: 1-ANIME_XL_AnimaPencilXL_v200, VAE hash: c2af720eca, VAE: sdxl_vae.safetensors, layerdiffusion_enabled: True, layerdiffusion_method: Only Generate Transparent Image (Attention Injection), layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: False, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, Version: f0.0.17v1.8.0rc-latest-273-gb9705c58

Question about the result of layerdiffusion

In the actual operation process, it was found that the current attention has a significant impact on the generation. Is there anything to pay attention to when filling in prompts? Or is there a sampler? Model? etc., to make the generated results as close as possible
The following example parameters remain consistent
use

not used

SD1.5

How complicated would it be to add support for SD1.5 models?

attention injection does not work with IPAdapter

See the issue I reported here:
huchenlei/ComfyUI-layerdiffuse#31

Just not sure if there's any future model can address this issue. Thanks!

[M2 Mac] forge crashed with SD1.5 model

Hello.
I tried to run the SD 1.5 model since it has been released and forge crashed with the following error.

crash log

[Layer Diffusion] LayerMethod.FG_ONLY_ATTN_SD15
To load target model BaseModel
Begin to load 1 model
Moving model(s) has taken 1.88 seconds
  4%|█████                                                                                                                         | 1/25 [00:02<01:00,  2.52s/it] progress:   0%|                                                                                                                      | 0/25 [00:00<?, ?it/s]
  8%|██████████                                                                                                                    | 2/25 [00:04<00:46,  2.02s/it]
 12%|███████████████                                                                                                               | 3/25 [00:05<00:41,  1.87s/it]
 16%|████████████████████▏                                                                                                         | 4/25 [00:07<00:37,  1.79s/it]
 20%|█████████████████████████▏                                                                                                    | 5/25 [00:09<00:34,  1.75s/it]
 24%|██████████████████████████████▏                                                                                               | 6/25 [00:10<00:32,  1.73s/it]
 28%|███████████████████████████████████▎                                                                                          | 7/25 [00:12<00:30,  1.70s/it]
 32%|████████████████████████████████████████▎                                                                                     | 8/25 [00:14<00:28,  1.68s/it]
 36%|█████████████████████████████████████████████▎                                                                                | 9/25 [00:15<00:26,  1.67s/it]
 40%|██████████████████████████████████████████████████                                                                           | 10/25 [00:17<00:25,  1.67s/it]
 44%|███████████████████████████████████████████████████████                                                                      | 11/25 [00:19<00:23,  1.66s/it]
 48%|████████████████████████████████████████████████████████████                                                                 | 12/25 [00:20<00:21,  1.67s/it]
 52%|█████████████████████████████████████████████████████████████████                                                            | 13/25 [00:22<00:19,  1.67s/it]
 56%|██████████████████████████████████████████████████████████████████████                                                       | 14/25 [00:24<00:18,  1.67s/it]
 60%|███████████████████████████████████████████████████████████████████████████                                                  | 15/25 [00:25<00:16,  1.67s/it]
 64%|██████████████████████████████████████████████████████████████████████████████ ██                                             | 16/25 [00:27<00:15,  1.68s/it]
 68%|█████████████████████████████████████████████████████████████████████████████████████                                        | 17/25 [00:29<00:13,  1.66s/it]
 72%|██████████████████████████████████████████████████████████████████████████████████████████                                   | 18/25 [00:30<00:11,  1.65s/it]
 76%|███████████████████████████████████████████████████████████████████████████████████████████████                              | 19/25 [00:32<00:09,  1.65s/it]
 80%|████████████████████████████████████████████████████████████████████████████████████████████████████                         | 20/25 [00:34<00:08,  1.65s/it]
 84%|█████████████████████████████████████████████████████████████████████████████████████████████████████████                    | 21/25 [00:35<00:06,  1.64s/it]
 88%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████               | 22/25 [00:37<00:04,  1.64s/it]
 92%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████          | 23/25 [00:38<00:03,  1.64s/it]
 96%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████     | 24/25 [00:40<00:01,  1.63s/it]
100%|██████████████████████████████████████████████████████████████████████████████100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:42<00:00,  1.69s/it]
To load target model AutoencoderKL
Begin to load 1 model
Moving model(s) has taken 4.26 seconds
To load target model UNet1024
Begin to load 1 model
Moving model(s) has taken 0.73 seconds
 12%|███████████████▉                                                                                                               | 1/8 [00:01<0 25%|███████████████████████████████▊                                                                                              38%|███████████████████████████████████████████████▋                                                              50%|███████████████████████████████████████████████████████████████▌                              62%|██████████████████████████████████████████████████████████████████████████████ 75%|██████████████████████████████████████████████████████████████████████████████ 88%|██████████████████████████████████████████████████████████████████████████████100%|██████████████████████████████████████████████████████████████████████████████100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00,  2.07it/s]
/AppleInternal/Library/BuildRoots/8e85887f-758c-11ee-a904-2a65a1af8551/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArraySort.mm:283: failed assertion `(null)" Axis = 4. This class only supports axis = 0, 1, 2, 3

forge version: commit 29be1da7cf2b5dccfc70fbdd33eb35c56a31ffb7
PC: M2 Mac mini
OS: macOS Ventura (13.6.4)
LayerDiffuse parameters:
- Method: (SD1.5) Only Generate Transparent Image (Attention Injection)
- Weight: 1
- Stop At 1

Thank you!

how do i run it by diffusers

give some examples，please

From Background to Blending,How to apply masks

Otherwise, the location for generating content cannot be specified

Not clean in from foreground and blending to background

hair shows up
Why?

Will extensions compatible with a1111 be released？

And when will the extension be released?
thx

Poor quality in from foreground to blending in sdxl

Prompt: Landscape photography,highest resolution, 8K, realistic, detailed, dynamic, full detail, creative, awesome, perfect, unique, amazing, beautiful

Settings: Steps: 30, Sampler: Euler a, CFG scale: 5, Seed: 12345, Size: 832x1216, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: (SDXL) From Foreground to Blending, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: True, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize, layerdiffusion_fg_additional_prompt: , layerdiffusion_bg_additional_prompt: , layerdiffusion_blend_additional_prompt: , Version: f0.0.17v1.8.0rc-1.7.0
Input:

Result :

always get blurry result like this,is there anything wrong with my settings?

AttributeError: 'VAE' object has no attribute 'clone' Forge LayerDiffusion

Everything seems to be loaded fine, but I dont get the transparency. Everything seems updated from the extensions/updates tab.
Any help is appreciated. error:

*** Error running process_before_every_sampling: D:\Ai\stable-diffusion-webui-forge-main\extensions\sd-forge-layerdiffusion\scripts\forge_layerdiffusion.py
Traceback (most recent call last):
File "D:\Ai\stable-diffusion-webui-forge-main\modules\scripts.py", line 835, in process_before_every_sampling
script.process_before_every_sampling(p, *script_args, **kwargs)
File "D:\Ai\stable-diffusion-webui-forge-main\extensions\sd-forge-layerdiffusion\scripts\forge_layerdiffusion.py", line 130, in process_before_every_sampling
vae = p.sd_model.forge_objects.vae.clone()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'VAE' object has no attribute 'clone'

does not appear in my diffusion stable panel when installing it

I installed via https://github.com/layerdiffusion/sd-forge-layerdiffusion.git in automatic version 8 and it doesn't appear anywhere it is not compatible? or where do I see a correct installation guide.

Could you upload the transparent sample images

Like the apple could you upload the rest of the sample image transparent versions? I'm doing some of the "sanity check" images and besides the apple you only uploaded the checkerboard pattern examples. Would be useful for compairson sake.

Mine:

Yours:

the one that i did was a bit off, but still perfectly good. No rush btw you did great work and this will be immensivly useful to many people i imagine, can't wait for the full release!

Alpha output does not have ReActor results applied

First off, amazing work with this extension! I was very pleased to see this is also an always_on extension for easy API usage.

When I try using this in conjunction with the ReActor (face swap) extension, only the checkered image output has the face swap applied to it, while the alpha transparency output does not.

I'm using the latest versions of sdwebui-forge, sd-forge-layerdiffuse, and sd-webui-reactor.

Steps to reproduce:
-Launch sd-webui-forge
-Enable layer-diffuse with default settings
-Enable sd-webui-reactor with default settings, and add a face image.
-Generate

Diffusers

Will there be a version for diffusers?

Some of the "from foreground to blending" results are not good

Uploaded image

sofa in the living room, high quality
Negative prompt: bad, ugly
Steps: 20, Sampler: UniPC, CFG scale: 7, Seed: 3720371256, Size: 1024x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, layerdiffusion_enabled: True, layerdiffusion_method: From Foreground to Blending, layerdiffusion_weight: 1, layerdiffusion_ending_step: 1, layerdiffusion_fg_image: True, layerdiffusion_bg_image: False, layerdiffusion_blend_image: False, layerdiffusion_resize_mode: Crop and Resize,

Notes

I'm using UniPC sampler as suggested.
The sofa in the cropped image wasn't produced by JuggernautXL; it's an actual photograph of a sofa from IKEA.

The resulting images display sofas stacked behind the original sofa. How can we improve the quality of these renders?

Thank you!

Extension generates only saves image with checkerboard

Extension generates only saves image with checkerboard. if i need transpatrent - i have to manualy download image from TEMP folder. How do i make it automatically

save generated transparent image?

Why are image values normalized to [0, 1] instead of [-1, 1]

https://github.com/layerdiffusion/sd-forge-layerdiffusion/blob/main/lib_layerdiffusion/utils.py#L7-L10
vae should use [-1, 1] to encode image

please provide the checkpoint for SD 1.5

This looks awesome for SDXL. I would like to try it for SD 1.5. Thank you so much!

Can LayerDiffusion be executed after ADetailer?

missing parts in from background and blending to foreground

hi, I'm trying the basic functions and noticed that the image generated by "from background and blending to foreground" missed a leg of the generated horse, is it normal or did I do anything wrong? Base model is Juggernaut XL V6

[feature request] usable with the API endpoint of forge

Not sure if this already works, haven't found it at the very least.
Would be cool to be able to use this with the API endpoint.

Use on Directml, "torch.median()" problem.

I was having an error on the line 'y = y.clip(0, 1).movedim(1, -1)' stating that there was only 1 dimension and expected [-1,0], so I tracked the problem with prints and eventually discovered that:

on \lib_layerdiffusion\models.py, line 236-237,

result = torch.stack(result, dim=0)
returned a normal tensor as it should, but the next line:

median = torch.median(result, dim=0).values
returned an empty tensor. Even assigning torch.median(result, dim=0) to a var and pulling the .values later didn't work.

So, it seems torch.median doesn't work on Directml. I managed to circle around the problem by:

result = torch.stack(result, dim=0).to("cpu")
and then casting it back right after:

return median.to(self.load_device)

This fixes the problem for Directml users and it didn't seem to effect performance. I'm not entirely sure if the problem is really for all Direcml users, let's see if anyone else complains too.

layerdiffusion / sd-forge-layerdiffuse Goto Github PK

sd-forge-layerdiffuse's Introduction

sd-forge-layerdiffuse

Updates

Before You Start

Model Notes

Sanity Check

SDXL

SD1.5

Generating Foregrounds and Backgrounds Together (SD1.5)

Background Condition (SD1.5, one step workflow)

Foreground Condition (SD1.5, one step workflow)

Some More Complicated Examples for SD1.5

Background Condition (SDXL, two steps workflow)

Foreground Condition (SDXL, two steps workflow)

sd-forge-layerdiffuse's People

Contributors

Stargazers

Watchers

Forkers

sd-forge-layerdiffuse's Issues

(1)

(2)

Recommend Projects

Recommend Topics

Recommend Org

Jobs