segmind / segmoe Goto Github PK

License: Apache License 2.0

Python 100.00%

segmoe's Introduction

SegMoE: Segmind Mixture of Diffusion Experts

SegMoE is a powerful framework for dynamically combining Stable Diffusion Models into a Mixture of Experts within minutes without training. The framework allows for creation of larger models on the fly which offer larger knowledge, better adherence and better image quality. It is inspired by mergekit's mixtral branch but for Stable Diffusion models.

Installation

pip install segmoe

Usage

Load Checkpoint from Hugging Face

We release 3 merges on Hugging Face,

SegMoE 2x1 has two expert models.
SegMoE 4x2 has four expert models.
SegMoE SD 4x2 has four Stable Diffusion 1.5 expert models.

They can be loaded as follows:

from segmoe import SegMoEPipeline

pipeline = SegMoEPipeline("segmind/SegMoE-4x2-v0", device = "cuda")

prompt = "cosmic canvas, orange city background, painting of a chubby cat"
negative_prompt = "nsfw, bad quality, worse quality"
img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
img.save("image.png")

Comparison

The Prompt Understanding seems to improve as shown in the images below. From Left to Right SegMoE-2x1-v0, SegMoE-4x2-v0, Base Model (RealVisXL_V3.0)

three green glass bottles

panda bear with aviator glasses on its head

the statue of Liberty next to the Washington Monument

Creating your Own Model

Create a yaml config file, config.yaml, with the following structure:

base_model: Base Model Path, Model Card or CivitAI Download Link
num_experts: Number of experts to use
moe_layers: Type of Layers to Mix (can be "ff", "attn" or "all"). Defaults to "attn"
num_experts_per_tok: Number of Experts to use 
type: Type of the individual models (can be "sd" or "sdxl"). Defaults to "sdxl"
experts:
  - source_model: Expert 1 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive Prompt for computing gate weights
    negative_prompt: Negative Prompt for computing gate weights
  - source_model: Expert 2 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive Prompt for computing gate weights
    negative_prompt: Negative Prompt for computing gate weights
  - source_model: Expert 3 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive Prompt for computing gate weights
    negative_prompt: Negative Prompt for computing gate weights
  - source_model: Expert 4 Path, Model Card or CivitAI Download Link
    positive_prompt: Positive Prompt for computing gate weights
    negative_prompt: Negative Prompt for computing gate weights

Any number of models can be combined, An Example config can be found here. For detailed information on how to create a config file, please refer to the Config Parameters

Note Both Huggingface Models and CivitAI Models are supported. For CivitAI models, paste the download link of the model, For Example: "https://civitai.com/api/download/models/239306"

Then run the following command:

segmoe config.yaml segmoe_v0

This will create a folder called segmoe_v0 with the following structure:

├── model_index.json
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   └── model.safetensors
├── text_encoder_2
│   ├── config.json
│   └── model.safetensors
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── tokenizer_2
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
└──vae
    ├── config.json
    └── diffusion_pytorch_model.safetensors

Alternatively, you can also use the following command to create a mixture of experts model:

from segmoe import SegMoEPipeline

pipeline = SegMoEPipeline("config.yaml", device="cuda")

pipeline.save_pretrained("segmoe_v0")

Push to Hub

The Model can be pushed to the hub via the huggingface-cli

huggingface-cli upload segmind/segmoe_v0 ./segmoe_v0

Detailed usage can be found here

SDXL Turbo

To use SDXL Turbo style models, just change the scheduler to DPMSolverMultistepScheduler. Example config can be found here

Usage:

from segmoe import SegMoEPipeline

pipeline = SegMoEPipeline("segomoe_config_turbo.yaml", device = "cuda")
pipeline.pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.pipe.scheduler.config)

prompt = "cosmic canvas, orange city background, painting of a chubby cat"

image = pipe(prompt=prompt, num_inference_steps=6, guidance_scale=2).images[0]  

image.save("image.png")

Stable Diffusion 1.5 Support

Stable Diffusion 1.5 Models are also supported and work natively. Example config can be found here

Note: Stable Diffusion 1.5 Models can be combined with other SD1.5 Models only.

Other Tasks

Our Framework is tightly integrated with the Diffusers package which allows the use of AutoPipelineForImage2Image, AutoPipelineForInpainting and any other pipeline which supports the from_pipe method.

Image to Image

Here is example code for Image to Image generation:

from segmoe import SegMoEPipeline
from diffusers import AutoPipelineForImage2Image
t2i = SegMoEPipeline("segmind/SegMoE-SD-4x2-v0")

prompt = "cosmic canvas,  orange city background, painting of a chubby cat"
negative_prompt = "nsfw, bad quality, worse quality"
img = t2i(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
img.save("base_image.png")
pipeline = AutoPipelineForImage2Image.from_pipe(t2i.pipe)
prompt = "cosmic canvas,  orange city background, painting of a dog"
image = pipeline(prompt, img).images[0]
image.save("changed_image.png")

Inpainting

Here is example code for Inpainting:

from segmoe import SegMoEPipeline
from diffusers import AutoPipelineForInpainting

t2i = SegMoEPipeline("segmind/SegMoE-SD-4x2-v0")
pipeline = AutoPipelineForInpainting.from_pipe(t2i.pipe)
image = pipeline(prompt, image=init_image, mask_image=mask_image).images[0]
image.save("inpainted_image.png")

Memory Requirements

SDXL 2xN : 19GB
SDXL 4xN : 25GB
SD1.5 4xN : 7GB

Advantages

Benefits from The Knowledge of Several Finetuned Experts
Training Free
Better Adaptability to Data
Model Can be upgraded by using a better finetuned model as one of the experts.

Limitations

Though the Model improves upon the fidelity of images as well as adherence, it does not be drastically better than any one expert without training and relies on the knowledge of the experts.
This is not yet optimized for speed.
The framework is not yet optimized for memory usage.

Research Roadmap

Config Parameters

Base Model

The base model is the model that will be used to generate the initial image. It can be a Huggingface model card, a CivitAI model download link or a local path to a safetensors file.

Number of Experts

The number of experts to use in the mixture of experts model. The number of experts must be greater than 1. The Number of experts can be anything greater than 2 as long as the GPU fits it.

MOE Layers

The type of layers to mix. Can be "ff", "attn" or "all". Defaults to "attn". "ff" merges only the feedforward layers, "attn" merges only the attention layers and "all" merges all layers.

Type

The type of the models to mix. Can be "sd" or "sdxl". Defaults to "sdxl".

Experts

The Experts are the models that will be used to generate the final image. Each expert must have a source model, a positive prompt and a negative prompt. The source model can be a Huggingface model card, a CivitAI model download link or a local path to a safetensors file. The positive prompt and negative prompt are the prompts that will be used to compute the gate weights for each expert and impact the quality of the final model, choose these carefully.

Citation

@misc{segmoe,
  author = {Yatharth Gupta, Vishnu V Jaddipal, Harish Prabhala},
  title = {SegMoE: Segmind Mixture of Diffusion Experts},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/segmind/segmoe}}
}

segmoe's People

Stargazers

Watchers

segmoe's Issues

The same seed and prompt produces a completely black image

I use the following code to test

from segmoe import SegMoEPipeline

pipeline = SegMoEPipeline("segmind/SegMoE-sd-4x2-v0", device = "cuda")

prompt = "a beautiful Asian girl, front view, full-length portrait, One hand is half-raised, as if waving hello."
negative_prompt = "ugly, deformed, badhandv4, ng_deepnegative_v1_75t, nsfw, bad quality, worse quality"
img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=768,
    width=512,
    num_inference_steps=30,
    guidance_scale=7.5,
    seed=1575769330,
    Sampler='DPM++ SDE Karras',
    low_cpu_mem_usage=True,
).images[0]
img.save("image1.png")

An error will occur if you repeat the execution 2-3 times.
The error message is as follows:

Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed.

positive and negative keywords in .yaml files

Hello,
may I ask you - is there a benefit of using different positive and negative keywords in yaml for each model or may the keywords be the same?

Multi-GPU support

Since multiple experts can take a lot of VRAM, especially for SDXL, it would useful to have a way to choose which experts to load to which GPU (since GPU can have different VRAM each).

Licence?

Apache 2.0 I'm guessing, like with the huggingface repo? Just looking to make sure :) Thanks!

Could you explain the effect of Pos and Neg prompts of each experts?

i.e. How do the prompts relate to the gates?

For example:

If I want an expert to be more important, should I tune the prompt accordingly, how?
If I know some expert is good/bad at some subject, should I tune the prompt accordingly, how?

Why using negative prompt hidden states as gate weight?

hi,

ref: https://github.com/segmind/segmoe/blob/5fce95320f932aeb0991c9c0c31a3be72dbf7ce8/segmoe/main.py#L1300C13-L1300C26

 @torch.no_grad
  def get_hidden_states(self, model, positive, negative, average: bool = True):
      intermediate = {}
      self.cast_hook(model, intermediate)
      with torch.no_grad():
          _ = model(positive, negative_prompt=negative, num_inference_steps=25)
      hidden = {}
      for key in intermediate:
          hidden_states = intermediate[key][0][-1]  #### why using negative prompt as hidden states
          if average:
              # use average over sequence
              hidden_states = hidden_states.sum(dim=0) / hidden_states.shape[0]
          else:
              # take last value
              hidden_states = hidden_states[:-1]
          hidden[key] = hidden_states.to(self.device)
      del intermediate
      gc.collect()
      torch.cuda.empty_cache()
      return hidden

Got noise image sample

Hi,

Thanks for this interesting work!!

However, I got a pure noise image when running the sample code below.

Hardware:

NVIDIA A5000

code:

import os
import torch
import torch.nn as nn

from segmoe import SegMoEPipeline


pipeline = SegMoEPipeline("segmind/SegMoE-2x1-v0", device = "cuda")

prompt = "cosmic canvas, orange city background, painting of a chubby cat"
negative_prompt = "nsfw, bad quality, worse quality"
img = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
img.save("image.png")

environment:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
_openmp_mutex             5.1                       1_gnu
accelerate                0.29.3                   pypi_0    pypi
bzip2                     1.0.8                h5eee18b_5
ca-certificates           2024.3.11            h06a4308_0
certifi                   2024.2.2                 pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
diffusers                 0.27.2                   pypi_0    pypi
filelock                  3.13.4                   pypi_0    pypi
fsspec                    2024.3.1                 pypi_0    pypi
huggingface-hub           0.22.2                   pypi_0    pypi
idna                      3.7                      pypi_0    pypi
importlib-metadata        7.1.0                    pypi_0    pypi
jinja2                    3.1.3                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1
libffi                    3.3                  he6710b0_2
libgcc-ng                 11.2.0               h1234567_1
libgomp                   11.2.0               h1234567_1
libstdcxx-ng              11.2.0               h1234567_1
libuuid                   1.41.5               h5eee18b_0
markupsafe                2.1.5                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0
networkx                  3.3                      pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.2.26                 pypi_0    pypi
nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
nvidia-nccl-cu12          2.19.3                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.4.127                 pypi_0    pypi
nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
openssl                   1.1.1w               h7f8727e_0
packaging                 24.0                     pypi_0    pypi
pillow                    10.3.0                   pypi_0    pypi
pip                       23.3.1          py310h06a4308_0
psutil                    5.9.8                    pypi_0    pypi
python                    3.10.6               haa1d7c7_1
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h5eee18b_0
regex                     2024.4.16                pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
safetensors               0.4.3                    pypi_0    pypi
segmoe                    0.0.4                    pypi_0    pypi
setuptools                68.2.2          py310h06a4308_0
sqlite                    3.41.2               h5eee18b_0
sympy                     1.12                     pypi_0    pypi
tk                        8.6.12               h1ccaba5_0
tokenizers                0.19.1                   pypi_0    pypi
torch                     2.2.2                    pypi_0    pypi
tqdm                      4.66.2                   pypi_0    pypi
transformers              4.40.0                   pypi_0    pypi
triton                    2.2.0                    pypi_0    pypi
typing-extensions         4.11.0                   pypi_0    pypi
tzdata                    2024a                h04d1e81_0
urllib3                   2.2.1                    pypi_0    pypi
wheel                     0.41.2          py310h06a4308_0
xz                        5.4.6                h5eee18b_0
zipp                      3.18.1                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0

Image I got:

Is torch 2.0 mandatory?

Those old server with old Ubuntu can only use old torch. So can I make a few hack to run it under old torch?

MoE in the attn heads

Awesome project! Thank you publishing it! I was just curious about the following:

Why does the SegMoE SD 4x2 model have Mixture of Experts (MoE) layers within their attention heads, while most other models, including the tutorial on Huggingface (https://huggingface.co/blog/moe), typically use MoE layers in the feedforward network (FFN)? What's the distinction between these approaches?

Support local safetensors file

I may add PR but I have no time to code.
As an A1111 / ComfyUI user, I find that it doesn't support local safetensors file out of the box.
Instead of pip install segmoe (I encountered dependency hell even I am using conda already), I suggest uninstall segmoe and then directly using the cloned files (put to ./segmoe which is same directory of a sample usage code python train.py) and start modifying the script.
But I attempted to load model with diffusers, and I got no idea how to do it.
Hope that there will be solution and let me not needed to rely on CivitAI models (or using a dummy http host to host the models)

For HTTP host, npm install -g http-server will save your day.

edit: wget is not present in Windows, install it first.
edit2: Got it working. Required Code changes. Mainly StableDiffusionXLPipeline.from_single_file and modify URL to in main.py.

API_MODEL_URL_CIVITAI = "https://civitai.com/api/download/models/"
API_MODEL_URL = "http://localhost:8080/models/"

Config file x17-AstolfoMoE_a3p6.yaml:

base_model: http://localhost:8080/models/x17-AstolfoMix-x13te0x14te1.safetensors
num_experts: 2
moe_layers: all
num_experts_per_tok: 1
experts:
  - source_model: http://localhost:8080/models/_x14-ponyDiffusionV6XL_v6.safetensors
    positive_prompt: "xxx"
    negative_prompt: "xxx"
  - source_model: http://localhost:8080/models/_x08-animagineXLV3_v30.safetensors
    positive_prompt: "xxx"
    negative_prompt: "xxx"

And finally python not_train.py:

from segmoe import SegMoEPipeline
import torch

# OOM with RTX 3090, need 48GB+ VRAM! 

pipeline = SegMoEPipeline("x17-AstolfoMoE_a3p6.yaml", device="cpu", torch_dtype=torch.float, variant="fp32")

pipeline.save_pretrained("segmoe_v0")

Finally spent 26 minutes and around 80GB of RAM to "train" on a i9-7960X CPU.
"eval" (Generate image) can use cuda, which is as fast as usual.

TypeError: no_grad.init() on import

Trying to import from segmoe import SegMoEPipeline, throws:

File "C:\Users\xyz\AppData\Roaming\Python\Python310\site-packages\segmoe\main.py", line 89, in <module>
  class SegMoEPipeline:
File "C:\Users\xyz\AppData\Roaming\Python\Python310\site-packages\segmoe\main.py", line 1260, in SegMoEPipeline
  def get_hidden_states(self, model, positive, negative, average: bool = True):
TypeError: no_grad.__init__() takes 1 positional argument but 2 were given

Issue with Civitai downloads

They'll never load, code needs to be fixed to use a different diffusion pipeline that does have from_single_file available

Traceback (most recent call last):
File "C:\Users\DonaldNixon\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\DonaldNixon\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\AI\Segmoe\Scripts\segmoe.exe_main.py", line 7, in
File "C:\AI\Segmoe\lib\site-packages\segmoe\cli.py", line 6, in create
pipe = SegMoEPipeline(args[1])
File "C:\AI\Segmoe\lib\site-packages\segmoe\main.py", line 114, in init
self.load_from_scratch(config_or_path, **kwargs)
File "C:\AI\Segmoe\lib\site-packages\segmoe\main.py", line 171, in load_from_scratch
self.pipe = DiffusionPipeline.from_single_file(
AttributeError: type object 'DiffusionPipeline' has no attribute 'from_single_file'

Any benefit to implementing this with lycoris/lora instead of full models?

Instead of loading multiple full-sized Stable Diffusion models, I wonder the potential benefits of caching high-density locon or lycoris representations. This could be a more efficient way to operate on consumer hardware. I'm not certain if this approach would actually offer any improvements, though, or if this pipeline is something that would support or benefit from it.

Minor mistake in readme

The code example in the readme for using SegMoE with SDXL-Turbo appears to be slightly wrong. It imports the pipeline correctly, but then uses SegMoETurboPipeline out of nowhere. I tried to import that SegMoETurboPipeline and it wasn't found. Using the regular pipeline (SegMoEPipeline) in the import and the function call worked for me, hence my conclusion is that the docs must be wrong.

TypeError: SparseMoeBlock.forward() missing 1 required positional argument: 'scale'

What version of diffusers/transformers do I need?

TypeError                                 Traceback (most recent call last)
Cell In[1], line 7
      5 prompt = "cosmic canvas, orange city background, painting of a chubby cat"
      6 negative_prompt = "nsfw, bad quality, worse quality"
----> 7 img = pipeline(
      8     prompt=prompt,
      9     negative_prompt=negative_prompt,
     10     height=1024,
     11     width=1024,
     12     num_inference_steps=25,
     13     guidance_scale=7.5,
     14     scale=7.5,
     15 ).images[0]
     16 img.save("image.png")

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\segmoe\main.py:866, in SegMoEPipeline.__call__(self, *args, **kwds)
    860 """
    861 Inference the SegMoEPipeline.
    862 
    863 Calls diffusers.DiffusionPipeline forward with the keyword arguments. See https://github.com/segmind/segmoe#usage for detailed usage.
    864 """
    865 kwds["scale"]=7.5
--> 866 return self.pipe(*args, **kwds)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py:1027, in StableDiffusionPipeline.__call__(self, prompt, height, width, num_inference_steps, timesteps, guidance_scale, negative_prompt, num_images_per_prompt, eta, generator, latents, prompt_embeds, negative_prompt_embeds, ip_adapter_image, output_type, return_dict, cross_attention_kwargs, guidance_rescale, clip_skip, callback_on_step_end, callback_on_step_end_tensor_inputs, **kwargs)
   1024 latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
   1026 # predict the noise residual
-> 1027 noise_pred = self.unet(
   1028     latent_model_input,
   1029     t,
   1030     encoder_hidden_states=prompt_embeds,
   1031     timestep_cond=timestep_cond,
   1032     cross_attention_kwargs=self.cross_attention_kwargs,
   1033     added_cond_kwargs=added_cond_kwargs,
   1034     return_dict=False,
   1035 )[0]
   1037 # perform guidance
   1038 if self.do_classifier_free_guidance:

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1121, in UNet2DConditionModel.forward(self, sample, timestep, encoder_hidden_states, class_labels, timestep_cond, attention_mask, cross_attention_kwargs, added_cond_kwargs, down_block_additional_residuals, mid_block_additional_residual, down_intrablock_additional_residuals, encoder_attention_mask, return_dict)
   1118     if is_adapter and len(down_intrablock_additional_residuals) > 0:
   1119         additional_residuals["additional_residuals"] = down_intrablock_additional_residuals.pop(0)
-> 1121     sample, res_samples = downsample_block(
   1122         hidden_states=sample,
   1123         temb=emb,
   1124         encoder_hidden_states=encoder_hidden_states,
   1125         attention_mask=attention_mask,
   1126         cross_attention_kwargs=cross_attention_kwargs,
   1127         encoder_attention_mask=encoder_attention_mask,
   1128         **additional_residuals,
   1129     )
   1130 else:
   1131     sample, res_samples = downsample_block(hidden_states=sample, temb=emb, scale=lora_scale)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\unets\unet_2d_blocks.py:1199, in CrossAttnDownBlock2D.forward(self, hidden_states, temb, encoder_hidden_states, attention_mask, cross_attention_kwargs, encoder_attention_mask, additional_residuals)
   1197 else:
   1198     hidden_states = resnet(hidden_states, temb, scale=lora_scale)
-> 1199     hidden_states = attn(
   1200         hidden_states,
   1201         encoder_hidden_states=encoder_hidden_states,
   1202         cross_attention_kwargs=cross_attention_kwargs,
   1203         attention_mask=attention_mask,
   1204         encoder_attention_mask=encoder_attention_mask,
   1205         return_dict=False,
   1206     )[0]
   1208 # apply additional residuals to the output of the last pair of resnet and attention blocks
   1209 if i == len(blocks) - 1 and additional_residuals is not None:

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\transformers\transformer_2d.py:391, in Transformer2DModel.forward(self, hidden_states, encoder_hidden_states, timestep, added_cond_kwargs, class_labels, cross_attention_kwargs, attention_mask, encoder_attention_mask, return_dict)
    379         hidden_states = torch.utils.checkpoint.checkpoint(
    380             create_custom_forward(block),
    381             hidden_states,
   (...)
    388             **ckpt_kwargs,
    389         )
    390     else:
--> 391         hidden_states = block(
    392             hidden_states,
    393             attention_mask=attention_mask,
    394             encoder_hidden_states=encoder_hidden_states,
    395             encoder_attention_mask=encoder_attention_mask,
    396             timestep=timestep,
    397             cross_attention_kwargs=cross_attention_kwargs,
    398             class_labels=class_labels,
    399         )
    401 # 3. Output
    402 if self.is_input_continuous:

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\attention.py:329, in BasicTransformerBlock.forward(self, hidden_states, attention_mask, encoder_hidden_states, encoder_attention_mask, timestep, cross_attention_kwargs, class_labels, added_cond_kwargs)
    326 cross_attention_kwargs = cross_attention_kwargs.copy() if cross_attention_kwargs is not None else {}
    327 gligen_kwargs = cross_attention_kwargs.pop("gligen", None)
--> 329 attn_output = self.attn1(
    330     norm_hidden_states,
    331     encoder_hidden_states=encoder_hidden_states if self.only_cross_attention else None,
    332     attention_mask=attention_mask,
    333     **cross_attention_kwargs,
    334 )
    335 if self.norm_type == "ada_norm_zero":
    336     attn_output = gate_msa.unsqueeze(1) * attn_output

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\attention_processor.py:512, in Attention.forward(self, hidden_states, encoder_hidden_states, attention_mask, **cross_attention_kwargs)
    493 r"""
    494 The forward method of the `Attention` class.
    495 
   (...)
    507     `torch.Tensor`: The output of the attention layer.
    508 """
    509 # The `Attention` class can call different attention processors / attention functions
    510 # here we simply pass along all tensors to the selected processor class
    511 # For standard processors that are defined here, `**cross_attention_kwargs` is empty
--> 512 return self.processor(
    513     self,
    514     hidden_states,
    515     encoder_hidden_states=encoder_hidden_states,
    516     attention_mask=attention_mask,
    517     **cross_attention_kwargs,
    518 )

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\attention_processor.py:1224, in AttnProcessor2_0.__call__(self, attn, hidden_states, encoder_hidden_states, attention_mask, temb, scale)
   1221     hidden_states = attn.group_norm(hidden_states.transpose(1, 2)).transpose(1, 2)
   1223 args = () if USE_PEFT_BACKEND else (scale,)
-> 1224 query = attn.to_q(hidden_states, *args)
   1226 if encoder_hidden_states is None:
   1227     encoder_hidden_states = hidden_states

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

TypeError: SparseMoeBlock.forward() missing 1 required positional argument: 'scale'

[feature] Support StableDiffusionImg2ImgPipeline

It would be nice to be able to use img2img with this.

Does this work for Stable Cascade?

Thank you! + model suggestion

Really nice results, thank you! If you could manage to get 4 models + LORAs into 24GB VRAM, that would be great.

Just a small suggestion to include DPO finetuned models as https://civitai.com/models/239624/opendallev11 and https://civitai.com/models/239836?modelVersionId=270839 they are much better at prompt following.

Support Colab and Local Storage

When I'm using a V100 GPU Google Colab ENV, here are two main problems:

ENV:

There is no way to cache for model files, so every time has to download very large model files, so as local env users.
similar problem: #10
Finally, the official script run failed, with the information:
The config attributes {'segmoe_config': {'base_model': 'SG161222/RealVisXL_V3.0', 'down_idx_start': 1, 'down_idx_end': 3, 'experts': [{'negative_prompt': '(worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art:1.4), (watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name:1.2), (blur, blurry, grainy), morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, (airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, amateur:1.3), (3D ,3D Game, 3D Game Scene, 3D Character:1.1), (bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities:1.3)', 'positive_prompt': 'aesthetic, cinematic, hands, portrait, photo, illustration, 8K, hyperdetailed, origami, man, woman, supercar', 'source_model': 'frankjoshua/juggernautXL_v8Rundiffusion'}, {'negative_prompt': '(octane render, render, drawing, anime, bad photo, bad photography:1.3), (worst quality, low quality, blurry:1.2), (bad teeth, deformed teeth, deformed lips), (bad anatomy, bad proportions:1.1), (deformed iris, deformed pupils), (deformed eyes, bad eyes), (deformed face, ugly face, bad face), (deformed hands, bad hands, fused fingers), morbid, mutilated, mutation, disfigured', 'positive_prompt': 'cinematic, portrait, photograph, instagram, fashion, movie, macro shot, 8K, RAW, hyperrealistic, ultra realistic,', 'source_model': 'SG161222/RealVisXL_V3.0'}, {'negative_prompt': 'Compression artifacts, bad art, worst quality, low quality, plastic, fake, bad limbs, conjoined, featureless, bad features, incorrect objects, watermark, ((signature):1.25), logo', 'positive_prompt': 'minimalist, illustration, award winning art, painting, impressionist, comic, colors, sketch, pencil drawing,', 'source_model': 'albertushka/albertushka_DynaVisionXL'}, {'negative_prompt': 'nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, blurry', 'positive_prompt': 'photograph f/1.4, ISO 200, 1/160s, 8K, RAW, unedited, symmetrical balance, in-frame, 8K', 'source_model': 'frankjoshua/albedobaseXL_v13'}], 'moe_layers': 'all', 'num_experts': 4, 'num_experts_per_tok': 2, 'up_idx_end': 2, 'up_idx_start': 0}} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.

77 token limit

Is it potentially possible to use compel? I tried to add it but it complains about the tokenizer. Not very good at coding, should it work or shouldn't I try?

In general, today, imho, segmoe gives the best results. I'm delighted)

No hiresfix, no upscale.