
if's Introduction


We introduce DeepFloyd IF, a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding. DeepFloyd IF is modular, composed of a frozen text encoder and three cascaded pixel diffusion modules: a base model that generates a 64x64 px image from a text prompt, and two super-resolution models that generate images of increasing resolution, 256x256 px and 1024x1024 px. All stages use a frozen text encoder based on the T5 transformer to extract text embeddings, which are then fed into a UNet architecture enhanced with cross-attention and attention pooling. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset. Our work underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models and points to a promising future for text-to-image synthesis.
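The resolution cascade described above can be sketched in a few lines. This is only an illustration of the 64 -> 256 -> 1024 progression, not model code; the function and constant names are made up for this example, and the 4x factor per stage is read off the resolutions quoted above.

```python
# Illustrative sketch of the resolution cascade (names are hypothetical).
BASE_RESOLUTION = 64   # side length of the stage I output, in pixels
UPSCALE_FACTOR = 4     # each super-resolution stage enlarges the image 4x
NUM_UPSCALERS = 2      # stage II and stage III


def cascade_resolutions(base=BASE_RESOLUTION, factor=UPSCALE_FACTOR, upscalers=NUM_UPSCALERS):
    """Return the square image size produced by each stage, in order."""
    sizes = [base]
    for _ in range(upscalers):
        sizes.append(sizes[-1] * factor)
    return sizes


print(cascade_resolutions())  # [64, 256, 1024]
```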

Inspired by Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Minimum requirements to use all IF models:

  • 16GB vRAM for IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module)
  • 24GB vRAM for IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module) & Stable x4 (to 1024x1024 upscaler)
  • xformers installed, with the environment variable FORCE_MEM_EFFICIENT_ATTN=1 set
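As a rough sketch of how these requirements might be checked in a script: the helper below maps an amount of VRAM to the modules listed above. The function name and the table layout are hypothetical. Setting the variable before importing deepfloyd_if is an assumption about when the library reads it.

```python
import os

# Assumption: the variable should be in place before deepfloyd_if is imported.
os.environ.setdefault("FORCE_MEM_EFFICIENT_ATTN", "1")

# (minimum VRAM in GB, modules that fit), copied from the list above.
REQUIREMENTS = [
    (24, ["IF-I-XL", "IF-II-L", "Stable x4"]),
    (16, ["IF-I-XL", "IF-II-L"]),
]


def modules_for_vram(vram_gb):
    """Return the modules the list above says fit into `vram_gb` GB of VRAM."""
    for minimum, modules in REQUIREMENTS:
        if vram_gb >= minimum:
            return modules
    return []  # below the documented minimum


print(modules_for_vram(16))  # ['IF-I-XL', 'IF-II-L']
```

These are the figures for running without offloading; the Diffusers integration described further down reports lower numbers when CPU offloading is enabled.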

Quick Start


pip install deepfloyd_if==1.0.2rc0
pip install xformers==0.0.16
pip install git+https://github.com/openai/CLIP.git --no-deps

Local notebooks


The Dream, Style Transfer, Super Resolution and Inpainting modes are available in a Jupyter Notebook here.

Integration with 🤗 Diffusers

IF is also integrated with the 🤗 Hugging Face Diffusers library.

Diffusers runs each stage individually, which lets the user customize the image generation process and inspect intermediate results easily.

Example

Before you can use IF, you need to accept its usage conditions. To do so:

  1. Make sure you have a Hugging Face account and are logged in
  2. Accept the license on the model card of DeepFloyd/IF-I-XL-v1.0
  3. Make sure to log in locally. Install huggingface_hub:
pip install huggingface_hub --upgrade

Then run the login function in a Python shell:

from huggingface_hub import login

login()

and enter your Hugging Face Hub access token.

Next we install diffusers and dependencies:

pip install diffusers accelerate transformers safetensors

And we can now run the model locally.

By default diffusers makes use of model cpu offloading to run the whole IF pipeline with as little as 14 GB of VRAM.

If you are using torch>=2.0.0, remove all enable_xformers_memory_efficient_attention() calls.

from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {"feature_extractor": stage_1.feature_extractor, "safety_checker": stage_1.safety_checker, "watermarker": stage_1.watermarker}
stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16)
stage_3.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

generator = torch.manual_seed(0)

# stage 1
image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(
    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")
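Rather than deleting the xformers lines by hand, the torch version note above could be turned into a small check. This is a minimal sketch; `should_enable_xformers` is a hypothetical helper, not part of diffusers, and it only looks at the major version number.

```python
def should_enable_xformers(torch_version):
    """Return True for torch 1.x and False for torch >= 2.0.0."""
    # "2.0.0+cu118" -> "2.0.0" -> 2
    major = int(torch_version.split("+")[0].split(".")[0])
    return major < 2


print(should_enable_xformers("1.13.1"))       # True
print(should_enable_xformers("2.0.0+cu118"))  # False

# Hypothetical usage with the pipelines defined above:
# if should_enable_xformers(torch.__version__):
#     stage_1.enable_xformers_memory_efficient_attention()
```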

There are multiple ways to speed up inference and lower memory consumption even further with diffusers. To do so, please have a look at the Diffusers docs.

For more detailed information about how to use IF, please have a look at the IF blog post and the documentation 📖.

Diffusers dreambooth scripts also support fine-tuning 🎨 IF. With parameter-efficient fine-tuning, you can add new concepts to IF with a single GPU and ~28 GB VRAM.

Run the code locally

Loading the models into VRAM

from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder

device = 'cuda:0'
if_I = IFStageI('IF-I-XL-v1.0', device=device)
if_II = IFStageII('IF-II-L-v1.0', device=device)
if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
t5 = T5Embedder(device="cpu")

I. Dream

Dream is the text-to-image mode of the IF model

from deepfloyd_if.pipelines import dream

prompt = 'ultra close-up color photo portrait of rainbow owl with deer horns in the woods'
count = 4

result = dream(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    prompt=[prompt]*count,
    seed=42,
    if_I_kwargs={
        "guidance_scale": 7.0,
        "sample_timestep_respacing": "smart100",
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        "sample_timestep_respacing": "smart50",
    },
    if_III_kwargs={
        "guidance_scale": 9.0,
        "noise_level": 20,
        "sample_timestep_respacing": "75",
    },
)

if_III.show(result['III'], size=14)

II. Zero-shot Image-to-Image Translation

In Style Transfer mode, the output for your prompt is rendered in the style of the support_pil_img

from deepfloyd_if.pipelines import style_transfer

result = style_transfer(
    t5=t5, if_I=if_I, if_II=if_II,
    support_pil_img=raw_pil_image,
    style_prompt=[
        'in style of professional origami',
        'in style of oil art, Tate modern',
        'in style of plastic building bricks',
        'in style of classic anime from 1990',
    ],
    seed=42,
    if_I_kwargs={
        "guidance_scale": 10.0,
        "sample_timestep_respacing": "10,10,10,10,10,10,10,10,0,0",
        'support_noise_less_qsample_steps': 5,
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        "sample_timestep_respacing": 'smart50',
        "support_noise_less_qsample_steps": 5,
    },
)
if_I.show(result['II'], 1, 20)
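The comma-separated "sample_timestep_respacing" strings are easier to read once they are split up. The sketch below assumes they follow the section-count convention of the space_timesteps helper in OpenAI's improved-diffusion code: the timestep range is divided into equal sections and each number says how many steps to sample from that section. That is an assumption about IF's internals, and the helper names are hypothetical. Named presets such as "smart50" are not covered and raise a ValueError here.

```python
def parse_respacing(spec):
    """Split a comma-separated respacing string into per-section step counts."""
    return [int(part) for part in spec.split(",")]


def total_steps(spec):
    """Total number of sampling steps implied by a comma-separated spec."""
    return sum(parse_respacing(spec))


style_spec = "10,10,10,10,10,10,10,10,0,0"  # used for stage I in the example above
print(parse_respacing(style_spec))  # [10, 10, 10, 10, 10, 10, 10, 10, 0, 0]
print(total_steps(style_spec))      # 80
print(total_steps("75"))            # 75
```

Under this reading, the trailing zeros mean that two of the ten sections are skipped entirely.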


III. Super Resolution

For super-resolution, users can run IF-II and IF-III or 'Stable x4' on an image that was not necessarily generated by IF (two cascades):

from deepfloyd_if.pipelines import super_resolution

middle_res = super_resolution(
    t5,
    if_III=if_II,
    prompt=['woman with a blue headscarf and a blue sweater, detailed picture, 4k dslr, best quality'],
    support_pil_img=raw_pil_image,
    img_scale=4.,
    img_size=64,
    if_III_kwargs={
        'sample_timestep_respacing': 'smart100',
        'aug_level': 0.5,
        'guidance_scale': 6.0,
    },
)
high_res = super_resolution(
    t5,
    if_III=if_III,
    prompt=[''],
    support_pil_img=middle_res['III'][0],
    img_scale=4.,
    img_size=256,
    if_III_kwargs={
        "guidance_scale": 9.0,
        "noise_level": 20,
        "sample_timestep_respacing": "75",
    },
)
show_superres(raw_pil_image, high_res['III'][0])

IV. Zero-shot Inpainting

from deepfloyd_if.pipelines import inpainting

result = inpainting(
    t5=t5, if_I=if_I,
    if_II=if_II,
    if_III=if_III,
    support_pil_img=raw_pil_image,
    inpainting_mask=inpainting_mask,
    prompt=[
        'oil art, a man in a hat',
    ],
    seed=42,
    if_I_kwargs={
        "guidance_scale": 7.0,
        "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
        'support_noise_less_qsample_steps': 0,
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        'aug_level': 0.0,
        "sample_timestep_respacing": '100',
    },
    if_III_kwargs={
        "guidance_scale": 9.0,
        "noise_level": 20,
        "sample_timestep_respacing": "75",
    },
)
if_I.show(result['I'], 2, 3)
if_I.show(result['II'], 2, 6)
if_I.show(result['III'], 2, 14)

🤗 Model Zoo 🤗

The link to download the weights as well as the model cards will be available soon on each model of the model zoo

Original

Name               Cascade   Params   FID    Batch size   Steps
IF-I-M             I         400M     8.86   3072         2.5M
IF-I-L             I         900M     8.06   3200         3.0M
IF-I-XL*           I         4.3B     6.66   3072         2.42M
IF-II-M            II        450M     -      1536         2.5M
IF-II-L*           II        1.2B     -      1536         2.5M
IF-III-L* (soon)   III       700M     -      3072         1.25M

*best modules

Quantitative Evaluation

FID = 6.66

License

The code in this repository is released under the bespoke license (see added point two).

The weights will be available soon via the DeepFloyd organization at Hugging Face and have their own LICENSE.

Disclaimer: The initial release of the IF model is under a restricted research-purposes-only license temporarily to gather feedback, and after that we intend to release a fully open-source model in line with other Stability AI models.

Limitations and Biases

The models available in this codebase have known limitations and biases. Please refer to the model card for more information.

🎓 DeepFloyd IF creators:

📄 Research Paper (Soon)

Acknowledgements

Special thanks to StabilityAI and its CEO Emad Mostaque for invaluable support, providing GPU compute and infrastructure to train the models (our gratitude goes to Richard Vencu); thanks to LAION and Christoph Schuhmann in particular for contribution to the project and well-prepared datasets; thanks to Huggingface teams for optimizing models' speed and memory consumption during inference, creating demos and giving cool advice!

🚀 External Contributors 🚀

  • The Biggest Thanks @Apolinário, for ideas, consultations, help and support on all stages to make IF available in open-source; for writing a lot of documentation and instructions; for creating a friendly atmosphere in difficult moments 🦉;
  • Thanks, @patrickvonplaten, for improving loading time of unet models by 80%; for integrating Stable-Diffusion-x4 as a native pipeline 💪;
  • Thanks, @williamberman and @patrickvonplaten for diffusers integration 🙌;
  • Thanks, @hysts and @Apolinário for creating the best gradio demo with IF 🚀;
  • Thanks, @Dango233, for adapting IF with xformers memory efficient attention 💪;

if's People

Contributors

apolinario, estability, gugutse, ivksu, sayakpaul, shonenkov, williamberman, zeroshot-ai

if's Issues

protobuf not installed on notebook

On the example notebook you are missing
!pip install protobuf==3.20.1

just add that after the other pip installs and before t5 and it'll work great.
also if you're using a docker image make sure to use:
nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04

Please Add Discussions Tab

It would be very nice to have a centralized place (a GitHub Discussions tab for this repo) for discussions about getting the code up and running, instead of having them split among random subreddits and Discord servers.

cuBLAS issue.

I have freshly installed CUDA toolkit 11.8 on both the host, and inside a docker container. Within the container I run "jupyter notebook"

Previously I got the same error with CUDA 11.3

My understanding is that cuBLAS is part of the CUDA toolkit, and therefore should be available.

import os
import torch
os.environ['FORCE_MEM_EFFICIENT_ATTN'] = "1"
import sys
from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder
from deepfloyd_if.pipelines import dream, style_transfer, super_resolution, inpainting
import torch.nn.functional as F
import random
import torchvision.transforms as T
import numpy as np
import requests
from PIL import Image
import torch
import re
print("Loaded modules")

if_I = IFStageI('IF-I-XL-v1.0', device='cuda:0')
if_II = IFStageII('IF-II-L-v1.0', device='cuda:1')
if_III = StableStageIII('stable-diffusion-x4-upscaler', device='cuda:2')
t5 = T5Embedder(device='cuda:3')

prompt = 'lush garden'
count = 4

result = dream(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    prompt=[prompt]*count,
    seed=42,
    if_I_kwargs={
        "guidance_scale": 7.0,
        "sample_timestep_respacing": "smart100",
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        "sample_timestep_respacing": "smart50",
    },
)
if_I.show(result['I'], size=3)
if_I.show(result['II'], size=6)
if_I.show(result['III'], size=14)

166 return module._hf_hook.post_forward(module, output)

File ~/.local/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py:530, in T5Attention.forward(self, hidden_states, mask, key_value_states, position_bias, past_key_value, layer_head_mask, query_length, use_cache, output_attentions)
525 value_states = project(
526 hidden_states, self.v, key_value_states, past_key_value[1] if past_key_value is not None else None
527 )
529 # compute scores
--> 530 scores = torch.matmul(
531 query_states, key_states.transpose(3, 2)
532 ) # equivalent of torch.einsum("bnqd,bnkd->bnqk", query_states, key_states), compatible with onnx op>9
534 if position_bias is None:
535 if not self.has_relative_attention_bias:

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
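The CUDA_R_16BF arguments in the error indicate a bfloat16 matrix multiply. A common cause of CUBLAS_STATUS_NOT_SUPPORTED in that situation is a GPU without native bfloat16 support, which starts at compute capability 8.0 (Ampere). The GPU model is not stated in this report, so this is only something worth ruling out, not a confirmed diagnosis. A minimal sketch of the check, with a hypothetical helper name:

```python
def supports_native_bf16(compute_capability):
    """Check a (major, minor) compute capability tuple for native bfloat16 support.

    The tuple has the same shape as the value returned by
    torch.cuda.get_device_capability(device).
    """
    major, _minor = compute_capability
    return major >= 8


# Example capabilities: a Maxwell-generation GTX Titan X and an RTX 4090.
print(supports_native_bf16((5, 2)))  # False
print(supports_native_bf16((8, 9)))  # True
```

On a machine with torch installed, torch.cuda.is_bf16_supported() offers a similar check for the current device.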

512x512

Hello and thank you for the amazing work you've done on this SOTA text-to-image model. After testing the HF demo I noticed that the 256 -> 1024 super-resolution step struggles to give good results. Would it be possible to introduce a middle step, like 256 -> 512 -> 1024, instead?

Offload_folder is ignored?

/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py:872 in load_checkpoint_in_model

  869     """
  870     tied_params = find_tied_parameters(model)
  871     if offload_folder is None and device_map is not None and "disk" in device_map.values
❱ 872         raise ValueError(
  873             "At least one of the model submodule will be offloaded to disk, please pass
  874         )
  875     elif offload_folder is not None and device_map is not None and "disk" in device_map.
ValueError: At least one of the model submodule will be offloaded to disk, please pass along an offload_folder.

This happens when running modified demo code on Colab.

I have been playing with both the XL and M models to compare speed and quality.

I then loaded the XL model again during the same session.
I have been calling flush() and del on the pipes and everything else.
Anyway, the line giving me errors is:

text_encoder = T5EncoderModel.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    subfolder="text_encoder",
    device_map="auto",
    load_in_8bit=True,
    variant="8bit"
)

pipe = IFImg2ImgPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    text_encoder=text_encoder,
    unet=None,
    device_map="auto"
)
prompt_embeds, negative_embeds = pipe.encode_prompt(prompt)

# free some memory
del pipe
del text_encoder

for image in images:
    flush()
    pipe = IFImg2ImgPipeline.from_pretrained(
        "DeepFloyd/IF-I-XL-v1.0",
        text_encoder=None,
        variant="fp16",
        torch_dtype=torch.float16,
        device_map="auto",
        offload_folder='/content/offload',  # THIS IS APPARENTLY IGNORED? SHOULD IT BE IGNORED?
    )

Can I distribute the stages over multiple GPUs? Like you see below

from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder

I have 4 Maxwell Titan X GPUs with 12GB VRAM each.

if_I = IFStageI('IF-I-XL-v1.0', device='cuda:0')
if_II = IFStageII('IF-II-L-v1.0', device='cuda:1')
if_III = StableStageIII('stable-diffusion-x4-upscaler', device='cuda:2')
t5 = T5Embedder(device="cuda:3")

running the txt2image script returns all sorts of errors

Manjaro Linux, 4090, amd cpu.
I created a deepfloyd env python=3.10, activated it
pip install -U huggingface_hub diffusers transformers safetensors sentencepiece accelerate bitsandbytes torch
started python and got the token from huggingface
created the script file and ran it. got these errors:

Can someone just point me in the right direction?

2023-04-29 17:11:30.330731: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-29 17:11:30.466991: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Traceback (most recent call last):
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1146, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/models/clip/image_processing_clip.py", line 22, in <module>
    from ...image_transforms import (
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/image_transforms.py", line 48, in <module>
    import tensorflow as tf
  File "/home/vhey/.local/lib/python3.10/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/home/vhey/.local/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 37, in <module>
    from tensorflow.python.eager import context
  File "/home/vhey/.local/lib/python3.10/site-packages/tensorflow/python/eager/context.py", line 27, in <module>
    import six
ModuleNotFoundError: No module named 'six'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/vhey/deepfloyd/txt2img.py", line 1, in <module>
    from diffusers import DiffusionPipeline
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/diffusers/__init__.py", line 58, in <module>
    from .pipelines import (
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/diffusers/pipelines/__init__.py", line 45, in <module>
    from .alt_diffusion import AltDiffusionImg2ImgPipeline, AltDiffusionPipeline
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/diffusers/pipelines/alt_diffusion/__init__.py", line 32, in <module>
    from .pipeline_alt_diffusion import AltDiffusionPipeline
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion.py", line 20, in <module>
    from transformers import CLIPImageProcessor, XLMRobertaTokenizer
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1137, in __getattr__
    value = getattr(module, name)
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1136, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/vhey/miniconda3/envs/deepfloyd/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1148, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.clip.image_processing_clip because of the following error (look up to see its traceback):
No module named 'six'

Error when running through examples: "When passing variant='fp16' upgrade `transformers` to at least 4.27.0.dev0"

Running through one of the examples, and finding the following error related to the transformer version:

Traceback (most recent call last):
  File "test3.py", line 9, in <module>
    stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0", variant="fp16", torch_dtype=torch.float16)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 1039, in from_pretrained
    loaded_sub_model = load_sub_model(
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 431, in load_sub_model
    raise ImportError(
ImportError: When passing `variant='fp16'`, please make sure to upgrade your `transformers` version to at least 4.27.0.dev0

It appears that 4.25.1 is the version installed when using the requirements.txt file and following the README instructions.

I'm currently rerunning now (after removing 4.25.1 and installing transformers 4.28.1), however would 4.28.1 be compatible or would we need to keep the library under a certain version?

Thanks! : )

Sharing the sample code I've been utilizing to test:

from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch
from huggingface_hub import login

login()

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-M-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {"feature_extractor": stage_1.feature_extractor, "safety_checker": stage_1.safety_checker, "watermarker": stage_1.watermarker}
stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16)
stage_3.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_3.enable_model_cpu_offload()
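The ImportError above only states a minimum transformers version (4.27.0.dev0), so a quick comparison shows which of the two versions mentioned in this report clears it. This is a simplified sketch with hypothetical helper names. It compares only the first three numeric components and therefore treats 4.27.0.dev0 the same as 4.27.0; for anything stricter, packaging.version.parse is the usual tool.

```python
import re


def version_tuple(version):
    """Turn '4.27.0.dev0' into (4, 27, 0), ignoring pre-release suffixes."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def meets_minimum(installed, minimum="4.27.0.dev0"):
    """Return True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("4.25.1"))  # False
print(meets_minimum("4.28.1"))  # True
```

Whether an upper bound also applies is not something the error message says.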

How to get output of Zero Shot Image To Image to match input image size?

How can I ensure that the output image size of image-to-image matches the input? Going by the example Colab code, I use this:

original_image = Image.open("input.png")

text_encoder = T5EncoderModel.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    subfolder="text_encoder", 
    device_map="auto", 
    load_in_8bit=True, 
    variant="8bit"
)

pipe = IFImg2ImgPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", 
    text_encoder=text_encoder, 
    unet=None, 
    device_map="auto"
)

prompt = "anime style"

prompt_embeds, negative_embeds = pipe.encode_prompt(prompt)

pipe = IFImg2ImgPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", 
    text_encoder=None, 
    variant="fp16", 
    torch_dtype=torch.float16, 
    device_map="auto"
)

generator = torch.Generator().manual_seed(0)

image = pipe(
    image=original_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds, 
    output_type="pt",
    generator=generator,
).images

pil_image = pt_to_pil(image)
pil_image[0].save("output.png")

pipe = IFImg2ImgSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", 
    text_encoder=None, 
    variant="fp16", 
    torch_dtype=torch.float16, 
    device_map="auto"
)

image = pipe(
    image=image,
    original_image=original_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds, 
    generator=generator,
).images

image[0].save("output.png")

Which works, but the output size is always smaller than the input image.
What am I missing?

This is the output for a 550x550 input image.

If possible, please give full code examples too. You have a good initial code snippet on the readme for Text to Image, but then the rest of the examples are incomplete. The same sort of full code examples would be very helpful.

Using Latent Upscaler instead of x4-upscaler

Hi, thanks for releasing an awesome model.

In stage 3, we are currently using "stable-diffusion-x4-upscaler", which has a large memory requirement.

Can we use "stabilityai/sd-x2-latent-upscaler" instead? It has a small memory footprint and is faster as well.

Repository description

Please consider filling in repository details here on GitHub including topics.

The top right ⚙️ icon.

Clarification on license reference to removing content filters?

I'm wondering if this section of the license is supposed to be included. It appears to say that any removal of the content filters is not allowed under any circumstances. If that is the case, then it's only going to trigger conflict with the community immediately after the release of the weights.

2. All persons obtaining a copy or substantial portion of the Software,
a modified version of the Software (or substantial portion thereof), or
a derivative work based upon this Software (or substantial portion thereof)
must not delete, remove, disable, diminish, or circumvent any inference filters or
inference filter mechanisms in the Software, or any portion of the Software that
implements any such filters or filter mechanisms.

https://github.com/deep-floyd/IF/blob/af64403da0ae2667e5d40670f4014de04bd5c523/LICENSE

Fine-tune

How can we fine-tune it on a single subject with some 10-15 photos and instance/class prompts?

Quickstart failing on no distribution found for torch<2.0.0

Running the pip command "pip install deepfloyd_if==1.0.0" on win 10

gives:

ERROR: Could not find a version that satisfies the requirement torch<2.0.0 (from deepfloyd-if) (from versions: 2.0.0)
ERROR: No matching distribution found for torch<2.0.0

Unreadable notebook

When I tried to open and try the notebook (via jupyter notebook) I've got the following error message:

Error loading notebook
Unreadable Notebook: /home/ogem/codes/public/2023/IF/notebooks/pipes-DeepFloyd-IF.ipynb NotJSONError("Notebook does not appear to be JSON: 'version https://git-lfs.github.com/spec...")

Is there a json syntax error? Or maybe there is another way to open and use the notebook?
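The text quoted in the error ('version https://git-lfs.github.com/spec...') is how a Git LFS pointer file begins, which suggests the notebook on disk may be a pointer rather than the real file. A minimal sketch for checking this follows; the helper names and the demo files are made up for illustration.

```python
import os
import tempfile

# Git LFS pointer files are small text files that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like a Git LFS pointer."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def demo():
    """Check a fake pointer file and a minimal notebook-like JSON file."""
    with tempfile.TemporaryDirectory() as tmp:
        pointer_path = os.path.join(tmp, "pointer.ipynb")
        notebook_path = os.path.join(tmp, "real.ipynb")
        with open(pointer_path, "wb") as handle:
            handle.write(b"version https://git-lfs.github.com/spec/v1\n")
        with open(notebook_path, "wb") as handle:
            handle.write(b'{"cells": [], "nbformat": 4}')
        return is_lfs_pointer(pointer_path), is_lfs_pointer(notebook_path)


print(demo())  # (True, False)
```

If the check returns True for the notebook, installing Git LFS and running git lfs pull in the repository would normally replace the pointer with the actual file.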

Kernel crash on loading model in Ubuntu 22.04

Hey, I'm trying to load the model into 24GB VRAM GPU.

This is my code
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", torch_dtype=torch.float16)
stage_1.enable_xformers_memory_efficient_attention()
stage_1.enable_model_cpu_offload()

The kernel crashes while loading the model into memory. I also tried loading from deepfloyd_if, and it crashes in the same way while running the following code.
from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder

device = 'cuda:0'
if_I = IFStageI('IF-I-XL-v1.0', device=device)
if_II = IFStageII('IF-II-L-v1.0', device=device)
if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
t5 = T5Embedder(device="cpu")

This is the error shown in the notebook,
Canceled future for execute_request message before replies were done The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click here for more info. View Jupyter log for further details.

I tracked memory usage and it does not pass the 14GB mark. How do I resolve this?

finetune

Will the finetuning code be released as well? Awesome project btw! Can't wait to train a custom model

vram requirements

the readme lists a minimum of 16GB of vram without the stable-x4 upscaler, 24GB with, however you can run it with the stable-x4 on as little as 6GB of vram using sequential offload on the first stage/text encoder (in fp16) and cpu offload on the second/third stage. you can also run all three stages using cpu offload on 16GB (maybe less). you do need sufficient dram though.

import torch
from diffusers import DiffusionPipeline, IFPipeline, IFSuperResolutionPipeline

stage_1 = IFPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    variant="fp16",
    torch_dtype=torch.float16,
)
stage_2 = IFSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0",
    text_encoder=None,
    variant="fp16",
    torch_dtype=torch.float16,
)
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
)

# Pick ONE of the two configurations below; do not apply both.

# 16 GB: model CPU offload on every stage
stage_1.enable_model_cpu_offload()
stage_2.enable_model_cpu_offload()
stage_3.enable_model_cpu_offload()

# 6 GB: sequential offload on stage 1 (slower), model CPU offload on stages 2 and 3
# stage_1.enable_sequential_cpu_offload()
# stage_2.enable_model_cpu_offload()
# stage_3.enable_model_cpu_offload()

i tested this on pytorch2.0.0+cu118 with torch.cuda.set_per_process_memory_fraction() to limit the amount of vram torch can use.
the sequential offload significantly slows down the first stage, but that's better than not being able to run it at all
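The two configurations reported above can be written down as a small lookup, which makes it easier to pick one per machine. This is a sketch based only on the 6 GB and 16 GB figures from this post; `offload_plan` is a hypothetical helper, and the values are the names of the diffusers pipeline methods used in the snippet above.

```python
def offload_plan(vram_gb):
    """Map each stage to the offload method to call, per the figures reported above."""
    if vram_gb >= 16:
        stage_1_method = "enable_model_cpu_offload"
    elif vram_gb >= 6:
        # Slower, but keeps stage 1 within a small VRAM budget.
        stage_1_method = "enable_sequential_cpu_offload"
    else:
        raise ValueError("below the 6 GB reported in this thread")
    return {
        "stage_1": stage_1_method,
        "stage_2": "enable_model_cpu_offload",
        "stage_3": "enable_model_cpu_offload",
    }


print(offload_plan(6)["stage_1"])   # enable_sequential_cpu_offload
print(offload_plan(16)["stage_1"])  # enable_model_cpu_offload

# Hypothetical usage with the pipelines defined above:
# pipes = {"stage_1": stage_1, "stage_2": stage_2, "stage_3": stage_3}
# for name, method in offload_plan(6).items():
#     getattr(pipes[name], method)()
```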

Not Implemented Error: Memory efficient attention with `xformers` is currently not supported when `self.added_kv_proj_dim` is defined

After going through the README instructions, I tried the following test script just to get started, but I consistently receive an error: NotImplementedError: Memory efficient attention with `xformers` is currently not supported when `self.added_kv_proj_dim` is defined. (The full traceback follows the test code.)

Test code:

from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {"feature_extractor": stage_1.feature_extractor, "safety_checker": stage_1.safety_checker, "watermarker": stage_1.watermarker}
stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16)
stage_3.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

generator = torch.manual_seed(0)

# stage 1
image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(
    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")

Error traceback:

Traceback (most recent call last):
  File "test2.py", line 8, in <module>
    stage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 1448, in enable_xformers_memory_efficient_attention
    self.set_use_memory_efficient_attention_xformers(True, attention_op)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 1474, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 1464, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 227, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 223, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 223, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 223, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/modeling_utils.py", line 220, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "${HOME}/miniconda3/envs/deepfloyd/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 161, in set_use_memory_efficient_attention_xformers
    raise NotImplementedError(
NotImplementedError: Memory efficient attention with `xformers` is currently not supported when `self.added_kv_proj_dim` is defined.
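
The comment in the test script says to drop the xformers line when `torch.__version__ >= 2.0.0`. A minimal sketch of that check follows; `should_enable_xformers` is a hypothetical helper, not part of diffusers. It only automates the README comment and does not address the underlying error on torch 1.x, where the installed diffusers version may simply not support xformers for these attention layers.

```python
def torch_major(version: str) -> int:
    """Major version from a string such as '2.0.0+cu118'."""
    return int(version.split(".")[0])


def should_enable_xformers(torch_version: str) -> bool:
    """Follow the README comment: only call xformers on torch < 2.0.0."""
    return torch_major(torch_version) < 2


if __name__ == "__main__":
    for version in ("1.13.1+cu117", "2.0.0+cu118"):
        print(version, should_enable_xformers(version))
```

In the script it would be used as `if should_enable_xformers(torch.__version__): stage_1.enable_xformers_memory_efficient_attention()`, and likewise for the other stages.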

can not load "stable-diffusion-x4-upscaler"

error info:

from deepfloyd_if.modules.t5 import T5Embedder
device = 'cuda:0'
if_I = IFStageI('IF-I-XL-v1.0', device=device)
D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\huggingface_hub\file_download.py:1104: FutureWarning: The force_filename parameter is deprecated as a new caching system, which keeps the filenames as they are on the Hub, is now in place.
warnings.warn(
if_II = IFStageII('IF-II-L-v1.0', device=device)
if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\deepfloyd_if\modules\stage_III_sd_x4.py", line 34, in __init__
    self.model = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch_dtype, token=self.hf_token)
  File "D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 884, in from_pretrained
    cached_folder = cls.download(
  File "D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1208, in download
    config_file = hf_hub_download(
  File "D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\huggingface_hub\utils\_validators.py", line 112, in _inner_fn
    validate_repo_id(arg_value)
  File "D:\AiTools\DeepFloydIF\IF\vnev\lib\site-packages\huggingface_hub\utils\_validators.py", line 166, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'stabilityai\stable-diffusion-x4-upscaler'.

UNetModel parameters

Would it be possible to please post the UNetModel(**model_params) so devs can work on integrating/optimizing already just with randomly initialized weights until the actual ones are released?

Would be great to allow testing optimization ideas and things like that but hard without knowing the exact size, and I couldn't find that in the code currently unless I missed it.

Some questions about T5 dtype

First, thanks for answering my questions.

  1. Which dtype was used for T5 during training?
  2. Does the T5 dtype have a significant impact on the results?

Commands

Is there a list of commands somewhere?

Can not get beautiful owl picture following the instruction.

from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {"feature_extractor": stage_1.feature_extractor, "safety_checker": stage_1.safety_checker, "watermarker": stage_1.watermarker}
stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16)
stage_3.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

generator = torch.manual_seed(0)

# stage 1
image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(
    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")

I got one picture like this:
if_stage_II

but when I followed this code:
from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder
from deepfloyd_if.pipelines import dream

device = 'cuda:0'
if_I = IFStageI('IF-I-XL-v1.0', device=device)
if_II = IFStageII('IF-II-L-v1.0', device=device)
if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
t5 = T5Embedder(device="cpu")

prompt = 'ultra close-up color photo portrait of rainbow owl with deer horns in the woods'
count = 4

result = dream(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    prompt=[prompt]*count,
    seed=42,
    if_I_kwargs={
        "guidance_scale": 7.0,
        "sample_timestep_respacing": "smart100",
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        "sample_timestep_respacing": "smart50",
    },
    if_III_kwargs={
        "guidance_scale": 9.0,
        "noise_level": 20,
        "sample_timestep_respacing": "75",
    },
)

if_III.show(result['III'], size=14)

I just got this:
generated_image_4

Can "open source" software require a third party account and access token

I'm aware that what does or does not constitute "open source" is somewhat contentious, but in my understanding requiring people to sign up for a third party account, consenting to a license through a third party service and using a third party access token to use the supposedly "open" software is pushing the concept of openness past the breaking point.

Deep Floyd are, of course, perfectly within their rights to impose any restrictions and requirements they like, but to then advertise the release as open source for the community credit seems at least a little disingenuous.

4x-upscaler deepfloyd-if python module has problems with win paths

In Windows, when running the notebook of the IF-I-XL-v.1.0 model, the following error occurs when trying to download the stable-diffusion-x4-upscaler:
HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'stabilityai\stable-diffusion-x4-upscaler'.

A quick fix would be to change line 23 in the file [your-venv-name]\Lib\site-packages\deepfloyd_if\modules to model_id = 'stabilityai/' + self.dir_or_name
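
To illustrate why the repo id ends up with a backslash, here is a minimal sketch. It assumes (this is not confirmed here) that the module builds the id with `os.path.join`, which on Windows follows the same rules as `ntpath.join`. Using `ntpath` directly makes the behaviour reproducible on any OS.

```python
import ntpath  # Windows path rules, importable on any operating system

name = "stable-diffusion-x4-upscaler"

# What os.path.join would produce on Windows: a backslash separator,
# which the Hugging Face Hub rejects as a repo id.
windows_style = ntpath.join("stabilityai", name)

# The quick fix from above: build the id with a literal forward slash.
portable = "stabilityai/" + name

print(windows_style)
print(portable)
```

Hub repo ids are not file system paths, so plain string concatenation with `/` gives the same result on every platform.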

Error when running image variation section in Notebook:

ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.

Module PIL has no attribute "Resampling"

So, if I install Pillow>=9.2.0, I get: Module PIL has no attribute "Resampling"
And if I then downgrade to Pillow==9.0.0 to avoid that error, I get: deepfloyd-if 1.0.1 requires Pillow>=9.2.0
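
`Image.Resampling` was added in Pillow 9.1.0, so seeing this error with Pillow>=9.2.0 installed suggests the running interpreter may be importing a different, older Pillow than the one pip reports (that is a guess; printing `PIL.__version__` inside the notebook would confirm it). For code that has to tolerate both old and new versions, a common pattern is to fall back to the module-level constants. The sketch below uses a hypothetical helper, `bicubic_filter`, and takes the module as an argument so it can be shown without Pillow installed.

```python
from types import SimpleNamespace


def bicubic_filter(image_module):
    """Return the BICUBIC filter from `Image.Resampling` if it exists,
    otherwise from the module itself (older Pillow releases)."""
    resampling = getattr(image_module, "Resampling", image_module)
    return resampling.BICUBIC


if __name__ == "__main__":
    # Stand-ins for PIL.Image in an old and a new Pillow release.
    old_image = SimpleNamespace(BICUBIC=3)
    new_image = SimpleNamespace(Resampling=SimpleNamespace(BICUBIC=3))
    print(bicubic_filter(old_image), bicubic_filter(new_image))
```

With Pillow installed, the call would be `bicubic_filter(PIL.Image)`.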

CUDA out of memory.

However I am using a station with 4 x A100(40G)

if_I = IFStageI('/IF/deepfloyd-if/IF-I-XL-v1.0', device='cuda:0')
if_II = IFStageII('/IF/deepfloyd-if/IF-II-L-v1.0', device='cuda:1')
if_III = StableStageIII('/IF/deepfloyd-if/stable-diffusion-x4-upscaler', device='cuda:2')
t5 = T5Embedder(device="cuda:3")

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 0; 39.39 GiB total capacity; 29.37 GiB already allocated; 6.90 GiB free; 30.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Works only with the demo picture; with my own picture it raises an AssertionError

AssertionError Traceback (most recent call last)
Cell In[24], line 4
1 count = 4
2 prompt = 'a boy'
----> 4 result = style_transfer(
5 t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
6 support_pil_img=zkc,
7 prompt=[prompt]*count,
8 style_prompt=[
9 f'in style lego',
10 f'in style zombie',
11 f'in style origami',
12 f'in style anime',
13 ],
14 seed=42,
15 if_I_kwargs={
16 "guidance_scale": 10.0,
17 "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
18 'support_noise_less_qsample_steps': 5,
19 'positive_mixer': 0.8,
20 },
21 if_II_kwargs={
22 "guidance_scale": 4.0,
23 "sample_timestep_respacing": 'smart50',
24 "support_noise_less_qsample_steps": 5,
25 'positive_mixer': 1.0,
26 },
27 )
28 if_I.show(result['III'], 2, 14)

File ~/miniconda3/envs/if/lib/python3.10/site-packages/deepfloyd_if/pipelines/style_transfer.py:91, in style_transfer(t5, if_I, if_II, if_III, support_pil_img, style_prompt, prompt, negative_prompt, seed, if_I_kwargs, if_II_kwargs, if_III_kwargs, progress, return_tensors, disable_watermark)
87 if_II_kwargs['progress'] = progress
89 if_II_kwargs['support_noise'] = mid_res
---> 91 stageII_generations, _meta = if_II.embeddings_to_image(**if_II_kwargs)
92 pil_images_II = if_II.to_images(stageII_generations, disable_watermark=disable_watermark)
94 result['II'] = pil_images_II

File ~/miniconda3/envs/if/lib/python3.10/site-packages/deepfloyd_if/modules/stage_II.py:26, in IFStageII.embeddings_to_image(self, low_res, t5_embs, style_t5_embs, positive_t5_embs, negative_t5_embs, batch_repeat, aug_level, dynamic_thresholding_p, dynamic_thresholding_c, sample_loop, sample_timestep_respacing, guidance_scale, img_scale, positive_mixer, progress, seed, sample_fn, **kwargs)
21 def embeddings_to_image(
22 self, low_res, t5_embs, style_t5_embs=None, positive_t5_embs=None, negative_t5_embs=None, batch_repeat=1,
23 aug_level=0.25, dynamic_thresholding_p=0.95, dynamic_thresholding_c=1.0, sample_loop='ddpm',
24 sample_timestep_respacing='smart50', guidance_scale=4.0, img_scale=4.0, positive_mixer=0.5,
25 progress=True, seed=None, sample_fn=None, **kwargs):
---> 26 return super().embeddings_to_image(
27 t5_embs=t5_embs,
28 low_res=low_res,
29 style_t5_embs=style_t5_embs,
30 positive_t5_embs=positive_t5_embs,
31 negative_t5_embs=negative_t5_embs,
32 batch_repeat=batch_repeat,
33 aug_level=aug_level,
34 dynamic_thresholding_p=dynamic_thresholding_p,
35 dynamic_thresholding_c=dynamic_thresholding_c,
36 sample_loop=sample_loop,
37 sample_timestep_respacing=sample_timestep_respacing,
38 guidance_scale=guidance_scale,
39 positive_mixer=positive_mixer,
40 img_size=256,
41 img_scale=img_scale,
42 progress=progress,
43 seed=seed,
44 sample_fn=sample_fn,
45 **kwargs
46 )

File ~/miniconda3/envs/if/lib/python3.10/site-packages/deepfloyd_if/modules/base.py:181, in IFBaseModule.embeddings_to_image(self, t5_embs, low_res, style_t5_embs, positive_t5_embs, negative_t5_embs, batch_repeat, dynamic_thresholding_p, sample_loop, sample_timestep_respacing, dynamic_thresholding_c, guidance_scale, aug_level, positive_mixer, blur_sigma, img_size, img_scale, aspect_ratio, progress, seed, sample_fn, support_noise, support_noise_less_qsample_steps, inpainting_mask, **kwargs)
179 else:
180 assert support_noise_less_qsample_steps < len(diffusion.timestep_map) - 1
--> 181 assert support_noise.shape == (1, 3, image_h, image_w)
182 q_sample_steps = torch.tensor([int(len(diffusion.timestep_map) - 1 - support_noise_less_qsample_steps)])
183 support_noise = support_noise.cpu()

Flan-T5

Can we use FLAN-T5 as a language model?
Those FLAN models can represent English and other languages significantly better in our tests.
"If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages."

Issue with inpainting

Hi!
I've tried to launch the inpainting example from the internal notebook and got an error.
`
----> 1 result = inpainting(
2 t5=t5, if_I=if_I,
3 if_II=if_II,
4 if_III=if_III,
5 support_pil_img=raw_pil_image.resize((128, 128), resample=Image.BICUBIC),
6 inpainting_mask=inpainting_mask,
7 prompt=[
8 'blue sunglasses',
9 ],
10 seed=42,
11 if_I_kwargs={
12 "guidance_scale": 7.0,
13 "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
14 'support_noise_less_qsample_steps': 0,
15 },
16 if_II_kwargs={
17 "guidance_scale": 4.0,
18 'aug_level': 0.0,
19 "sample_timestep_respacing": '100',
20 },
21 )
22 if_I.show(result['I'], 2, 3)
23 if_I.show(result['II'], 2, 6)

File ~/miniconda3/envs/df/lib/python3.8/site-packages/deepfloyd_if/pipelines/inpainting.py:61, in inpainting(t5, if_I, if_II, if_III, support_pil_img, prompt, inpainting_mask, negative_prompt, seed, if_I_kwargs, if_II_kwargs, if_III_kwargs, progress, return_tensors, disable_watermark)
57 if_I_kwargs['negative_t5_embs'] = negative_t5_embs
59 if_I_kwargs['support_noise'] = low_res
---> 61 inpainting_mask_I = img_as_bool(resize(inpainting_mask[0].cpu(), (3, image_h, image_w)))
62 inpainting_mask_I = torch.from_numpy(inpainting_mask_I).unsqueeze(0).to(if_I.device)
64 if_I_kwargs['inpainting_mask'] = inpainting_mask_I

File ~/miniconda3/envs/df/lib/python3.8/site-packages/skimage/transform/_warps.py:154, in resize(image, output_shape, order, mode, cval, clip, preserve_range, anti_aliasing, anti_aliasing_sigma)
149 image = image.astype(np.float32)
151 if anti_aliasing is None:
152 anti_aliasing = (
153 not input_type == bool and
--> 154 not (np.issubdtype(input_type, np.integer) and order == 0) and
155 any(x < y for x, y in zip(output_shape, input_shape)))
157 if input_type == bool and anti_aliasing:
158 raise ValueError("anti_aliasing must be False for boolean images")

File ~/miniconda3/envs/df/lib/python3.8/site-packages/numpy/core/numerictypes.py:416, in issubdtype(arg1, arg2)
358 r"""
359 Returns True if first argument is a typecode lower/equal in type hierarchy.
360
(...)
413
414 """
415 if not issubclass_(arg1, generic):
--> 416 arg1 = dtype(arg1).type
417 if not issubclass_(arg2, generic):
418 arg2 = dtype(arg2).type

TypeError: Cannot interpret 'torch.float32' as a data type
`

libs:
image

I assume something is wrong with scikit-image, but I'm not sure what.
Please assist.
Thanks!
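
Going by the traceback, `skimage.transform.resize` is handed a `torch.Tensor` (`inpainting_mask[0].cpu()`), and NumPy cannot interpret `torch.float32` as a dtype. A possible workaround, sketched below, is to convert the mask to a NumPy array before resizing. `to_numpy` is a hypothetical helper and has not been tested against deepfloyd_if; it is duck-typed so the example runs without torch installed.

```python
def to_numpy(mask):
    """Convert a tensor-like object to a NumPy array; pass other inputs through."""
    if hasattr(mask, "detach") and hasattr(mask, "numpy"):
        # torch.Tensor path: drop autograd info, move to CPU, convert.
        return mask.detach().cpu().numpy()
    return mask


class FakeTensor:
    """Stand-in for torch.Tensor, used only to demonstrate the helper."""

    def detach(self):
        return self

    def cpu(self):
        return self

    def numpy(self):
        return "ndarray"


if __name__ == "__main__":
    print(to_numpy(FakeTensor()))
    print(to_numpy([0, 1, 1]))
```

In `inpainting.py` the call would then read `resize(to_numpy(inpainting_mask[0]), (3, image_h, image_w))`. Editing an installed package is a stopgap, so treat it as a local patch.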

Installation instructions are not working on Windows (11)

I tried to follow the instructions and it ended in total disaster. Each pip package wants to install its own torch version, and I couldn't get anything to work. I followed the instructions 1:1 multiple times in a few different fresh envs, to no avail.

Also tried with a fresh new PT2 venv, also to no avail.

Could you please re-test your instructions, on windows preferably? I have an RTX 4090 with 24gb of vram, and I couldn't even get to the loading into vram part.

Faster sampling by DPM-Solver++

Congrats! Super great work!

I've noticed that you're currently using the original DDPM scheduler, which is rather slow. It would be much faster if we could apply DPM-Solver++ into this work to accelerate the sampling.

Note that the original DPM-Solver++ may have numerical issues when using the cosine beta schedule, and I've added a fix here: https://github.com/LuChengTHU/dpm-solver/blob/5c6ee9f1e6b60c8c54f955fbaab0a6717fc2b75b/dpm_solver_pytorch.py#L105

I'm happy to help to integrate DPM-Solver++ into IF when the model is released :)
