Describe the bug DeepFloyd's upstream code supports 8px-aligned in

deepfloyd stage 2 crashes with tensor size mismatch when input image size is not divisible by 8

bghira commented on June 3, 2024

deepfloyd stage 2 crashes with tensor size mismatch when input image size is not divisible by 8


Comments (2)

bghira commented on June 3, 2024

hmm so 86 isn't divisible by 8.

if i adjust the script like so:

from diffusers import DiffusionPipeline, IFSuperResolutionPipeline
import torch
from PIL import Image
import numpy as np

torch.manual_seed(42)

# Configuration for initial image and desired output
initial_width = 86  # One-fourth of 344; not divisible by 8
initial_height = 64  # Adjusted height to be one-fourth of 256

# Adjust initial_width to be divisible by 8
initial_width = int(np.ceil(initial_width / 8) * 8)
print(f"Resolution: {initial_width}x{initial_height}")
# Initialize your device setting based on availability
torch_device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    # Guard the lookup: older torch builds do not expose torch.xpu
    else "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available()
    else "cpu"
)

# Create a dummy image (88x64 after rounding the width up to a multiple of 8)
dummy_image = torch.rand((3, initial_height, initial_width), dtype=torch.float32)  # Random noise image
dummy_image = (dummy_image * 255).to(torch.uint8)  # Convert to 8-bit format
dummy_pil_image = Image.fromarray(dummy_image.numpy().transpose(1, 2, 0))  # Convert to PIL image for compatibility
dummy_pil_image.save("dummy_input.png")  # Save the initial dummy image

# Load your stage 2 pipeline
print(f"Image resolution: {dummy_pil_image.size}")
stage2_pipe = IFSuperResolutionPipeline.from_pretrained("DeepFloyd/IF-II-M-v1.0", watermarker=None, safety_checker=None, local_files_only=False).to(device=torch_device, dtype=torch.bfloat16)

# Upscale the dummy image using stage 2 of the pipeline
upscaled_image = stage2_pipe(
    prompt="A simple upscaled image", 
    image=dummy_pil_image, 
    guidance_scale=5.5, 
    num_inference_steps=20, 
    width=initial_width * 4, 
    height=initial_height * 4
).images[0]

upscaled_image.save("upscaled_dummy_output.png")

with the width rounded up from 86 to 88, there is no crash
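The alignment step on its own, as a minimal sketch: `align_up` and `aligned_size` are hypothetical helper names used only for illustration, not part of diffusers, and the multiple of 8 is taken from the behaviour observed above rather than from any documented constraint.

```python
def align_up(value: int, multiple: int = 8) -> int:
    """Round `value` up to the nearest multiple of `multiple` (hypothetical helper)."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    # Integer ceiling division, avoiding float rounding
    return -(-value // multiple) * multiple


def aligned_size(width: int, height: int, multiple: int = 8) -> tuple[int, int]:
    """Return (width, height) with both sides rounded up to `multiple`."""
    return align_up(width, multiple), align_up(height, multiple)


print(aligned_size(86, 64))  # (88, 64)
```

Rounding up rather than down keeps every input pixel. The extra columns still have to come from somewhere (padding or resizing the image), which this sketch does not cover.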


bghira commented on June 3, 2024

note: i understand deepfloyd is not often used by commercial outfits due to its restrictive license, but it apparently has research value and i've run into this during research into deepfloyd's characteristics with the T5 text encoder (which is worthwhile to explore, now that there are more models available to compare against). this PR is an effort to improve the experience for research use of these weights.
