huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Home Page: https://huggingface.co/docs/diffusers

License: Apache License 2.0

Python 99.90% Makefile 0.02% Dockerfile 0.08%
deep-learning diffusion image-generation pytorch score-based-generative-modeling image2image text2image stable-diffusion stable-diffusion-diffusers hacktoberfest

diffusers's Introduction




🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions.

🤗 Diffusers offers three core components:

  • State-of-the-art diffusion pipelines that can be run in inference with just a few lines of code.
  • Interchangeable noise schedulers for different diffusion speeds and output quality.
  • Pretrained models that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems.

Installation

We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing PyTorch and Flax, please refer to their official documentation.

PyTorch

With pip (official package):

pip install --upgrade diffusers[torch]

With conda (maintained by the community):

conda install -c conda-forge diffusers

Flax

With pip (official package):

pip install --upgrade diffusers[flax]

Apple Silicon (M1/M2) support

Please refer to the How to use Stable Diffusion in Apple Silicon guide.
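
As a quick orientation before reading that guide, here is a minimal sketch of the usual pattern, assuming a PyTorch build with Metal (MPS) support; the one-step warm-up pass is something the guide has recommended for older PyTorch versions:

import torch
from diffusers import DiffusionPipeline

# Minimal sketch, assuming PyTorch was built with MPS (Metal) support.
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")

# A one-step warm-up pass has been recommended for some older PyTorch versions.
_ = pipe("warm-up prompt", num_inference_steps=1)

image = pipe("An image of a squirrel in Picasso style").images[0]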

Quickstart

Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the from_pretrained method to load any pretrained diffusion model (browse the Hub for 22,000+ checkpoints):

from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")
pipeline("An image of a squirrel in Picasso style").images[0]

You can also dig into the models and schedulers toolbox to build your own diffusion system:

from diffusers import DDPMScheduler, UNet2DModel
from PIL import Image
import torch

# load a pretrained scheduler and UNet, then pick the number of denoising steps
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
scheduler.set_timesteps(50)

# start from pure Gaussian noise
sample_size = model.config.sample_size
noise = torch.randn((1, 3, sample_size, sample_size), device="cuda")
input = noise

# reverse-diffusion loop: predict the noise residual, then step the scheduler
for t in scheduler.timesteps:
    with torch.no_grad():
        noisy_residual = model(input, t).sample
        prev_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample
        input = prev_noisy_sample

# rescale from [-1, 1] to [0, 1], move channels last, and convert to a PIL image
image = (input / 2 + 0.5).clamp(0, 1)
image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
image = Image.fromarray((image * 255).round().astype("uint8"))
image

Check out the Quickstart to launch your diffusion journey today!

How to navigate the documentation

| Documentation | What can I learn? |
|---|---|
| Tutorial | A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model. |
| Loading | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
| Pipelines for inference | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
| Optimization | Guides for how to optimize your diffusion model to run faster and consume less memory. |
| Training | Guides for how to train a diffusion model for different tasks with different training techniques. |

Contribution

We ❤️ contributions from the open-source community! If you want to contribute to this library, please check out our Contribution guide. You can look out for issues you'd like to tackle to contribute to the library.

Also, say 👋 in our public Discord channel. We discuss the hottest trends about diffusion models, help each other with contributions and personal projects, or just hang out ☕.

Popular Tasks & Pipelines

| Task | Pipeline | 🤗 Hub |
|---|---|---|
| Unconditional Image Generation | DDPM | google/ddpm-ema-church-256 |
| Text-to-Image | Stable Diffusion Text-to-Image | runwayml/stable-diffusion-v1-5 |
| Text-to-Image | unCLIP | kakaobrain/karlo-v1-alpha |
| Text-to-Image | DeepFloyd IF | DeepFloyd/IF-I-XL-v1.0 |
| Text-to-Image | Kandinsky | kandinsky-community/kandinsky-2-2-decoder |
| Text-guided Image-to-Image | ControlNet | lllyasviel/sd-controlnet-canny |
| Text-guided Image-to-Image | InstructPix2Pix | timbrooks/instruct-pix2pix |
| Text-guided Image-to-Image | Stable Diffusion Image-to-Image | runwayml/stable-diffusion-v1-5 |
| Text-guided Image Inpainting | Stable Diffusion Inpainting | runwayml/stable-diffusion-inpainting |
| Image Variation | Stable Diffusion Image Variation | lambdalabs/sd-image-variations-diffusers |
| Super Resolution | Stable Diffusion Upscale | stabilityai/stable-diffusion-x4-upscaler |
| Super Resolution | Stable Diffusion Latent Upscale | stabilityai/sd-x2-latent-upscaler |

Popular libraries using 🧨 Diffusers

Thank you for using us ❤️.

Credits

This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today:

  • @CompVis' latent diffusion models library, available here
  • @hojonathanho's original DDPM implementation, available here, as well as the extremely useful translation into PyTorch by @pesser, available here
  • @ermongroup's DDIM implementation, available here
  • @yang-song's Score-VE and Score-VP implementations, available here

We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available here as well as @crowsonkb and @rromb for useful discussions and insights.

Citation

@misc{von-platen-etal-2022-diffusers,
  author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Dhruv Nair and Sayak Paul and William Berman and Yiyi Xu and Steven Liu and Thomas Wolf},
  title = {Diffusers: State-of-the-art diffusion models},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/diffusers}}
}

diffusers's People

Contributors

a-r-r-o-w, anton-l, apolinario, bghira, daspartho, dg845, dn6, duongna21, fabiorigano, haofanwang, kadirnar, kashif, linoytsaban, mishig25, okotaku, osanseviero, patil-suraj, patrickvonplaten, pcuenca, rootonchair, sanchit-gandhi, sayakpaul, shirayu, standardai, stevhliu, umerha, wauplin, williamberman, yiyixuxu, younesbelkada


diffusers's Issues

1D Upsampler / Downsampler

We should also write tests for the 1D Upsampler and 1D Downsampler layers (for unet_rl.py).

Once all layers are nicely consolidated and we have finished the first round of refactoring, up- and downsampling layers should probably be split into layers that have learnable parameters and those that don't.

cc @patil-suraj - let's discuss offline if not clear

No module named 'exceptions' raised when training the model

Traceback (most recent call last):
File "scripts/train.py", line 1, in <module>
import diffuser.utils as utils
File "/mnt/d/RL/UCSC/diffuser/diffuser/utils/__init__.py", line 4, in <module>
from .setup import *
File "/mnt/d/RL/UCSC/diffuser/diffuser/utils/setup.py", line 6, in <module>
from tap import Tap
File "/home/lyn/anaconda3/envs/diffuser/lib/python3.8/site-packages/tap.py", line 6, in <module>
from mc_bin_client import mc_bin_client, memcacheConstants as Constants
File "/home/lyn/anaconda3/envs/diffuser/lib/python3.8/site-packages/mc_bin_client/mc_bin_client.py", line 11, in <module>
import exceptions
ModuleNotFoundError: No module named 'exceptions'

Reading model from a local file

Greetings.

Suppose I have a model.ckpt file that was trained and saved on my local computer. It's not hosted on the Hugging Face Hub, and since it's somewhat confidential it cannot be hosted anywhere. How can I feed it into the from_pretrained() function?
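
For context, from_pretrained() accepts a local directory in the diffusers layout (the folders produced by save_pretrained()), so a purely offline workflow looks roughly like the sketch below; the paths are hypothetical, and a raw model.ckpt single-file checkpoint first needs to be converted into that layout (later releases ship conversion scripts for this):

from diffusers import DiffusionPipeline

# Hypothetical local path: a directory previously written by save_pretrained(),
# i.e. containing model_index.json plus the model/scheduler subfolders.
pipeline = DiffusionPipeline.from_pretrained("/path/to/my-local-diffusion-model")

# Saving back to disk stays local as well; nothing is uploaded.
pipeline.save_pretrained("/path/to/another-local-copy")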

example script doesn't work with pip install

The pip package doesn't include any of these:

from diffusers.modeling_utils import unwrap_model
from diffusers.optimization import get_scheduler
from diffusers.utils import logging

so "train_unconditional.py" fails:

python3 train_unconditional.py --dataset="cifar10" --resolution=64 --output_path="cifar10-ddpm" --batch_size=16 --num_epochs=100 --gradient_accumulation_steps=1 --lr=1e-4 --warmup_steps=500 --mixed_precision=no

Traceback (most recent call last):
  File "train_unconditional.py", line 12, in <module>
    from diffusers.modeling_utils import unwrap_model
ImportError: cannot import name 'unwrap_model'

In which config (or in None) should the `predict_epsilon` config be saved?

We're discussing where certain properties should be saved in this repository. For example, this predict_epsilon property is used to distinguish if a model is trained to directly reconstruct the original sample or the noise associated with it.

(used in this colab)

There are multiple arguments for where it could fall, and it is an interesting discussion when creating a library that is modular and useful at different levels of abstraction. Here are the different places the predict_epsilon property could be stored:

  • model config: here the user could load a pre-trained model and instantly know how to use it. Though, there is an argument to be made that the best model classes will be data-agnostic, and this is related to the training structure.
  • pipeline config: here the config will automatically handle the sampling, likely without the user knowing, and it will happen naturally. This is likely best if models are going to be used primarily for inference.
  • scheduler config: this is the last natural option and operates at a sort of conceptual middle ground. The scheduler is where the sampling actually occurs, so maybe that is the best spot for the config.

Important context is that the library is currently heavily used for inference, but hopefully will be used for training as well. Some of the positioning of things like predict_epsilon will be more logical in a training script rather than inference.
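
For reference, the third option (scheduler config) is roughly what later releases converged on, where the flag is exposed as prediction_type; a minimal sketch, only to illustrate what that placement looks like:

from diffusers import DDPMScheduler

# The choice of predicting the noise ("epsilon") vs. the original sample
# ("sample") lives on the scheduler config in later versions of the library.
scheduler = DDPMScheduler(num_train_timesteps=1000, prediction_type="epsilon")
print(scheduler.config.prediction_type)  # "epsilon"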

Missing `transformers` requirement

Dependencies don't mention transformers

diffusers/setup.py

Lines 78 to 91 in 850d434

_deps = [
    "Pillow",
    "black~=22.0,>=22.3",
    "filelock",
    "flake8>=3.8.3",
    "huggingface-hub",
    "isort>=5.5.4",
    "numpy",
    "pytest",
    "regex!=2019.12.17",
    "requests",
    "torch>=1.4",
    "torchvision",
]

However, importing from the quickstart fails without transformers installed:

In [1]: import torch

In [2]: from diffusers import UNetModel, DDPMScheduler
Einops is not installed
Einops is not installed
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 from diffusers import UNetModel, DDPMScheduler

File ~/Workspaces/Python/diffusers/src/diffusers/__init__.py:13, in <module>
     11 from .models.unet_grad_tts import UNetGradTTSModel
     12 from .pipeline_utils import DiffusionPipeline
---> 13 from .pipelines import DDIM, DDPM, GLIDE, LatentDiffusion, PNDM, BDDM
     14 from .schedulers import DDIMScheduler, DDPMScheduler, SchedulerMixin, PNDMScheduler
     15 from .schedulers.classifier_free_guidance import ClassifierFreeGuidanceScheduler

File ~/Workspaces/Python/diffusers/src/diffusers/pipelines/__init__.py:4, in <module>
      2 from .pipeline_ddpm import DDPM
      3 from .pipeline_pndm import PNDM
----> 4 from .pipeline_glide import GLIDE
      5 from .pipeline_latent_diffusion import LatentDiffusion
      6 from .pipeline_bddm import BDDM

File ~/Workspaces/Python/diffusers/src/diffusers/pipelines/pipeline_glide.py:27, in <module>
     24 from torch import nn
     26 import tqdm
---> 27 from transformers import CLIPConfig, CLIPModel, CLIPTextConfig, CLIPVisionConfig, GPT2Tokenizer
     28 from transformers.activations import ACT2FN
     29 from transformers.modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling

ModuleNotFoundError: No module named 'transformers'

Is the end goal to have transformers as a hard or soft dependency?

Wishlist: example/documentation for fine-tuning or continuing training of a model

This is more of a newbie issue, but I would find it helpful if there were an example showing how to continue training from an existing checkpoint and/or start training a new model from an existing pre-trained model. I guess it is straightforward by loading the state dict of the old model, but maybe there are certain tricks or techniques involving the scheduler settings or other issues to watch out for.
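
In the meantime, a minimal sketch of the continue-training setup, assuming a checkpoint saved in the diffusers format ("my-ddpm-checkpoint" is a hypothetical path); the main point is that from_pretrained() restores both weights and config, so the architecture and noise schedule match the original run:

import torch
from diffusers import UNet2DModel, DDPMScheduler

model = UNet2DModel.from_pretrained("my-ddpm-checkpoint").train()
scheduler = DDPMScheduler.from_pretrained("my-ddpm-checkpoint")  # keep the original noise schedule

# A lower learning rate than from-scratch training is a common starting point.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# ...then run the usual loop: add noise with scheduler.add_noise, predict it
# with the model, take an MSE loss against the noise, and step the optimizer.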

Launch training with accelerate launch arg error

Launching training with accelerate launch raises an argument error:

Traceback (most recent call last):
  File "/usr/local/bin/accelerate-launch", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 690, in main
    launch_command(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 684, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 280, in simple_launcher
    mixed_precision = PrecisionType(args.mixed_precision.lower())
AttributeError: 'NoneType' object has no attribute 'lower'

However, you can still start the training by calling the script with python

python train_unconditional.py ...

Friction log from new user

Hey all! I'm writing a friction log as a new user and someone with 0 experience with diffusion models. I hope this is useful.

Overall, I loved that the README had a definition section; it was very quick to get things going and I was able to get some results with existing pipelines very quickly. As a first step I just went through the README; I'll kick off some training tomorrow. Awesome work everyone!

1. Going through the README

I1. (first code example)
When I see https://huggingface.co/fusing/ddpm-lsun-church, the repo shows as transformers. Two questions arise

  • Do we want a new tag for diffusers?
  • What about adding a code snippet? DiffusionPipeline.from_pretrained(model_id)
    (both should be easy and happy to help with it)

I2 (first code example)
I tried the code snippet suggested at https://huggingface.co/fusing/ddpm-lsun-church, but it fails

model_id = "fusing/ddpm-lsun-church"
ddpm = DiffusionPipeline.from_pretrained(model_id)

with


AttributeError: module 'diffusers' has no attribute 'GaussianDDPMScheduler'

(same for the second example)

This also makes me wonder what's the difference between just using the diffusion pipeline directly as in the model card vs using DDPMScheduler + UNetModel approach as in the README. Is the pipeline approach just a wrapper of both?

I3 (second code example)

Should num_inference_steps be len(noise_scheduler) as in first example?

I4
I just quickly browsed through the RL guide, but it seems the pretrained model is downloaded from Dropbox 😅. Should we add it to the Hub?

I5
The short examples in section 2 are 🔥. They are short, clear, and concise, with nice results!

I6
It would be nice to have metadata in the model repos to be able to filter models depending what they can do (tts, text-to-image, etc).

Latent diffusion model

Hi,
Is there a way to make the latent diffusion model take its context from any user-defined modality, with an associated training script? (The modality could be features, text, or basically any encoder that outputs a latent vector for another modality.)
Let me know if it's not clear :)
Thank you!
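
One way to think about it: the conditional UNet only ever sees a batch of context embeddings, so any user-defined encoder that produces a (batch, seq_len, dim) tensor can supply them. A minimal sketch of that pattern (the sizes and the model configuration are made up for illustration, not an official recipe):

import torch
from diffusers import UNet2DConditionModel

# cross_attention_dim must match the width of your encoder's output.
unet = UNet2DConditionModel(sample_size=32, cross_attention_dim=256)

latents = torch.randn(2, 4, 32, 32)
timesteps = torch.tensor([10, 10])
context = torch.randn(2, 16, 256)  # output of any user-defined encoder (audio, features, ...)

noise_pred = unet(latents, timesteps, encoder_hidden_states=context).sample
print(noise_pred.shape)  # torch.Size([2, 4, 32, 32])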

glide training generates weird images

Hi,

Thanks for this amazing repository! I am currently training Glide using the script train_glide_text_to_image.py. However, the generated images do not make sense and no matter the text prompt, the generated images are all similar to each other.

Below, for the text "a woman walks her dog on the beach"
[screenshot: generated samples]

and for "puppy in a christmas present"
[screenshot: generated samples]

I have been training for ~40 epochs on the default dog captions dataset. Besides, the pretrained model fusing/glide-base is able to produce quite good results. for example, for "puppy in a christmas present"
[screenshot: samples from the pretrained fusing/glide-base model]

Does anyone know what could go wrong during training?

Thanks!

Image inpainting

Hi,

2 quick questions around this:

  • Is there any colab / guiding doc around leveraging this model for image inpainting?
  • Given a source person image and a t-shirt image, how can I use a guided text prompt (i.e. "show person wearing this t-shirt") to generate an image of the person wearing the t-shirt?

Common API

How much of a common API do you expect there to be across different objects? For example, the unet.py file contains a Trainer, while other models seem to only rely on their forward method, but they have different APIs and signatures.

Same with pipelines, for example with the BDDM pipeline having the following arguments for its __call__ method: mel_spectrogram, generator, torch_device=None, while the DDIM pipeline has batch_size=1, generator=None, torch_device=None, eta=0.0, num_inference_steps=50.

Do we expect all of these models, pipelines, schedulers to have a common API at the end? Is that even possible with diffusers?

It seems like most arguments are similar, but with a few specificities for models, pipelines and schedulers. That's where having a configuration system would arguably work quite well as it would show very visibly what each of them has in terms of arguments for customization.

This reminds me a bit of the do_lower_case problem we have in transformers: some tokenizers have it, some don't, but users don't necessarily understand that and try to use it for all tokenizers.

Support batches in noise scheduler

Firstly, kudos to the HF team for building this awesome library!

I would like to propose some modifications to the DDPM noise scheduler (and potentially other schedulers) to support batching. Based on training_ddpm.py, it appears that the DDPM noise scheduler does not batch forward diffusion steps:

for idx in range(bsz):
    noise = torch.randn(clean_images.shape[1:]).to(clean_images.device)
    noise_samples[idx] = noise
    noisy_images[idx] = noise_scheduler.forward_step(clean_images[idx], noise, timesteps[idx])

Supporting batches should be possible in theory by crudely modifying DDPMScheduler.get_alpha_prod to something like

def get_alpha_prod(self, time_steps):
        mask = (time_steps < 0).float().numpy()
        return torch.from_numpy(self.one * mask + self.alphas_cumprod[time_steps] * (1 - mask))

The referenced line in the training example script would now be reduced to

#  noises = torch.randn(clean_images.shape).to(clean_images.device)
# timesteps = torch.randint(0, noise_scheduler.timesteps, (16, 1, 1, 1)).long()
noise_images = noise_scheduler.forward_step(clean_images, noises, timesteps)

Is there a particular reason why we should avoid batching the forward process? Curious to hear your thoughts!
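
For illustration, a minimal sketch of what a batched forward step could look like (not the library's implementation): the cumulative alpha products are indexed with a (batch,) tensor of timesteps and broadcast over the image dimensions.

import torch

def add_noise_batched(alphas_cumprod, clean_images, noise, timesteps):
    a = alphas_cumprod[timesteps].view(-1, 1, 1, 1)  # (batch,) -> broadcast over C, H, W
    return a.sqrt() * clean_images + (1 - a).sqrt() * noise

betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)

clean_images = torch.randn(16, 3, 32, 32)
noise = torch.randn_like(clean_images)
timesteps = torch.randint(0, 1000, (16,))
noisy_images = add_noise_batched(alphas_cumprod, clean_images, noise, timesteps)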

Schedulers - what code should go into a "Scheduler" class?

At the moment, the scheduler class has very little logic -> see:

class GaussianDDPMScheduler(nn.Module, ConfigMixin):

whereas the example of the unrolled denoising process is getting quite complicated (copied from the README):

# 3. Denoise                                                                                                                                           
for t in reversed(range(len(scheduler))):
	# 1. predict noise residual
	with torch.no_grad():
		pred_noise_t = self.unet(image, t)

	# 2. compute alphas, betas
	alpha_prod_t = self.noise_scheduler.get_alpha_prod(t)
	alpha_prod_t_prev = self.noise_scheduler.get_alpha_prod(t - 1)
	beta_prod_t = 1 - alpha_prod_t
	beta_prod_t_prev = 1 - alpha_prod_t_prev

	# 3. compute predicted image from residual
	# First: compute predicted original image from predicted noise also called
	# "predicted x_0" of formula (15) from https://arxiv.org/pdf/2006.11239.pdf
	pred_original_image = (image - beta_prod_t.sqrt() * pred_noise_t) / alpha_prod_t.sqrt()

	# Second: Clip "predicted x_0"
	pred_original_image = torch.clamp(pred_original_image, -1, 1)

	# Third: Compute coefficients for pred_original_image x_0 and current image x_t
	# See formula (7) from https://arxiv.org/pdf/2006.11239.pdf
	pred_original_image_coeff = (alpha_prod_t_prev.sqrt() * self.noise_scheduler.get_beta(t)) / beta_prod_t
	current_image_coeff = self.noise_scheduler.get_alpha(t).sqrt() * beta_prod_t_prev / beta_prod_t
	# Fourth: Compute predicted previous image µ_t
	# See formula (7) from https://arxiv.org/pdf/2006.11239.pdf
	pred_prev_image = pred_original_image_coeff * pred_original_image + current_image_coeff * image

	# 5. For t > 0, compute predicted variance βt (see formula (6) and (7) from https://arxiv.org/pdf/2006.11239.pdf)
	# and sample from it to get previous image
	# x_{t-1} ~ N(pred_prev_image, variance) == add variance to pred_image
	if t > 0:
		variance = (1 - alpha_prod_t_prev) / (1 - alpha_prod_t) * self.noise_scheduler.get_beta(t).sqrt()
		noise = scheduler.sample_noise(image.shape, device=image.device, generator=generator)
		prev_image = pred_prev_image + variance * noise
	else:
		prev_image = pred_prev_image

	# 6. Set current image to prev_image: x_t -> x_t-1
	image = prev_image

As noted by @patil-suraj , I also start to think that we should put more logic into a DDPMNoiseScheduler class since we more or less copy this loop otherwise for all other models such as GLIDE and LDM.

If we give the scheduler class more logic we could reduce the loop to:

for t in reversed(range(len(scheduler))):
	# 1. predict noise residual
	with torch.no_grad():
		pred_noise_t = self.unet(image, t)

	prev_image = scheduler.sample_prev_image(pred_noise_t, image, t)

	image = prev_image

I'm starting to favor this reduced for-loop. Obviously a user could still write the above, very detailed loop, but IMO it would be important to give the user a function that can be reused for different models, such as def sample_prev_image.

@patil-suraj @anton-l what do you think?

Also would love to hear the opinion of @thomwolf and @srush here :-)

No attribute 'LatentDiffusionPipeline'

Error: AttributeError: module 'diffusers' has no attribute 'LatentDiffusionPipeline'

On:

pipeline = DiffusionPipeline.from_pretrained("fusing/latent-diffusion-text2im-large")

This happened after the model files got successfully downloaded.

Environment:

  • python: 3.10.5
  • os: ubuntu22.04
  • diffusers: 0.0.4

Traceback:

AttributeError                            Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 DiffusionPipeline.from_pretrained("fusing/latent-diffusion-text2im-large")

File ~/.cache/pypoetry/virtualenvs/learning-diffusers-cZHbQeYU-py3.10/lib/python3.10/site-packages/diffusers/pipeline_utils.py:153, in DiffusionPipeline.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    151 else:
    152     diffusers_module = importlib.import_module(cls.__module__.split(".")[0])
--> 153     pipeline_class = getattr(diffusers_module, config_dict["_class_name"])
    155     # (TODO - we should allow to load custom pipelines
    156     # else we need to load the correct module from the Hub
    157     # module = module_candidate
    158     # pipeline_class = get_class_from_dynamic_module(cached_folder, module, class_name_, cached_folder)
    160 init_dict, _ = pipeline_class.extract_init_dict(config_dict, **kwargs)

AttributeError: module 'diffusers' has no attribute 'LatentDiffusionPipeline'

TensorFlow and Flax/Jax

Will this library be similar to transformers and include TensorFlow and Flax implementations at some point?

How to use multiple GPUs?

The example script shows how to use torch distributed to launch across multiple machines.

What about using multiple GPUs? Does huggingface automatically use up all GPUs on the current machine? How do we control how much it uses?
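
The library does not spread a single pipeline across GPUs automatically. For training, the example scripts rely on accelerate / torch.distributed to launch one process per GPU; for inference, a simple pattern is to place one pipeline copy per device and shard the prompts yourself, as in this minimal sketch (sequential for clarity; real parallelism would use separate processes or threads):

import torch
from diffusers import DiffusionPipeline

prompts = ["a cat", "a dog", "a fox", "an owl"]
n_gpus = torch.cuda.device_count()  # assumes at least one CUDA device

pipes = [
    DiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to(f"cuda:{i}")
    for i in range(n_gpus)
]

images = [pipes[i % n_gpus](p).images[0] for i, p in enumerate(prompts)]  # round-robin over devices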

Add options to disable/hide progress bar

Is your feature request related to a problem? Please describe.

This is a minor thing, but I find the progress bar annoying when I run inference with pipeline successively.

See this screenshot for example.

In this case, I generated 10 images using DDIMPipeline and used tqdm myself, but the progress bars coming from __call__ of the pipeline are stacking up and annoying.

Describe the solution you'd like

It would be nice if disable and leave options of tqdm were available with pipelines. Keyword arguments (say, disable_tqdm and leave_tqdm?) could be added to __call__ methods and passed to tqdm. These are the relevant lines in case of DDPMPipeline:

def __call__(self, batch_size=1, generator=None, torch_device=None, output_type="pil"):

for t in tqdm(self.scheduler.timesteps):

I found something that might be relevant in the logging module, but it's not used in the pipeline modules. Maybe this can be used.
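
Later releases did grow a helper for exactly this; a minimal sketch, assuming a version where DiffusionPipeline.set_progress_bar_config is available:

from diffusers import DDPMPipeline

pipe = DDPMPipeline.from_pretrained("google/ddpm-cat-256")
pipe.set_progress_bar_config(disable=True)  # silence the per-call denoising bar

for _ in range(10):
    image = pipe(num_inference_steps=50).images[0]  # no nested bars stacking up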

Higher level pipeline

Just as in transformers, I was wondering if we have any plans for a general pipeline function that could use the different pipelines under the hood.

Some tests fail on CPU / M1

For example, in debugging the test test_modeling_utils.py::PipelineTesterMixin::test_score_sde_ve_pipeline, we added a specific value for CPU, but the values on M1 are slightly different again. Here are the values from M1:

if model.device.type == "cpu":
    expected_image_sum = 3384805888.0 
    expected_image_mean = 1076.00085 
elif model.device.type == "m1":
    expected_image_sum = 3384805376.0
    expected_image_mean = 1076.0006103515625
else:
    expected_image_sum = 3382849024.0
    expected_image_mean = 1075.3788

This happens because some of the models are very large. Will need to add a best practice for tests & contributing.

The model_card_template is not included in the library

Running the training colab fails when pushing to the Hub with

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.7/dist-packages/diffusers/utils/model_card_template.md'

Doing

!ls /usr/local/lib/python3.7/dist-packages/diffusers/utils/

gives

dummy_transformers_and_inflect_and_unidecode_objects.py  logging.py
dummy_transformers_objects.py				 __pycache__
__init__.py

Iirc, if you want a markdown file to be included you need to have include_package_data=True in the setup.py file. Actually, I see this is already done 🤔

cc @anton-l @nateraw

[Feature request] Tensor to Image post-processing integrated with the library

For the image diffusers, the outputs are tensors that need to be post-processed to become useful as images. Different models and schedulers may require different post-processing for the images, which the user may not be aware of.

For that, the API for pipelines could have an output_type option so the user could choose between output formats such as pil, numpy, or tensor, analogous to the Gradio type on the gr.Image component.

This could enable for the following use-case:

image = ldm_pipeline([prompt], generator=generator, num_inference_steps=steps, guidance_scale=guidance_scale, output_type="pil")

Where the output is already a PIL image (if the output is a single image) or a list of PIL images (if the output contains multiple images).

(Also when the library is expanded to more modalities, the final output modality could follow the same logic)

Glide pipeline generates wrong image dimensions

Hi,

Thank you very much for this amazing repository.

I was running the inference code for the glide pipeline, but the output does not have the right dimensions.
[screenshot: output with wrong dimensions]

This is the error:

[screenshot: error message]

It seems that the output is still the embeddings. Any suggestions, please?

Thanks!

Collection of nitpicks

See below for a small collection of nitpicks; I suppose those will be addressed before the first release, but I wanted to write them down somewhere:

  • Documentation seems to be in a different format so will need to be refactored to work with the doc-builder
  • The logging class still has mentions of 🤗 Transformers
  • You have some tqdm instances in schedulers, but they're not using the utility defined in logging in order to toggle bars on and off
  • Some copyrights have wrong attribution (e.g., configuration_utils which mentions NVIDIA)
  • In some places you define a global variable for the configuration name (e.g., SCHEDULER_CONFIG_NAME = "scheduler_config.json"), in other places you define them as a string (config_name = "model_index.json", in DiffusionPipeline)
  • We have some conventions in transformers that IMO make sense for this library as well, wondering if it is an oversight or a decision to do differently. Those that I can see are: single letter variables, message-less asserts
  • The register_modules doesn't seem to play well with code completion tools
  • The CLIP model is entirely copy/pasted from transformers, is the goal to upstream the change at some point or to keep it separate in that repo?

Implementation for TensorRT

Hi,

I guess the reverse diffusion process could run much faster if the UNet implementation in this library could do inference with TensorRT.
Do you have plans to support TensorRT?

Frozen parameters in GaussianFourierProjection

Hi, just a beginner with diffusion models and have been using your implementations as reference. I have a question about this class

Why is requires_grad set to false in the weight parameter? Won't this mean, during training, the noise level embeddings won't be updated?

Thanks!
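
For what it's worth, the frozen weights are intentional: the Gaussian Fourier projection is a fixed random feature map of the noise level (following the score-SDE implementation), and only the layers that consume the embedding are trained. A minimal illustrative sketch (not the library's exact class):

import torch
import torch.nn as nn

class GaussianFourierFeatures(nn.Module):
    def __init__(self, embedding_size=256, scale=16.0):
        super().__init__()
        # Sampled once and never updated: the randomness only fixes a set of frequencies.
        self.W = nn.Parameter(torch.randn(embedding_size) * scale, requires_grad=False)

    def forward(self, t):
        proj = t[:, None] * self.W[None, :] * 2 * torch.pi
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

emb = GaussianFourierFeatures()(torch.tensor([0.1, 0.5, 0.9]))
print(emb.shape)  # torch.Size([3, 512])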

Adding Diffusion Planning Model from "Planning with Diffusion for Flexible Behavior Synthesis"

Hi! First off, thanks for creating this awesome library, excited to see where it goes!

I did see in the README that there's a TODO for adding the RL model from https://github.com/jannerm/diffuser, and I recently saw https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_rl.py. Is this actively being worked on? If not, I can take a stab at it, as I've been messing around with the original code a bit.

Running diffusers with GPU

Running the example code, I see that the CPU rather than the GPU is used. Is there a way to use the GPU instead?
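
Pipelines stay on the CPU unless moved explicitly; a minimal sketch of the usual fix:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("google/ddpm-cat-256")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")  # move the whole pipeline to the GPU
image = pipe(num_inference_steps=50).images[0]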

Pros and cons of the configuration setup

Could you mention the reasons why you opted for a configuration setup that is different from transformers'?

From a previous conversation I remember it was in order to not repeat the arguments twice; however, when looking at schedulers it seems like that is still the case:

def __init__(
    self,
    timesteps=1000,
    beta_start=0.0001,
    beta_end=0.02,
    beta_schedule="linear",
    trained_betas=None,
    timestep_values=None,
    variance_type="fixed_small",
    clip_predicted_image=True,
    tensor_format="np",
):
    super().__init__()
    self.register(
        timesteps=timesteps,
        beta_start=beta_start,
        beta_end=beta_end,
        beta_schedule=beta_schedule,
        trained_betas=trained_betas,
        timestep_values=timestep_values,
        variance_type=variance_type,
        clip_predicted_image=clip_predicted_image,
    )

From a quick read through the file, I don't understand why I have to register something, and what exactly I need to register. Some values are directly set as attributes below the register call, while others are passed to the register method, which seems to act as an __init__ method with required arguments.

Is it in order to isolate a dict_to_save that will serialize only the kwargs passed to the register method? Wouldn't it be simpler with an __init__ method in the ConfigMixin instead?

-> Or is it a choice so as to not have a configuration object that you move around everywhere, instead choosing to have it as a mixin for both schedulers and pipelines?

Stable Diffusion v1-3 crashes with TypeError: step() got an unexpected keyword argument 'eta'

Describe the bug

When attempting text-to-image synthesis, Stable Diffusion v1-3 crashes with TypeError: step() got an unexpected keyword argument 'eta'

EDIT: I didn't realize someone started a thread about this same bug in the Community tab of the model's HuggingFace page, though there's no discussion there yet. Sorry if this shouldn't have been opened, please let me know if I should close this, thanks.

Reproduction

!pip install git+https://github.com/huggingface/diffusers.git
!pip install transformers

import torch
from diffusers import LDMTextToImagePipeline

access_token = ""  # insert access token here

pipe = LDMTextToImagePipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=access_token)

prompt  = "19th Century wooden engraving of Elon musk" # or any other prompt

seed = torch.manual_seed(1024)
images = pipe([prompt], num_inference_steps=50, guidance_scale=7.5, generator=seed)["sample"]

# save images
for idx, image in enumerate(images):
    image.save(f"image-{idx}.png")

Logs

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-a43d8871a546> in <module>()
      7 
      8 seed = torch.manual_seed(1024)
----> 9 images = pipe([prompt], num_inference_steps=50, guidance_scale=7.5, generator=seed)["sample"]
     10 
     11 # save images

1 frames
/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     25         def decorate_context(*args, **kwargs):
     26             with self.clone():
---> 27                 return func(*args, **kwargs)
     28         return cast(F, decorate_context)
     29 

/usr/local/lib/python3.7/dist-packages/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py in __call__(self, prompt, batch_size, generator, torch_device, eta, guidance_scale, num_inference_steps, output_type)
     87 
     88             # compute the previous noisy sample x_t -> x_t-1
---> 89             latents = self.scheduler.step(noise_pred, t, latents, **extra_kwrags)["prev_sample"]
     90 
     91         # scale and decode the image latents with vae

TypeError: step() got an unexpected keyword argument 'eta'

System Info

diffusers 0.1.3
transformers 4.21.1
Python 3.7.13

Missing schema: Invalid URL

Throwing MissingSchema: Invalid URL on trying to load a sample DiffusionPipeline from the huggingface hub. It does download the model files but fails right after. Maybe it's a problem with the model?

I tried to run:

pipeline = DiffusionPipeline.from_pretrained("fusing/glide-base")

Error:

Could not locate the glide.py inside /home/diwank/.cache/huggingface/diffusers/models--fusing--glide-base/snapshots/510aa20b54bbb8b77c170812dc0cbbbd06f58a6a.

MissingSchema: Invalid URL '/home/diwank/.cache/huggingface/diffusers/models--fusing--glide-base/snapshots/510aa20b54bbb8b77c170812dc0cbbbd06f58a6a/glide.py': No scheme supplied.
Perhaps you meant http:///home/diwank/.cache/huggingface/diffusers/models--fusing--glide-base/snapshots/510aa20b54bbb8b77c170812dc0cbbbd06f58a6a/glide.py?

Environment:

  • python: 3.10.5
  • os: ubuntu22.04
  • diffusers: 0.0.4

latent_diffusion not found

Hi :)

I was trying to recreate the example of text-to-image in Colab, but I am getting the following error:

OSError: file /root/.cache/huggingface/hub/models--fusing--latent-diffusion-text2im-large/snapshots/d5eab56148ae55791834277fe9ee6d095066f607/latent_diffusion not found

The content of the folder where the model is downloaded contains this:

total 20
drwxr-xr-x 2 root root 4096 Jul  6 15:34 bert
lrwxrwxrwx 1 root root   52 Jul  6 15:34 model_index.json -> ../../blobs/8df70cbb4f11bf11458c702f3905cc37c4f4ffc1
drwxr-xr-x 2 root root 4096 Jul  6 15:34 noise_scheduler
lrwxrwxrwx 1 root root   52 Jul  6 15:33 README.md -> ../../blobs/475faa020db3c2b6143197b19b8b486c2ce16530
drwxr-xr-x 2 root root 4096 Jul  6 15:34 tokenizer
drwxr-xr-x 2 root root 4096 Jul  6 15:35 unet
drwxr-xr-x 2 root root 4096 Jul  6 15:35 vqvae

I am using these dependencies:

ImportError: LDMTextToImagePipeline requires the transformers library but it was not found in your environment.

Hi, thanks for a fantastic model, but when I install it on Colab it throws the error below.

ImportError: LDMTextToImagePipeline requires the transformers library but it was not found in your environment. You can install it with pip.

I have installed it via !pip install diffusers transformers and I am able to import both libraries successfully.

Can anyone help me fix the issue?

New modalities

One of the stated design decisions from the readme was to support arbitrary modalities, not just images.

I'm in the process of trying to adapt the code for 1D vectors (not H x W x C images).

this line:

noisy_images = noise_scheduler.training_step(clean_inputs, noise_samples, timesteps)

adapted from train_unconditional.py, takes in a (16, 198) clean_inputs tensor and returns a (16, 1, 16, 198) noisy_images tensor. So 1D tensors do not work out of the box. I am just curious whether I am taking the right approach to get 1D tensors to work, or whether there's no avoiding custom-coding a new DDPMScheduler, etc.

EDIT: I got the DDPM training_step function to work with this change:

        if len(original_samples.shape) == 2:
            timesteps = timesteps.unsqueeze(-1)
        else:
            timesteps = timesteps.reshape(batch_size, 1, 1, 1)

[Feature] Add Langevin Dynamics Sampler

Score-based generative models sometimes use a Langevin dynamics step to denoise samples; see NCSN. In the diffusion model for generating molecule conformations, GeoDiff, this step function had the best results (even though it is simpler than the score-based schedulers currently in the repo).

It could be worth adding to the repo, likely after the branch #54 is added.

cc @MinkaiXu

cifar10 quality much worse than DDPM paper

Here are my cifar10 32x32 results, with 7x NVIDIA GeForce RTX 2080 Ti with 11GB VRAM trained for ~10 hours with:

python3 -m torch.distributed.run --nproc_per_node 7 train_unconditional.py --dataset="cifar10" --resolution=32 --output_dir="cifar10-ddpm-" --batch_size=16 --num_epochs=100 --gradient_accumulation_steps=1 --lr=1e-4 --warmup_steps=500

cifar10-ddpm.zip

The quality is worse than in the DDPM paper, and according to a fellow researcher it is also worse than the lucidrains repo. Perhaps there is still a bug or a missing setting somewhere?
