Getting started with diffusion
fastai / diffusion-nbs
License: Apache License 2.0
I can't solve this error. I never found tensor_format in the text. What should it be?
noise_scheduler = DDPMScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
    num_train_timesteps=1000, tensor_format="pt")
The DDPMScheduler documentation lists:
num_train_timesteps – number of diffusion steps used to train the model.
beta_start – the starting beta value of inference.
beta_end – the final beta value.
beta_schedule – the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
trained_betas – option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
variance_type – options to clip the variance used when adding noise to the denoised sample. Choose from fixed_small, fixed_small_log, fixed_large, fixed_large_log, learned or learned_range.
clip_sample – option to clip predicted sample between -1 and 1 for numerical stability.
prediction_type – prediction type of the scheduler function, one of epsilon (predicting the noise of the diffusion process), sample (directly predicting the noisy sample) or v_prediction.
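For what it's worth, recent diffusers releases appear to have removed the tensor_format argument entirely, so the usual fix is simply to drop it from the constructor call. To make the remaining beta arguments concrete, here is a sketch of the "scaled_linear" schedule they describe — it mirrors, as far as I know, what the library computes, but the function name is mine and this is an illustration, not the library code:

```python
# Sketch of how the "scaled_linear" beta schedule maps beta_start, beta_end
# and num_train_timesteps to a sequence of betas: interpolate linearly in
# sqrt(beta) space, then square.

def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    start, end = beta_start ** 0.5, beta_end ** 0.5
    step = (end - start) / (num_train_timesteps - 1)
    return [(start + i * step) ** 2 for i in range(num_train_timesteps)]

betas = scaled_linear_betas()
print(len(betas))           # 1000
print(round(betas[0], 5))   # 0.00085 (beta_start)
print(round(betas[-1], 3))  # 0.012   (beta_end)
```

The squared interpolation makes the betas grow more slowly at the start of the range than a plain linear schedule would.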
In stable_diffusion.ipynb, under the Stable Diffusion Pipeline heading, the hyperlinks for StableDiffusionPipeline and the diffusion inference pipeline (the first and second links in the paragraph) lead to 404 pages.
Can you provide new links for them, please?
It says
for img in image: display(mk_img(img))
but it should be
for img in images: display(mk_img(img))
i.e. image -> images
Hi, @johnowhitaker
In the positional embedding section at the line
But now instead of dealing with ~50 tokens we just need one for each position (77 total):
I think the number of tokens should be 50K instead of 50. Could you please confirm this?
Thanks
Hi, thanks for your great work.
I'm trying to add a CLIP loss on the Stable Diffusion latents in the Guidance section, but when I decode an image from the latents, feed it into clip_model, and compute the CLIP loss with respect to the latents, I get an error: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior. How can I fix this, or is there an example to refer to?
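That error usually means the loss tensor does require grad (e.g. through the CLIP weights), but the latents were detached somewhere on the way to it — for instance the VAE decode ran under torch.no_grad() or on latents.detach(). A hypothetical minimal reproduction, with stand-ins for the decode and the CLIP loss rather than the notebook's actual code:

```python
import torch

latents = torch.randn(4, requires_grad=True)
weights = torch.randn(4, requires_grad=True)   # stand-in for CLIP model params

# Broken: latents.detach() cuts the graph; the loss still requires grad via
# `weights`, so autograd complains that `latents` was never used in the graph.
loss = (latents.detach() * weights).sum()
try:
    torch.autograd.grad(loss, latents)
except RuntimeError as e:
    print("broken:", type(e).__name__)

# Fixed: keep `latents` in the graph all the way to the loss.
loss = (latents * weights).sum()
grad, = torch.autograd.grad(loss, latents)
print("fixed:", grad.shape)  # torch.Size([4])
```

So the fix is to make sure every op between the latents and the CLIP loss (including the VAE decode) runs with gradients enabled and without any .detach() in between.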
OOM errors pop up when running the notebook on an 8GB GPU. I managed to run it successfully by using half-precision (fp16) tensors instead.
birb_embed = torch.load('learned_embeds.bin')
birb_embed.keys(), birb_embed[''].shape
In the Stable Diffusion Deep Dive notebook, in the code cell immediately following the Transformer diagram, there is the definition of get_output_embeds, which includes a call to text_encoder.text_model._build_causal_attention_mask:
def get_output_embeds(input_embeddings):
    # CLIP's text model uses causal mask, so we prepare it here:
    bsz, seq_len = input_embeddings.shape[:2]
    causal_attention_mask = text_encoder.text_model._build_causal_attention_mask(bsz, seq_len, dtype=input_embeddings.dtype)
    ...
That currently generates an error for me when I run the notebook on Colab (from a fresh instance) or on my home computer:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-33-dbb74b7ec9b4> in <cell line: 26>()
24 return output
25
---> 26 out_embs_test = get_output_embeds(input_embeddings) # Feed through the model with our new function
27 print(out_embs_test.shape) # Check the output shape
28 out_embs_test # Inspect the output
1 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
1612 if name in modules:
1613 return modules[name]
-> 1614 raise AttributeError("'{}' object has no attribute '{}'".format(
1615 type(self).__name__, name))
1616
AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask'
Everything in the notebook prior to that line runs fine.
Perhaps I'm doing something wrong, or perhaps something has changed in the HF libraries being used since the notebook's original conception?
I see the same issue here: drboog/ProFusion#12. It seems that transformers
has changed; downgrading to version 4.25.1 fixed the problem.
Thus, changing the pip install
line at the top of the notebook to
!pip install -q --upgrade transformers==4.25.1 diffusers ftfy
...will restore full functionality.
Feel free to close this issue at your convenience. Perhaps a PR is in order.
Presumably some way to keep up to date with transformers
would be preferable, but for now this is a quick fix.
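If you'd rather stay on a current transformers version, the removed helper only built an additive causal mask, which is easy to reproduce directly. A version-independent sketch that mirrors, to the best of my understanding, what _build_causal_attention_mask returned (the function name here is mine):

```python
import torch

def build_causal_attention_mask(bsz, seq_len, dtype):
    # Additive mask: -inf (dtype min) strictly above the diagonal, 0 on and
    # below it, broadcast to shape (bsz, 1, seq_len, seq_len).
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype)
    mask.triu_(1)  # zero out the diagonal and everything below it
    return mask[None, None, :, :].expand(bsz, 1, seq_len, seq_len)

m = build_causal_attention_mask(2, 77, torch.float32)
print(m.shape)  # torch.Size([2, 1, 77, 77])
```

You could call this in get_output_embeds in place of the removed method, though you should check it against whatever mask-building helper your installed transformers version actually uses.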
While encoding an image to the latent space using
latent = vae.encode(tfms.ToTensor()(input_im).unsqueeze(0).to(torch.float16).to(torch_device)*2-1)
it gave the error RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same.
Since my graphics card has 8GB, I converted the vae to torch.float16. Is that the problem?
The whole error is:
RuntimeError Traceback (most recent call last)
Cell In[20], line 2
1 # Encode to the latent space
----> 2 encoded = pil_to_latent(input_image)
3 encoded.shape
4 # Let's visualize the four channels of this latent representation:
Cell In[18], line 4, in pil_to_latent(input_im)
1 def pil_to_latent(input_im):
2 # Single image -> single latent in a batch (so size 1, 4, 64, 64)
3 with torch.no_grad():
----> 4 latent = vae.encode(tfms.ToTensor()(input_im).type(torch.float16).unsqueeze(0).to(torch_device)*2-1) # Note scaling
5 return 0.18215 * latent.latent_dist.sample()
File F:\Python 3.10.8\lib\site-packages\diffusers\models\vae.py:566, in AutoencoderKL.encode(self, x, return_dict)
565 def encode(self, x: torch.FloatTensor, return_dict: bool = True) -> AutoencoderKLOutput:
--> 566 h = self.encoder(x)
567 moments = self.quant_conv(h)
568 posterior = DiagonalGaussianDistribution(moments)
File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\module.py:1190, in Module._call_impl(self, *input, **kwargs)
1186 # If we don't have any hooks, we want to skip the rest of the logic in
1187 # this function, and just call forward.
1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1189 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190 return forward_call(*input, **kwargs)
1191 # Do not call functions when jit is used
1192 full_backward_hooks, non_full_backward_hooks = [], []
File F:\Python 3.10.8\lib\site-packages\diffusers\models\vae.py:130, in Encoder.forward(self, x)
128 def forward(self, x):
129 sample = x
--> 130 sample = self.conv_in(sample)
132 # down
133 for down_block in self.down_blocks:
File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\module.py:1190, in Module._call_impl(self, *input, **kwargs)
1186 # If we don't have any hooks, we want to skip the rest of the logic in
1187 # this function, and just call forward.
1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1189 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190 return forward_call(*input, **kwargs)
1191 # Do not call functions when jit is used
1192 full_backward_hooks, non_full_backward_hooks = [], []
File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\conv.py:463, in Conv2d.forward(self, input)
462 def forward(self, input: Tensor) -> Tensor:
--> 463 return self._conv_forward(input, self.weight, self.bias)
File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\conv.py:459, in Conv2d._conv_forward(self, input, weight, bias)
455 if self.padding_mode != 'zeros':
456 return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
457 weight, bias, self.stride,
458 _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
460 self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same
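Reading the message closely: torch.cuda.HalfTensor vs torch.HalfTensor is a device mismatch, not a dtype one — the input was moved to CUDA but the VAE weights are still on the CPU, which suggests the vae was converted with .half() but never moved with .to(torch_device). A toy illustration with a single conv layer standing in for the VAE (plain float here, so it runs on CPU too; the same rule applies with fp16):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for the VAE: the model must be moved to the same device as the input.
conv = torch.nn.Conv2d(3, 4, 3).to(device)   # analogous to vae.half().to(torch_device)
x = torch.randn(1, 3, 8, 8)

# Calling conv on a GPU input while conv's weights sat on the CPU would raise:
# RuntimeError: Input type (torch.cuda.HalfTensor) and weight type
# (torch.HalfTensor) should be the same.
out = conv(x.to(device))                     # input moved to the same device
print(out.shape)  # torch.Size([1, 4, 6, 6])
```

So fp16 itself is fine for an 8GB card; just make sure vae.to(torch_device) runs as well, so the input and the weights match in both device and dtype.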
@johnowhitaker in the Stable Diffusion Deep Dive.ipynb notebook, section The UNET and CFG.
You get latents_x0
because the scheduler exposes pred_original_sample
latents_x0 = scheduler.step(noise_pred, t, latents).pred_original_sample # Using the scheduler (Diffusers 0.4 and above)
How can I get this pred_original_sample
when using PNDMScheduler? That scheduler does not expose this value.
Full error here:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In [63], line 19
16 # Get the predicted x0:
17 # latents_x0 = latents - sigma * noise_pred # Calculating ourselves
18 print(noise_pred.shape, t, latents.shape)
---> 19 latents_x0 = scheduler.step(noise_pred, t, latents).pred_original_sample # Using the scheduler (Diffusers 0.4 and above)
21 # compute the previous noisy sample x_t -> x_t-1
22 latents = scheduler.step(noise_pred, t, latents).prev_sample
File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:405, in LMSDiscreteScheduler.step(self, model_output, timestep, sample, order, return_dict)
403 # 3. Compute linear multistep coefficients
404 order = min(self.step_index + 1, order)
--> 405 lms_coeffs = [self.get_lms_coefficient(order, self.step_index, curr_order) for curr_order in range(order)]
407 # 4. Compute previous sample based on the derivatives path
408 prev_sample = sample + sum(
409 coeff * derivative for coeff, derivative in zip(lms_coeffs, reversed(self.derivatives))
410 )
File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:405, in <listcomp>(.0)
403 # 3. Compute linear multistep coefficients
404 order = min(self.step_index + 1, order)
--> 405 lms_coeffs = [self.get_lms_coefficient(order, self.step_index, curr_order) for curr_order in range(order)]
407 # 4. Compute previous sample based on the derivatives path
408 prev_sample = sample + sum(
409 coeff * derivative for coeff, derivative in zip(lms_coeffs, reversed(self.derivatives))
410 )
File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:233, in LMSDiscreteScheduler.get_lms_coefficient(self, order, t, current_order)
230 prod *= (tau - self.sigmas[t - k]) / (self.sigmas[t - current_order] - self.sigmas[t - k])
231 return prod
--> 233 integrated_coeff = integrate.quad(lms_derivative, self.sigmas[t], self.sigmas[t + 1], epsrel=1e-4)[0]
235 return integrated_coeff
IndexError: index 51 is out of bounds for dimension 0 with size 51
I tried playing around with the indices, but it seems to be a different issue. Moving to an older checkout doesn't fix it either.
Thanks for sharing such amazing work :)
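For a scheduler that doesn't expose pred_original_sample, one option is to reconstruct x0 from the noise estimate yourself using the standard DDPM forward relation x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps, where a_t = alphas_cumprod[t]. A sketch under two assumptions: the scheduler stores alphas_cumprod (PNDMScheduler does, as far as I can tell) and the model predicts epsilon; the function name is mine:

```python
import torch

def pred_x0(scheduler, noise_pred, t, latents):
    # Invert x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps for x0,
    # with a_t = alphas_cumprod[t] and eps = noise_pred.
    a_t = scheduler.alphas_cumprod[t]
    return (latents - (1 - a_t) ** 0.5 * noise_pred) / a_t ** 0.5
```

Sanity check: if you noise a known x0 with that relation and pass the same eps back in, pred_x0 recovers x0 (up to float precision).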
In the last section of the notebook Stable Diffusion Deep Dive.ipynb, you mention:
NB: We should set latents requires_grad=True before we do the forward pass of the unet (removing with torch.no_grad()) if we want more accurate gradients. BUT this requires a lot of extra memory. You'll see both approaches used depending on whose implementation you're looking at.
Can you please clarify the difference between the two approaches? For example, if I had to code this, I would have used torch.no_grad(), but apparently you preferred another approach. What does it change computationally and results-wise?
I think adding this as extra info to the notebook would be useful to others, too :)
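A toy contrast of the two styles being asked about, with a tiny linear layer standing in for the UNet and a made-up loss (none of this is the notebook's actual model or loss): under no_grad the guidance gradient only sees the cheap post-processing of the latents, while keeping the forward pass in the graph also propagates through the model, at the cost of storing all its activations.

```python
import torch

model = torch.nn.Linear(8, 8)                 # stand-in for the UNet
latents = torch.randn(1, 8, requires_grad=True)

# (a) Approximate: run the model without tracking gradients, then
# differentiate only the post-processing w.r.t. latents. Low memory, but the
# gradient treats the model output as a constant.
with torch.no_grad():
    noise_pred = model(latents)
denoised = latents - 0.1 * noise_pred         # graph: latents -> denoised only
grad_a, = torch.autograd.grad(denoised.pow(2).sum(), latents)

# (b) Accurate: keep the model forward in the graph, so the gradient also
# flows back through the model. Needs memory for all its activations.
noise_pred = model(latents)
denoised = latents - 0.1 * noise_pred
grad_b, = torch.autograd.grad(denoised.pow(2).sum(), latents)

print(torch.allclose(grad_a, grad_b))  # False: the two gradients differ
```

Computationally, (a) skips storing the UNet activations (hence far less memory); results-wise, (a) gives a biased gradient that in practice is often good enough for guidance, which is why both appear in the wild.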
Hey @johnowhitaker,
Might there be a bug in the img2img example here:
# Loop
for i, t in tqdm(enumerate(scheduler.timesteps)):
    if i > start_step: # << This is the only modification to the loop we do
Should this be i >= start_step instead?
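For what it's worth, the two comparisons differ by exactly one denoising step. A toy count with a hypothetical 10-step schedule:

```python
# With start_step = 3 on a 10-step schedule, `>` runs one fewer denoising
# step than `>=` (it skips index 3 itself).
timesteps = list(range(10))
start_step = 3
run_gt = [i for i, _ in enumerate(timesteps) if i > start_step]
run_ge = [i for i, _ in enumerate(timesteps) if i >= start_step]
print(len(run_gt), len(run_ge))  # 6 7
```

So the question comes down to whether the step the latents were noised to should itself be denoised again.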