
diffusion-nbs's Introduction

diffusion-nbs

Getting started with diffusion

diffusion-nbs's People

Contributors

ab-10, andreaskundig, anubhavmaity, asgrasberger, banacl, cly, dpoulopoulos, drscotthawley, hwaxxer, jantic, johnowhitaker, johnshaughnessy, jph00, kevinbird15, kevinji, osamja, pcuenca


diffusion-nbs's Issues

TypeError: __init__() got an unexpected keyword argument 'tensor_format'

I can't solve this error. I never found tensor_format in the documentation text below. What should it be?

noise_scheduler = DDPMScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000, tensor_format="pt")#, tensor_format="pt"

From the DDPMScheduler documentation:
DDPMScheduler
num_train_timesteps – number of diffusion steps used to train the model.
beta_start – the starting beta value of inference.
beta_end – the final beta value.
beta_schedule – the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
trained_betas – option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
variance_type – options to clip the variance used when adding noise to the denoised sample. Choose from fixed_small, fixed_small_log, fixed_large, fixed_large_log, learned or learned_range.
clip_sample – option to clip predicted sample between -1 and 1 for numerical stability.
prediction_type – prediction type of the scheduler function, one of epsilon (predicting the noise of the diffusion process), sample (directly predicting the noisy sample) or v_prediction.
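
Since the argument does not appear in the documentation above, a likely fix (a sketch, assuming a diffusers version recent enough to have dropped the tensor_format argument that older releases used to switch between NumPy and PyTorch outputs) is simply to remove it:

from diffusers import DDPMScheduler

# Same configuration as above, just without the removed tensor_format argument
noise_scheduler = DDPMScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)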

typo in the last code section

It says:

for img in image: display(mk_img(img))

and it should be:

for img in images: display(mk_img(img))

i.e. image -> images

How to add CLIP loss in the Guidance part for the Stable Diffusion latent?

Hi, thanks for your great work.

I'm trying to add a CLIP loss on the Stable Diffusion latent in the Guidance part. When I decode an image from the latent, feed it into clip_model, and then compute clip_loss with respect to the latent, I get an error: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior. How can I fix this, or is there an example I can refer to?
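
Not an official answer, but that RuntimeError usually means the CLIP loss never actually depends on the latents you ask autograd to differentiate, e.g. because the decode from latents to image happened under torch.no_grad() or on a detached copy. A rough sketch of a grad-enabled path, assuming the notebook's vae and latents (clip_model, clip_preprocess and clip_loss here are hypothetical placeholders for however you embed and score the image):

import torch

latents = latents.detach().requires_grad_(True)        # differentiate w.r.t. the latents

# Decode OUTSIDE of torch.no_grad() so the graph connects latents -> image -> loss
image = vae.decode(latents / 0.18215).sample           # diffusers returns an object with .sample
image = (image / 2 + 0.5).clamp(0, 1)

loss = clip_loss(clip_model, clip_preprocess(image))   # placeholder CLIP scoring
grad = torch.autograd.grad(loss, latents)[0]           # works: latents are now used in the graph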

Deep Dive NB: Quick Fix for AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask'

In the Stable Diffusion Deep Dive notebook, in the code cell immediately following the Transformer diagram, there is the definition of get_output_embeds, which includes a call to text_encoder.text_model._build_causal_attention_mask:

def get_output_embeds(input_embeddings):
    # CLIP's text model uses causal mask, so we prepare it here:
    bsz, seq_len = input_embeddings.shape[:2]
    causal_attention_mask = text_encoder.text_model._build_causal_attention_mask(bsz, seq_len, dtype=input_embeddings.dtype)
    ...

That is currently generating an error for me when I run the notebook on Colab (from a fresh instance) or my home computer:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-33-dbb74b7ec9b4> in <cell line: 26>()
     24     return output
     25 
---> 26 out_embs_test = get_output_embeds(input_embeddings) # Feed through the model with our new function
     27 print(out_embs_test.shape) # Check the output shape
     28 out_embs_test # Inspect the output

1 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1612             if name in modules:
   1613                 return modules[name]
-> 1614         raise AttributeError("'{}' object has no attribute '{}'".format(
   1615             type(self).__name__, name))
   1616 

AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask'

Everything in the notebook prior to that line runs fine.

Perhaps I'm doing something wrong, or perhaps something has changed in the HF libraries being used since the notebook's original conception?


UPDATE:

I see the same issue here: drboog/ProFusion#12. It seems that transformers has changed. Downgrading to version 4.25.1 fixed the problem.

Thus changing the pip install line at the top of the notebook to

!pip install -q --upgrade transformers==4.25.1 diffusers ftfy

...will restore full functionality.

Feel free to close this issue at your convenience. Perhaps a PR is in order.

Presumably some way to keep up to date with transformers would be preferable, but for now this is a quick fix.
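
If you would rather keep transformers current, one possible workaround is to build the causal mask by hand and call that instead of the missing method. This is only a sketch of what the removed helper used to compute (an additive mask that is zero on and below the diagonal and very negative above it), not an official replacement:

import torch

def build_causal_attention_mask(bsz, seq_len, dtype):
    # Additive causal mask: 0 on/below the diagonal, torch.finfo(dtype).min above it
    mask = torch.full((bsz, seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype)
    mask.triu_(1)              # zero out the diagonal and everything below it
    return mask.unsqueeze(1)   # shape (bsz, 1, seq_len, seq_len)

get_output_embeds would then call build_causal_attention_mask(bsz, seq_len, dtype=input_embeddings.dtype) in place of text_encoder.text_model._build_causal_attention_mask.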

Stable Diffusion Deep Dive notebook VAE encoding problem

While encoding an image to the latent space using

latent = vae.encode(tfms.ToTensor()(input_im).unsqueeze(0).to(torch.float16).to(torch_device)*2-1)

it gave the error RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same

Since my graphics card has 8 GB, I converted the vae to torch.float16. Is that the problem?

The whole error is:

RuntimeError Traceback (most recent call last)
Cell In[20], line 2
1 # Encode to the latent space
----> 2 encoded = pil_to_latent(input_image)
3 encoded.shape
4 # Let's visualize the four channels of this latent representation:

Cell In[18], line 4, in pil_to_latent(input_im)
1 def pil_to_latent(input_im):
2 # Single image -> single latent in a batch (so size 1, 4, 64, 64)
3 with torch.no_grad():
----> 4 latent = vae.encode(tfms.ToTensor()(input_im).type(torch.float16).unsqueeze(0).to(torch_device)*2-1) # Note scaling
5 return 0.18215 * latent.latent_dist.sample()

File F:\Python 3.10.8\lib\site-packages\diffusers\models\vae.py:566, in AutoencoderKL.encode(self, x, return_dict)
565 def encode(self, x: torch.FloatTensor, return_dict: bool = True) -> AutoencoderKLOutput:
--> 566 h = self.encoder(x)
567 moments = self.quant_conv(h)
568 posterior = DiagonalGaussianDistribution(moments)

File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\module.py:1190, in Module._call_impl(self, *input, **kwargs)
1186 # If we don't have any hooks, we want to skip the rest of the logic in
1187 # this function, and just call forward.
1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1189 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190 return forward_call(*input, **kwargs)
1191 # Do not call functions when jit is used
1192 full_backward_hooks, non_full_backward_hooks = [], []

File F:\Python 3.10.8\lib\site-packages\diffusers\models\vae.py:130, in Encoder.forward(self, x)
128 def forward(self, x):
129 sample = x
--> 130 sample = self.conv_in(sample)
132 # down
133 for down_block in self.down_blocks:

File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\module.py:1190, in Module._call_impl(self, *input, **kwargs)
1186 # If we don't have any hooks, we want to skip the rest of the logic in
1187 # this function, and just call forward.
1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1189 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190 return forward_call(*input, **kwargs)
1191 # Do not call functions when jit is used
1192 full_backward_hooks, non_full_backward_hooks = [], []

File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\conv.py:463, in Conv2d.forward(self, input)
462 def forward(self, input: Tensor) -> Tensor:
--> 463 return self._conv_forward(input, self.weight, self.bias)

File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\conv.py:459, in Conv2d._conv_forward(self, input, weight, bias)
455 if self.padding_mode != 'zeros':
456 return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
457 weight, bias, self.stride,
458 _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
460 self.padding, self.dilation, self.groups)

RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same
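
For what it's worth, the message says the input is a torch.cuda.HalfTensor while the VAE weights are a (CPU) torch.HalfTensor, so the model itself appears never to have been moved to the GPU. A minimal sketch of a fix, assuming the notebook's vae, tfms and torch_device:

# Move the VAE weights onto the same device (and dtype) as the input tensor
vae = vae.to(torch_device, dtype=torch.float16)

def pil_to_latent(input_im):
    # Single image -> single latent in a batch (so size 1, 4, 64, 64)
    with torch.no_grad():
        latent = vae.encode(tfms.ToTensor()(input_im).to(torch.float16).unsqueeze(0).to(torch_device)*2-1) # Note scaling
    return 0.18215 * latent.latent_dist.sample()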

Stable Diffusion Deep Dive fails at UNET and CFG with IndexError: index 51 is out of bounds for dimension 0 with size 51

Full error here:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In [63], line 19
     16 # Get the predicted x0:
     17 # latents_x0 = latents - sigma * noise_pred # Calculating ourselves
     18 print(noise_pred.shape, t, latents.shape)
---> 19 latents_x0 = scheduler.step(noise_pred, t, latents).pred_original_sample # Using the scheduler (Diffusers 0.4 and above)
     21 # compute the previous noisy sample x_t -> x_t-1
     22 latents = scheduler.step(noise_pred, t, latents).prev_sample

File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:405, in LMSDiscreteScheduler.step(self, model_output, timestep, sample, order, return_dict)
    403 # 3. Compute linear multistep coefficients
    404 order = min(self.step_index + 1, order)
--> 405 lms_coeffs = [self.get_lms_coefficient(order, self.step_index, curr_order) for curr_order in range(order)]
    407 # 4. Compute previous sample based on the derivatives path
    408 prev_sample = sample + sum(
    409     coeff * derivative for coeff, derivative in zip(lms_coeffs, reversed(self.derivatives))
    410 )

File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:405, in <listcomp>(.0)
    403 # 3. Compute linear multistep coefficients
    404 order = min(self.step_index + 1, order)
--> 405 lms_coeffs = [self.get_lms_coefficient(order, self.step_index, curr_order) for curr_order in range(order)]
    407 # 4. Compute previous sample based on the derivatives path
    408 prev_sample = sample + sum(
    409     coeff * derivative for coeff, derivative in zip(lms_coeffs, reversed(self.derivatives))
    410 )

File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:233, in LMSDiscreteScheduler.get_lms_coefficient(self, order, t, current_order)
    230         prod *= (tau - self.sigmas[t - k]) / (self.sigmas[t - current_order] - self.sigmas[t - k])
    231     return prod
--> 233 integrated_coeff = integrate.quad(lms_derivative, self.sigmas[t], self.sigmas[t + 1], epsrel=1e-4)[0]
    235 return integrated_coeff

IndexError: index 51 is out of bounds for dimension 0 with size 51

I tried playing around with the indices, but it seems like it is another issue. Moving to an older checkout doesn't fix it either.
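
Not certain, but the traceback suggests the scheduler now keeps its own step_index and advances it every time step() is called, so calling scheduler.step() twice per loop iteration (once for pred_original_sample and once for prev_sample, as the notebook does) walks the index past the end of sigmas. A sketch of a workaround is to call step() once per timestep and reuse its output:

step_output = scheduler.step(noise_pred, t, latents)   # advance the scheduler once per timestep
latents_x0 = step_output.pred_original_sample          # the predicted x0
latents = step_output.prev_sample                      # the previous noisy sample x_t -> x_t-1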

Difference between `latents requires_grad=True` and `torch.no_grad()`

Thanks for sharing such amazing work :)

In the last section of the notebook Stable Diffusion Deep Dive.ipynb, you mention:

NB: We should set latents requires_grad=True before we do the forward pass of the unet (removing with torch.no_grad()) if we want more accurate gradients. BUT this requires a lot of extra memory. You'll see both approaches used depending on whose implementation you're looking at.

Can you please clarify what the difference between the two approaches is? For example, if I had to code this, I would have used torch.no_grad(), but apparently you preferred another approach. What does it change computationally and results-wise?

I think adding this as extra info to the notebook would be useful to others, too :)
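
Not the author, but my understanding of the trade-off: with torch.no_grad() around the UNet, the gradient of the guidance loss only flows through the final latents - sigma * noise_pred combination, treating the UNet's output as a constant, which is cheap but approximate; setting requires_grad=True on the latents before the UNet forward pass lets autograd also track how noise_pred itself changes with the latents, which is more faithful but keeps all of the UNet's activations in memory. A rough sketch of the two variants (CFG, scale_model_input and the VAE decode are omitted; loss_fn stands in for the guidance loss):

# Variant A: cheap, approximate. The UNet runs under no_grad, so the loss gradient
# only flows through the last "latents - sigma * noise_pred" step.
with torch.no_grad():
    noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings).sample
latents = latents.detach().requires_grad_(True)
denoised = latents - sigma * noise_pred
grad_a = torch.autograd.grad(loss_fn(denoised), latents)[0]

# Variant B: accurate, memory-hungry. requires_grad is set BEFORE the UNet forward pass,
# so backprop goes through the whole UNet as well.
latents = latents.detach().requires_grad_(True)
noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings).sample
denoised = latents - sigma * noise_pred
grad_b = torch.autograd.grad(loss_fn(denoised), latents)[0]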

Perhaps bug in img2img example

Hey @johnowhitaker,

Might there be a bug in the img2img example here:

# Loop
for i, t in tqdm(enumerate(scheduler.timesteps)):
    if i > start_step: # << This is the only modification to the loop we do

Should this be i >= start_step?
