Getting started with diffusion
fastai / diffusion-nbs
License: Apache License 2.0
I can't solve this error. I never found tensor_format in the text. What should it be?
noise_scheduler = DDPMScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
    num_train_timesteps=1000, tensor_format="pt")
The DDPMScheduler documentation lists:
num_train_timesteps – number of diffusion steps used to train the model.
beta_start – the starting beta value of inference.
beta_end – the final beta value.
beta_schedule – the beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from linear, scaled_linear, or squaredcos_cap_v2.
trained_betas – option to pass an array of betas directly to the constructor to bypass beta_start, beta_end etc.
variance_type – options to clip the variance used when adding noise to the denoised sample. Choose from fixed_small, fixed_small_log, fixed_large, fixed_large_log, learned or learned_range.
clip_sample – option to clip predicted sample between -1 and 1 for numerical stability.
prediction_type – prediction type of the scheduler function, one of epsilon (predicting the noise of the diffusion process), sample (directly predicting the noisy sample) or v_prediction.
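For what it's worth, recent diffusers releases appear to have removed the tensor_format argument entirely, so the usual fix is simply to drop it from the constructor call. To make the remaining beta arguments concrete, here is a sketch of the "scaled_linear" schedule they describe — it mirrors, as far as I know, what the library computes, but the function name is mine and this is an illustration, not the library code:

```python
# Sketch of how the "scaled_linear" beta schedule maps beta_start, beta_end
# and num_train_timesteps to a sequence of betas: interpolate linearly in
# sqrt(beta) space, then square.

def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    start, end = beta_start ** 0.5, beta_end ** 0.5
    step = (end - start) / (num_train_timesteps - 1)
    return [(start + i * step) ** 2 for i in range(num_train_timesteps)]

betas = scaled_linear_betas()
print(len(betas))           # 1000
print(round(betas[0], 5))   # 0.00085 (beta_start)
print(round(betas[-1], 3))  # 0.012   (beta_end)
```

The squared interpolation makes the betas grow more slowly at the start of the range than a plain linear schedule would.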
In stable_diffusion.ipynb, under the Stable Diffusion Pipeline heading, the hyperlinks for StableDiffusionPipeline and the diffusion inference pipeline (the first and second links in the paragraph) lead to 404 pages.
Can you provide new links for them, please?
It says
for img in image: display(mk_img(img))
but it should be
for img in images: display(mk_img(img))
i.e. image -> images
Hi, @johnowhitaker
In the positional embedding section at the line
But now instead of dealing with ~50 tokens we just need one for each position (77 total):
I think the number of tokens should be 50K instead of 50. Could you please confirm this?
Thanks
Hi, thanks for your great work.
I'm trying to add a CLIP loss on the Stable Diffusion latents in the Guidance section, but when I decode an image from the latents, feed it into clip_model, and compute the CLIP loss with respect to the latents, I get an error: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior. How can I fix this, or is there an example to refer to?
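That error usually means the loss tensor does require grad (e.g. through the CLIP weights), but the latents were detached somewhere on the way to it — for instance the VAE decode ran under torch.no_grad() or on latents.detach(). A hypothetical minimal reproduction, with stand-ins for the decode and the CLIP loss rather than the notebook's actual code:

```python
import torch

latents = torch.randn(4, requires_grad=True)
weights = torch.randn(4, requires_grad=True)   # stand-in for CLIP model params

# Broken: latents.detach() cuts the graph; the loss still requires grad via
# `weights`, so autograd complains that `latents` was never used in the graph.
loss = (latents.detach() * weights).sum()
try:
    torch.autograd.grad(loss, latents)
except RuntimeError as e:
    print("broken:", type(e).__name__)

# Fixed: keep `latents` in the graph all the way to the loss.
loss = (latents * weights).sum()
grad, = torch.autograd.grad(loss, latents)
print("fixed:", grad.shape)  # torch.Size([4])
```

So the fix is to make sure every op between the latents and the CLIP loss (including the VAE decode) runs with gradients enabled and without any .detach() in between.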
OOM errors pop up when running the notebook on an 8GB GPU. I managed to run it successfully by using half-precision (fp16) tensors instead.
birb_embed = torch.load('learned_embeds.bin')
birb_embed.keys(), birb_embed[''].shape
In the Stable Diffusion Deep Dive notebook, in the code cell immediately following the Transformer diagram, there is the definition of get_output_embeds, which includes a call to text_encoder.text_model._build_causal_attention_mask:
def get_output_embeds(input_embeddings):
    # CLIP's text model uses causal mask, so we prepare it here:
    bsz, seq_len = input_embeddings.shape[:2]
    causal_attention_mask = text_encoder.text_model._build_causal_attention_mask(bsz, seq_len, dtype=input_embeddings.dtype)
    ...
That currently generates an error for me when I run the notebook on Colab (from a fresh instance) or on my home computer:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-33-dbb74b7ec9b4> in <cell line: 26>()
24 return output
25
---> 26 out_embs_test = get_output_embeds(input_embeddings) # Feed through the model with our new function
27 print(out_embs_test.shape) # Check the output shape
28 out_embs_test # Inspect the output
1 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
1612 if name in modules:
1613 return modules[name]
-> 1614 raise AttributeError("'{}' object has no attribute '{}'".format(
1615 type(self).__name__, name))
1616
AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask'
Everything in the notebook prior to that line runs fine.
Perhaps I'm doing something wrong, or perhaps something has changed in the HF libraries being used since the notebook's original conception?
I see the same issue here: drboog/ProFusion#12. It seems that transformers
has changed; downgrading to version 4.25.1 fixed the problem.
Thus, changing the pip install
line at the top of the notebook to
!pip install -q --upgrade transformers==4.25.1 diffusers ftfy
...will restore full functionality.
Feel free to close this issue at your convenience. Perhaps a PR is in order.
Presumably some way to keep up to date with transformers
would be preferable, but for now this is a quick fix.
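If you'd rather stay on a current transformers version, the removed helper only built an additive causal mask, which is easy to reproduce directly. A version-independent sketch that mirrors, to the best of my understanding, what _build_causal_attention_mask returned (the function name here is mine):

```python
import torch

def build_causal_attention_mask(bsz, seq_len, dtype):
    # Additive mask: -inf (dtype min) strictly above the diagonal, 0 on and
    # below it, broadcast to shape (bsz, 1, seq_len, seq_len).
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype)
    mask.triu_(1)  # zero out the diagonal and everything below it
    return mask[None, None, :, :].expand(bsz, 1, seq_len, seq_len)

m = build_causal_attention_mask(2, 77, torch.float32)
print(m.shape)  # torch.Size([2, 1, 77, 77])
```

You could call this in get_output_embeds in place of the removed method, though you should check it against whatever mask-building helper your installed transformers version actually uses.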
While encoding an image to the latent space using
latent = vae.encode(tfms.ToTensor()(input_im).unsqueeze(0).to(torch.float16).to(torch_device)*2-1)
it gave the error RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same.
Since my graphics card has 8GB, I converted the vae to torch.float16. Is that the problem?
The whole error is:
RuntimeError Traceback (most recent call last)
Cell In[20], line 2
1 # Encode to the latent space
----> 2 encoded = pil_to_latent(input_image)
3 encoded.shape
4 # Let's visualize the four channels of this latent representation:
Cell In[18], line 4, in pil_to_latent(input_im)
1 def pil_to_latent(input_im):
2 # Single image -> single latent in a batch (so size 1, 4, 64, 64)
3 with torch.no_grad():
----> 4 latent = vae.encode(tfms.ToTensor()(input_im).type(torch.float16).unsqueeze(0).to(torch_device)*2-1) # Note scaling
5 return 0.18215 * latent.latent_dist.sample()
File F:\Python 3.10.8\lib\site-packages\diffusers\models\vae.py:566, in AutoencoderKL.encode(self, x, return_dict)
565 def encode(self, x: torch.FloatTensor, return_dict: bool = True) -> AutoencoderKLOutput:
--> 566 h = self.encoder(x)
567 moments = self.quant_conv(h)
568 posterior = DiagonalGaussianDistribution(moments)
File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\module.py:1190, in Module._call_impl(self, *input, **kwargs)
1186 # If we don't have any hooks, we want to skip the rest of the logic in
1187 # this function, and just call forward.
1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1189 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190 return forward_call(*input, **kwargs)
1191 # Do not call functions when jit is used
1192 full_backward_hooks, non_full_backward_hooks = [], []
File F:\Python 3.10.8\lib\site-packages\diffusers\models\vae.py:130, in Encoder.forward(self, x)
128 def forward(self, x):
129 sample = x
--> 130 sample = self.conv_in(sample)
132 # down
133 for down_block in self.down_blocks:
File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\module.py:1190, in Module._call_impl(self, *input, **kwargs)
1186 # If we don't have any hooks, we want to skip the rest of the logic in
1187 # this function, and just call forward.
1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1189 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190 return forward_call(*input, **kwargs)
1191 # Do not call functions when jit is used
1192 full_backward_hooks, non_full_backward_hooks = [], []
File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\conv.py:463, in Conv2d.forward(self, input)
462 def forward(self, input: Tensor) -> Tensor:
--> 463 return self._conv_forward(input, self.weight, self.bias)
File F:\Python 3.10.8\lib\site-packages\torch\nn\modules\conv.py:459, in Conv2d._conv_forward(self, input, weight, bias)
455 if self.padding_mode != 'zeros':
456 return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
457 weight, bias, self.stride,
458 _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
460 self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same
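Reading the message closely: torch.cuda.HalfTensor vs torch.HalfTensor is a device mismatch, not a dtype one — the input was moved to CUDA but the VAE weights are still on the CPU, which suggests the vae was converted with .half() but never moved with .to(torch_device). A toy illustration with a single conv layer standing in for the VAE (plain float here, so it runs on CPU too; the same rule applies with fp16):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for the VAE: the model must be moved to the same device as the input.
conv = torch.nn.Conv2d(3, 4, 3).to(device)   # analogous to vae.half().to(torch_device)
x = torch.randn(1, 3, 8, 8)

# Calling conv on a GPU input while conv's weights sat on the CPU would raise:
# RuntimeError: Input type (torch.cuda.HalfTensor) and weight type
# (torch.HalfTensor) should be the same.
out = conv(x.to(device))                     # input moved to the same device
print(out.shape)  # torch.Size([1, 4, 6, 6])
```

So fp16 itself is fine for an 8GB card; just make sure vae.to(torch_device) runs as well, so the input and the weights match in both device and dtype.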
@johnowhitaker in the Stable Diffusion Deep Dive.ipynb notebook, section The UNET and CFG.
You get latents_x0
because the scheduler exposes pred_original_sample
latents_x0 = scheduler.step(noise_pred, t, latents).pred_original_sample # Using the scheduler (Diffusers 0.4 and above)
How can I get this pred_original_sample
when using PNDMScheduler? That scheduler does not expose this value.
Full error here:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In [63], line 19
16 # Get the predicted x0:
17 # latents_x0 = latents - sigma * noise_pred # Calculating ourselves
18 print(noise_pred.shape, t, latents.shape)
---> 19 latents_x0 = scheduler.step(noise_pred, t, latents).pred_original_sample # Using the scheduler (Diffusers 0.4 and above)
21 # compute the previous noisy sample x_t -> x_t-1
22 latents = scheduler.step(noise_pred, t, latents).prev_sample
File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:405, in LMSDiscreteScheduler.step(self, model_output, timestep, sample, order, return_dict)
403 # 3. Compute linear multistep coefficients
404 order = min(self.step_index + 1, order)
--> 405 lms_coeffs = [self.get_lms_coefficient(order, self.step_index, curr_order) for curr_order in range(order)]
407 # 4. Compute previous sample based on the derivatives path
408 prev_sample = sample + sum(
409 coeff * derivative for coeff, derivative in zip(lms_coeffs, reversed(self.derivatives))
410 )
File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:405, in <listcomp>(.0)
403 # 3. Compute linear multistep coefficients
404 order = min(self.step_index + 1, order)
--> 405 lms_coeffs = [self.get_lms_coefficient(order, self.step_index, curr_order) for curr_order in range(order)]
407 # 4. Compute previous sample based on the derivatives path
408 prev_sample = sample + sum(
409 coeff * derivative for coeff, derivative in zip(lms_coeffs, reversed(self.derivatives))
410 )
File /usr/local/lib/python3.9/dist-packages/diffusers/schedulers/scheduling_lms_discrete.py:233, in LMSDiscreteScheduler.get_lms_coefficient(self, order, t, current_order)
230 prod *= (tau - self.sigmas[t - k]) / (self.sigmas[t - current_order] - self.sigmas[t - k])
231 return prod
--> 233 integrated_coeff = integrate.quad(lms_derivative, self.sigmas[t], self.sigmas[t + 1], epsrel=1e-4)[0]
235 return integrated_coeff
IndexError: index 51 is out of bounds for dimension 0 with size 51
I tried playing around with the indices, but it seems to be a different issue. Moving to an older checkout doesn't fix it either.
Thanks for sharing such amazing work :)
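For a scheduler that doesn't expose pred_original_sample, one option is to reconstruct x0 from the noise estimate yourself using the standard DDPM forward relation x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps, where a_t = alphas_cumprod[t]. A sketch under two assumptions: the scheduler stores alphas_cumprod (PNDMScheduler does, as far as I can tell) and the model predicts epsilon; the function name is mine:

```python
import torch

def pred_x0(scheduler, noise_pred, t, latents):
    # Invert x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps for x0,
    # with a_t = alphas_cumprod[t] and eps = noise_pred.
    a_t = scheduler.alphas_cumprod[t]
    return (latents - (1 - a_t) ** 0.5 * noise_pred) / a_t ** 0.5
```

Sanity check: if you noise a known x0 with that relation and pass the same eps back in, pred_x0 recovers x0 (up to float precision).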
In the last section of the notebook Stable Diffusion Deep Dive.ipynb, you mention:
NB: We should set latents requires_grad=True before we do the forward pass of the unet (removing with torch.no_grad()) if we want more accurate gradients. BUT this requires a lot of extra memory. You'll see both approaches used depending on whose implementation you're looking at.
Can you please clarify the difference between the two approaches? For example, if I had to code this, I would have used torch.no_grad(), but apparently you preferred another approach. What does it change computationally and results-wise?
I think adding this as extra info to the notebook would be useful to others, too :)
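A toy contrast of the two styles being asked about, with a tiny linear layer standing in for the UNet and a made-up loss (none of this is the notebook's actual model or loss): under no_grad the guidance gradient only sees the cheap post-processing of the latents, while keeping the forward pass in the graph also propagates through the model, at the cost of storing all its activations.

```python
import torch

model = torch.nn.Linear(8, 8)                 # stand-in for the UNet
latents = torch.randn(1, 8, requires_grad=True)

# (a) Approximate: run the model without tracking gradients, then
# differentiate only the post-processing w.r.t. latents. Low memory, but the
# gradient treats the model output as a constant.
with torch.no_grad():
    noise_pred = model(latents)
denoised = latents - 0.1 * noise_pred         # graph: latents -> denoised only
grad_a, = torch.autograd.grad(denoised.pow(2).sum(), latents)

# (b) Accurate: keep the model forward in the graph, so the gradient also
# flows back through the model. Needs memory for all its activations.
noise_pred = model(latents)
denoised = latents - 0.1 * noise_pred
grad_b, = torch.autograd.grad(denoised.pow(2).sum(), latents)

print(torch.allclose(grad_a, grad_b))  # False: the two gradients differ
```

Computationally, (a) skips storing the UNet activations (hence far less memory); results-wise, (a) gives a biased gradient that in practice is often good enough for guidance, which is why both appear in the wild.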
Hey @johnowhitaker,
Might there be a bug in the img2img example here:
# Loop
for i, t in tqdm(enumerate(scheduler.timesteps)):
    if i > start_step: # << This is the only modification to the loop we do
Should this be i >= start_step instead?
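For what it's worth, the two comparisons differ by exactly one denoising step. A toy count with a hypothetical 10-step schedule:

```python
# With start_step = 3 on a 10-step schedule, `>` runs one fewer denoising
# step than `>=` (it skips index 3 itself).
timesteps = list(range(10))
start_step = 3
run_gt = [i for i, _ in enumerate(timesteps) if i > start_step]
run_ge = [i for i, _ in enumerate(timesteps) if i >= start_step]
print(len(run_gt), len(run_ge))  # 6 7
```

So the question comes down to whether the step the latents were noised to should itself be denoised again.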