huggingface / diffusion-models-class

Materials for the Hugging Face Diffusion Models Course

License: Apache License 2.0

Jupyter Notebook 99.99% Python 0.01%

diffusion-models-class's Introduction

Hugging Face Diffusion Models Course


In this free course, you will:

  • 👩‍🎓 Study the theory behind diffusion models
  • 🧨 Learn how to generate images and audio with the popular 🤗 Diffusers library
  • 🏋️‍♂️ Train your own diffusion models from scratch
  • 📻 Fine-tune existing diffusion models on new datasets
  • 🗺 Explore conditional generation and guidance
  • 🧑‍🔬 Create your own custom diffusion model pipelines

Register via the signup form and then join us on Discord to get the conversations started. Instructions on how to join specific categories/channels are here.

Syllabus

📆 Publishing date | 📘 Unit | 👩‍💻 Hands-on
November 28, 2022 | An Introduction to Diffusion Models | Introduction to Diffusers and Diffusion Models From Scratch
December 12, 2022 | Fine-Tuning and Guidance | Fine-Tuning a Diffusion Model on New Data and Adding Guidance
December 21, 2022 | Stable Diffusion | Exploring a Powerful Text-Conditioned Latent Diffusion Model
January 2023 (TBC) | Doing More with Diffusion | Advanced Techniques to Take Diffusion Further

More information coming soon!

Prerequisites

  • Good skills in Python 🐍
  • Basics of deep learning and PyTorch

If that's not yet the case, you can check out these free resources:

FAQ

Is this class free?

Yes, totally free 🥳.

Do I need to have a Hugging Face account to follow the course?

Yes, to push your custom models and pipelines to the hub, you need an account (it's free) 🤗.

You can create one here 👉 https://huggingface.co/join

What’s the format of the class?

The course will consist of at least 4 Units. More will be added as time goes on, on topics like diffusion for audio.

Each unit consists of some theory and background alongside one or more hands-on notebooks. Some units will also contain suggested projects, and we'll have competitions and swag for the best pipelines and demos (more details TBD).

🌎 Languages and translations

Members of the 🤗 community have begun translating the course! We're planning to host this course on the Hugging Face website, so if you're interested in contributing a translation, we recommend waiting until we've formatted the English content in its final form.

Language Authors
Chinese @darcula1993 @XhrLeokk @SuSung-boy @Hoi2022
Japanese @eltociear @nazuki155
Korean @deep-diver

diffusion-models-class's People

Contributors

adakoda, alocaputo, angellmethod, darcula1993, daspartho, dbtreasure, deep-diver, dhakalnirajan, eltociear, jmelsbach, johnowhitaker, lbourdois, lewtun, mishig25, qbiwan, relno, tolgacangoz, xianbaoqian, xl0, yorko


diffusion-models-class's Issues

Pip installation breaking in Unit 1 Colab notebook, and a workaround

Hi, thanks for making this fantastic resource available! I was facing an issue running the Unit 1 Jupyter notebook.

When I tried installing through:

%pip install -qq -U diffusers datasets transformers accelerate ftfy pyarrow
I got an error:

This behaviour is the source of the following dependency conflicts. pandas-gbq 0.17.9 requires pyarrow<10.0dev,>=3.0.0, but you have pyarrow 10.0.1 which is incompatible.
I was able to resolve it by doing:

%pip install -qq -U diffusers datasets transformers accelerate ftfy pyarrow==9.0.0
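An alternative sketch that avoids pinning an exact version: let pip resolve any pyarrow 9.x release (this assumes the conflict really comes from the pandas-gbq constraint pyarrow<10.0dev shown above):

%pip install -qq -U diffusers datasets transformers accelerate ftfy "pyarrow<10"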

[Question] Best Datasets for advanced training

Hello! Thanks for this wonderful unit on diffusion models. Could you suggest some bigger datasets for more advanced training? I would like to see how changing the UNet architecture affects training time on them.

Question on training UNet2DModel to predict the clean image instead of the noise

Thanks for the notebooks.

I have one question about this file: https://github.com/huggingface/diffusion-models-class/blob/main/unit1/02_diffusion_models_from_scratch.ipynb

# The training loop
for epoch in range(n_epochs):

    for x, y in train_dataloader:

        # Get some data and prepare the corrupted version
        x = x.to(device) # Data on the GPU
        noise_amount = torch.rand(x.shape[0]).to(device) # Pick random noise amounts
        noisy_x = corrupt(x, noise_amount) # Create our noisy x

        # Get the model prediction
        pred = net(noisy_x, 0).sample #<<< Using timestep 0 always, adding .sample

        # Calculate the loss
        loss = loss_fn(pred, x) # How close is the output to the true 'clean' x?

        # Backprop and update the params:
        opt.zero_grad()
        loss.backward()
        opt.step()

        # Store the loss for later
        losses.append(loss.item())

    # Print out the average of the loss values for this epoch:
    avg_loss = sum(losses[-len(train_dataloader):])/len(train_dataloader)
    print(f'Finished epoch {epoch}. Average loss for this epoch: {avg_loss:05f}')

If I want to make the network predict the clean images, should the prediction line be changed as follows, passing noise_amount as the timestep?

        # Get the model prediction
        pred = net(noisy_x, noise_amount).sample #<<< Now passing noise_amount instead of timestep 0
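For reference, a minimal sketch of how that step could look if net is the diffusers UNet2DModel variant and the continuous noise amount is mapped onto the integer timestep range such models are usually conditioned on (an assumption on my part, not the course's official answer):

# Sketch: condition on the noise level and regress the clean image.
timesteps = (noise_amount * 999).long()  # map [0, 1] onto {0, ..., 999} (assumed range)
pred = net(noisy_x, timesteps).sample    # model prediction
loss = loss_fn(pred, x)                  # target is the clean x, not the noise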

Unit 2 finetune_model.py crashes on line 117 image_pipe.save_pretrained(...)

I get the error when running the script using the suggested command line.
python finetune_model.py --image_size 128 --batch_size 8 --num_epochs 16 --grad_accumulation_steps 2 --start_model "google/ddpm-celebahq-256" --dataset_name "Norod78/Vintage-Faces-FFHQAligned" --wandb_project 'dm-finetune' --log_samples_every 100 --save_model_every 1000 --model_save_name 'vintageface'

Running Python 3.11.8 on Windows 11.
Error is "TypeError: Object of type WindowsPath is not JSON serializable"
It does create the "vintageface \ unet" directory and an empty file "config.json"

Here is the end of the log:

  File "C:\Users\Vincent Dovydaitis\Build\diffusion-models-class\unit2\finetune_model.py", line 117, in train
    image_pipe.save_pretrained(model_save_name)
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 284, in save_pretrained
    save_method(os.path.join(save_directory, pipeline_component_name), **save_kwargs)
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\site-packages\diffusers\models\modeling_utils.py", line 315, in save_pretrained
    model_to_save.save_config(save_directory)
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\site-packages\diffusers\configuration_utils.py", line 168, in save_config
    self.to_json_file(output_config_file)
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\site-packages\diffusers\configuration_utils.py", line 610, in to_json_file
    writer.write(self.to_json_string())
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\site-packages\diffusers\configuration_utils.py", line 599, in to_json_string
    return json.dumps(config_dict, indent=2, sort_keys=True) + "\n"
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\json\encoder.py", line 202, in encode
    chunks = list(chunks)
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\json\encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\json\encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\json\encoder.py", line 439, in _iterencode
    o = _default(o)
  File "C:\Users\Vincent Dovydaitis\.conda\envs\vision\Lib\json\encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type WindowsPath is not JSON serializable
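A workaround sketch until the script is fixed: convert any pathlib.Path entries in the pipeline components' configs to plain strings before saving. This assumes the offending WindowsPath lives in a component config, which is what the traceback suggests; untested on this exact script.

from pathlib import Path

# Replace Path-valued config entries with plain strings so json.dumps succeeds.
for component in (image_pipe.unet, image_pipe.scheduler):
    patched = {k: str(v) for k, v in dict(component.config).items() if isinstance(v, Path)}
    if patched:
        component.register_to_config(**patched)

image_pipe.save_pretrained(model_save_name)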

Might be a typo in the formula for the diffusion step using alpha

In the formula for q(xt | x0) in the text between cells 9 and 10 of notebook 01_introduction_to_diffusers, the formula is currently stated as:

q(xt | x0) ~ N(sqrt(1-alpha)*x0, sqrt(1-alpha)*I)

This might be a typo; according to the paper, it should be:

q(xt | x0) ~ N(sqrt(1-alpha)*x0, (1-alpha)*I)

Sorry for the rough notation.

A suggestion in unit3

https://colab.research.google.com/github/huggingface/diffusion-models-class/blob/main/unit3/01_stable_diffusion_introduction.ipynb

from torchvision import transforms
display(init_image)
# Convert the PIL image to a torch.Tensor
images = transforms.Compose([transforms.ToTensor()])(init_image).unsqueeze(0).to(device, torch.float)
print("Input images shape:", images.shape)

# Encode to latent space
with torch.no_grad():
  latents = 0.18215 * pipe.vae.encode(images).latent_dist.mean
print("Encoded latents shape:", latents.shape)

# Decode again
with torch.no_grad():
  decoded_images = pipe.vae.decode(latents / 0.18215).sample
print("Decoded images shape:", decoded_images.shape)
display(transforms.functional.to_pil_image(decoded_images[0]))

unit1/01 clarity needed

I'm trying to wade my way through
https://github.com/huggingface/diffusion-models-class/blob/main/unit1/01_introduction_to_diffusers.ipynb

Fairly informative overall, but a few things mire me down, in Step 4:

  1. It is confusing that the IMAGE shows 5 green blocks while the description says "down_block_types correspond to the [green blocks]", yet in the code down_block_types has only 4 entries. Super confusing; the image needs to be remade to match the code better.

  2. It is completely unclear what "block_out_channels" is for (see the sketch after this list).

  3. "Middle block" is shown very noticably in the image, but there is zero explaination for them that I can see? They arent even refernced visibly in the code.

unit1/02_diffusion_models_from_scratch.ipynb typo in formulas

Hi! I think a typo fixed in notebook 01 wasn't fixed in notebook 02,

q(xt | x0) ~ N(sqrt(1-alpha)*x0, sqrt(1-alpha)*I)

should be

q(xt | x0) ~ N(sqrt(1-alpha)*x0, (1-alpha)*I).

For the same reason, instead of

"and add noise scaled by beta_t"

should be

"and add noise scaled by sqrt(beta_t)"

because the noise should be scaled with the std. dev., not the variance.

Or in other terms

q(xt | x0) ~ N(sqrt(1-beta)*x0, beta*I)  =>  xt = sqrt(1-beta) * x0 + sqrt(beta) * e

with e ~ N(0, I),

var(sqrt(1-beta) * x0 + sqrt(beta) * e) = var(sqrt(beta) * e) = beta * var(e) = beta * I.
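A quick numerical check of that variance claim (a sketch; beta = 0.3 chosen arbitrarily):

import torch

beta = 0.3
x0 = torch.ones(1_000_000)                     # constant signal, zero variance
e = torch.randn(1_000_000)                     # e ~ N(0, I)
xt = (1 - beta) ** 0.5 * x0 + beta ** 0.5 * e  # noise scaled by sqrt(beta)
print(xt.var().item())                         # ~0.3, i.e. variance beta, not sqrt(beta)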

This second typo is still in notebook 01 too.

Do you think this makes sense?

Sorry if I'm missing something here and thanks for the great course!

Hackathon - unable to get bitsandbytes to detect CUDA

I have tried the notebook on Google Cloud and run into exactly the same thing as when I do the install locally. I am running my instance on a Tesla T4 GPU in a Google Cloud VM with 30 GB RAM and 1 GPU. I have scaled the RAM up and down to try to make this run, but to no avail. The error message I am seeing is:

CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment! If you cannot find any issues and suspect a bug, please open an issue with details about your environment: https://github.com/TimDettmers/bitsandbytes/issues

I tried the steps to resolve this, but it still seems to be an issue:

  1. git clone git@github.com:TimDettmers/bitsandbytes.git
  2. cd bitsandbytes
  3. CUDA_VERSION=116
  4. python setup.py install

Anyone else with the same issue? When I run nvidia-smi I can see my GPU is detected, and it works well with PyTorch.
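For anyone debugging the same thing, a minimal sanity check (sketch) that PyTorch itself sees the GPU and which CUDA version it was built against; bitsandbytes needs a matching CUDA runtime on its library path:

import torch

print(torch.cuda.is_available())      # should print True on the T4 VM
print(torch.version.cuda)             # e.g. '11.6' if built for CUDA_VERSION=116
print(torch.cuda.get_device_name(0))  # e.g. 'Tesla T4'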

error in dreambooth.ipynb (Unit 3)

I tried to run the notebook_launcher function and got this error:

/usr/local/lib/python3.8/dist-packages/bitsandbytes/functional.py in optimizer_update_8bit_blockwise(optimizer_name, g, p, state1, state2, beta1, beta2, eps, step, lr, qmap1, qmap2, absmax1, absmax2, weight_decay, gnorm_scale, skip_zeros)
    950
    951     if g.dtype == torch.float32 and state1.dtype == torch.uint8:
--> 952         str2optimizer8bit_blockwise[optimizer_name][0](
    953             get_ptr(p),
    954             get_ptr(g),

NameError: name 'str2optimizer8bit_blockwise' is not defined

Chinese whole version?

Hello, thank you for this kind course, but where can I find the Chinese version of it? Thank you.

Introduction to Diffusers Notebook - Mac M1 Support

I have spotted an issue, not with this notebook directly, but for Mac M1 users.

If you set device = torch.device("cpu"), the output of show_images(xb).resize((8 * 64, 64), resample=Image.NEAREST) gives you 8 different images of butterflies.
But if you set device = torch.device("mps"), the same call returns the same butterfly 8 times.

I don't know what's going on under the hood with Torch GPU support for Mac M1 yet; I didn't find anything by googling it.
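A minimal repro sketch, under my assumption that the collapse comes from random sampling on the mps backend rather than from the model itself:

import torch

xb_cpu = torch.randn(8, 3, 64, 64, device="cpu")
xb_mps = torch.randn(8, 3, 64, 64, device="mps")

# If every row of the mps batch is identical, each sample starts from the
# same noise and the outputs collapse to a single butterfly.
print(torch.equal(xb_cpu[0], xb_cpu[1]))  # False, as expected
print(torch.equal(xb_mps[0], xb_mps[1]))  # True here would confirm the hypothesis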

notebook_login() from huggingface_hub in VSCode Jupyter notebook

I am running the following in a VSCode notebook remotely:

#!%load_ext autoreload
#!%autoreload 2

%%sh
pip install -q --upgrade pip
pip install -q --upgrade diffusers transformers scipy ftfy huggingface_hub

# (next cell) Required to get access to the Stable Diffusion model
from huggingface_hub import notebook_login

notebook_login()

import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, revision="fp16"
)

pipeline = pipeline.to("cuda")

import os

from IPython.display import Image, display


def generate_images(
    prompt,
    num_images_to_generate,
    num_images_per_prompt=4,
    guidance_scale=8,
    output_dir="images",
    display_images=False,
):

    num_iterations = num_images_to_generate // num_images_per_prompt
    os.makedirs(output_dir, exist_ok=True)

    for i in range(num_iterations):
        images = pipeline(
            prompt, num_images_per_prompt=num_images_per_prompt, guidance_scale=guidance_scale
        )
        for idx, image in enumerate(images.images):
            image_name = f"{output_dir}/image_{(i*num_images_per_prompt)+idx}.png"
            image.save(image_name)
            if display_images:
                display(Image(filename=image_name, width=128, height=128))

# 1000 images take just under 1 hour on a V100

generate_images("a meal of boeuf bourguignon", 3, guidance_scale=4, display_images=True)


However, there are three things I need help with:

  1. Why does nothing interactive happen with notebook_login()? I also don't see a message after executing the notebook_login() cell.

  2. When I set the number of images to generate to 1000, or even 100, I get a CUDA out-of-memory error. However, I am not sure how to change the batch size in this code (see the sketch below). Could you please help with that?

  3. When I changed the number of images to generate to 4, nothing was generated in the images folder, as you can see in the screenshot below.
    [Screenshot from 2022-12-21 10-28-07]
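On question 2, a hedged note: num_images_per_prompt is effectively the batch size here, since each pipeline call generates that many images at once, so lowering it should reduce peak GPU memory. A sketch using the function above:

generate_images(
    "a meal of boeuf bourguignon",
    num_images_to_generate=100,
    num_images_per_prompt=2,  # smaller batch per pipeline call to avoid CUDA OOM
    guidance_scale=4,
)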

Confusing inpaint

My question is about the Inpaint section.

[Illustration: inpaint-unet]

In the tutorial's illustration, the inpainting UNet seems to take the text embedding, noisy latents, inpainting mask, and timestep as inputs. However, shouldn't init_image also be included as an input? The init_image is also used as an input in the tutorial's code.

prompt = "A small robot, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

Another query I have is regarding the training of the inpainting model. Are the masked image and the mask provided together as additional channels for conditioning?

Am I correct in understanding that training involves adding noise to the input, which includes four additional channels for the encoded masked image and one additional channel for the mask?
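One quick way to check this from the loaded pipeline (a sketch, assuming the standard Stable Diffusion inpainting checkpoint): the inpainting UNet is commonly built with 9 input channels, i.e. 4 noisy latents + 4 encoded masked-image latents + 1 downsampled mask.

print(pipe.unet.config.in_channels)  # 9 for the usual inpainting UNet: 4 + 4 + 1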

02_diffusion_for_audio.ipynb encounters an error when running

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning:
The secret HF_TOKEN does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
model_index.json: 100% 244/244 [00:00<00:00, 14.6kB/s]

ValueError                                Traceback (most recent call last)
in <cell line: 3>()
      1 # Load a pre-trained audio diffusion pipeline
      2 device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"
----> 3 pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-instrumental-hiphop-256").to(device)

3 frames
/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py in download(cls, pretrained_model_name, **kwargs)
   1698             custom_components[component] = module_candidate
   1699         elif module_candidate not in LOADABLE_CLASSES and not hasattr(pipelines, module_candidate):
-> 1700             raise ValueError(
   1701                 f"{candidate_file} as defined in model_index.json does not exist in {pretrained_model_name} and is not a module in 'diffusers/pipelines'."
   1702             )

ValueError: mel/audio_diffusion.py as defined in model_index.json does not exist in teticio/audio-diffusion-instrumental-hiphop-256 and is not a module in 'diffusers/pipelines'.

I'm using Colab. I am a beginner and cannot find the cause; please help me.
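A diagnostic sketch that may help isolate the problem: download just model_index.json and confirm it declares the custom mel component that newer diffusers releases refuse to resolve.

import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("teticio/audio-diffusion-instrumental-hiphop-256", "model_index.json")
with open(path) as f:
    print(json.load(f))  # look for the 'mel' entry pointing at audio_diffusion.py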
