
mindiffusion's Introduction

minDiffusion

The goal of this educational repository is to provide a self-contained, minimalistic implementation of diffusion models using PyTorch.

Many implementations of diffusion models can be a bit overwhelming. Here, superminddpm, a fully self-contained implementation of DDPM in under 200 lines of PyTorch, is a good starting point for anyone who wants to get started with Denoising Diffusion Probabilistic Models without having to spend time on the details.

Simply:

$ python superminddpm.py

The above script is self-contained. (Of course, you need to have PyTorch and torchvision installed. The latest versions should suffice; we do not use any cutting-edge features.)

If you want the slightly more refactored code that trains on the CIFAR-10 dataset:

$ python train_cifar10.py

The above result took about 2 hours of training on a single 3090 GPU. The top 8 images are generated; the bottom 8 are ground truth.

Here is another example, trained for 100 epochs (about 1.5 hours).

Currently has:

  • Tiny implementation of DDPM
  • MNIST and CIFAR-10 datasets.
  • Simple UNet structure + simple time embeddings.
  • CelebA dataset.
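The "simple time embeddings" above can be as little as a sinusoidal encoding of the integer timestep, in the spirit of Transformer positional encodings. A minimal NumPy sketch (the function name is illustrative, not this repository's actual code):

```python
import numpy as np

def sinusoidal_time_embedding(t: np.ndarray, dim: int) -> np.ndarray:
    """Map integer timesteps t of shape [B] to embeddings of shape [B, dim]."""
    half = dim // 2
    # Geometric frequency ladder from 1 down to ~1/10000, as in Transformers.
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t[:, None].astype(np.float64) * freqs[None, :]  # [B, half]
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)

emb = sinusoidal_time_embedding(np.arange(4), 32)
print(emb.shape)  # (4, 32)
```

The embedding is then typically projected by a small MLP and added to the UNet's feature maps at each resolution.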

TODOS

  • DDIM
  • Classifier Guidance
  • Multimodality

Updates!

  • Using more parameters yields better results for MNIST.
  • More comments in superminddpm.py

mindiffusion's People

Contributors

cloneofsimo, silencemonk


mindiffusion's Issues

how to load custom dataset?

Hi! I'm new to this stuff and don't understand how I can load a custom dataset and add it to this model. My questions are: what format should the dataset be in, and how do I train the model on it?

How to overfit a single image

I'm getting the following samples after training:

  • 200 epochs for 1 CELEBA image
    ddpm_1img_200epochs

  • 100 epochs for 1000 CELEBA images
    ddpm_1000imgs_100epochs

Shouldn't using only 1 image for training make the model overfit that image within a few epochs and always produce that image for any given z?

Why does using more training samples make the model converge faster?

Thank you!
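One practical pitfall when overfitting a single image: if the dataset yields that image only once per epoch, an "epoch" is a single gradient step, so 200 epochs is only 200 updates. Also, a DDPM samples a random timestep t and random noise eps at every step, so generated samples will not be pixel-identical until the noise predictor is accurate at every t. A common workaround (a hypothetical helper, not in this repository) is to repeat the image so each epoch performs many steps:

```python
import torch
from torch.utils.data import Dataset

class RepeatedImage(Dataset):
    """Presents one image tensor `repeats` times per epoch."""

    def __init__(self, image: torch.Tensor, repeats: int = 1000):
        self.image, self.repeats = image, repeats

    def __len__(self) -> int:
        return self.repeats

    def __getitem__(self, i: int) -> torch.Tensor:
        return self.image

ds = RepeatedImage(torch.zeros(3, 64, 64), repeats=500)
print(len(ds))  # 500
```

With 500 repeats, one "epoch" performs 500 gradient steps over the same image, which makes overfitting visible much sooner.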

Why are you normalizing to 1.414 in unet.py?

class Conv3(nn.Module):
    ...

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.main(x)
        if self.is_res:
            x = x + self.conv(x)
            return x / 1.414  # <= here
        else:
            return self.conv(x)

Quality of the generated images

Dear @cloneofsimo and @SilenceMonk, thanks so much for this code! It is very helpful and precisely the missing piece I needed to understand diffusion models better. I also appreciated that I could just run the CIFAR-10 training without any code modification.
I am playing around with your code to better understand the guided_diffusion repository, which I find too complex and need to simplify.

I have trained on CIFAR-10 and obtained the following results after 100 epochs.
ddpm_sample_cifar99

As you can see, the prediction quality seems quite far from the ground truth.
I plan to extend your code to images with a larger resolution, however, I am hesitant now, as I do not understand if the network is learning or not. I would like to extend the code while maintaining convergence.

i) Is this behavior normal? Is there some critical hyperparameter to tune to obtain clearer images?
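On hyperparameters: in DDPM the noise schedule is often the first thing to check. The standard linear schedule from the original DDPM paper runs beta from 1e-4 to 0.02 over T = 1000 steps, and the cumulative product alpha-bar determines how much signal survives at each timestep. A small illustrative sketch (NumPy, not the repository's exact code):

```python
import numpy as np

def ddpm_schedule(beta1: float = 1e-4, beta2: float = 0.02, T: int = 1000):
    """Linear beta schedule and the derived alpha-bar used by DDPM training."""
    beta = np.linspace(beta1, beta2, T)
    alpha = 1.0 - beta
    alphabar = np.cumprod(alpha)  # fraction of signal variance kept at each t
    return beta, alphabar

beta, alphabar = ddpm_schedule()
# Forward process: x_t = sqrt(alphabar_t) * x_0 + sqrt(1 - alphabar_t) * eps
print(alphabar[0], alphabar[-1])  # near 1.0 at t=0, near 0.0 at t=T
```

If alpha-bar does not decay close to zero by t = T, samples drawn from pure noise start from a distribution the model never saw during training, which degrades quality.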

UPDATE: I have trained on CelebA and obtained the following results after 21 epochs (approx. 14 hours on a 3090):
ddpm_sample_celeba021
The CelebA results already seem better than the CIFAR-10 ones, but I might need more training epochs because the generated images are still far from the ground truth.

Still referring to the CelebA results, you can see in the following image that the generated images can collapse to a constant-color background.
(Below is CelebA after 19 epochs.)
ddpm_sample_celeba020
This issue is similar to openai/guided-diffusion#81 .

Furthermore, training does not progress linearly: if we take epoch 22 of CelebA, we can notice that the network again outputs smooth predictions with no structure.

ddpm_sample_celeba022

So overall I am not getting the training stability I was expecting. These results are (unfortunately) consistent with the issues I reported for the guided_diffusion repository, openai/guided-diffusion#42.

iii) Do you have any comments that could help overcome this issue?

Thanks again for your help!
Stefano
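One standard remedy for the epoch-to-epoch sample instability described above (used by guided_diffusion, but not by this minimal repository) is to sample from an exponential moving average (EMA) of the model weights rather than the raw weights. A hedged sketch:

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model: torch.nn.Module, model: torch.nn.Module,
               decay: float = 0.999):
    """Blend the live weights into the EMA copy after each optimizer step."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

model = torch.nn.Linear(4, 4)
ema_model = copy.deepcopy(model)
# After each training step: ema_update(ema_model, model)
# At sampling time: generate with ema_model instead of model
```

Because the EMA averages over many recent checkpoints, it smooths out the step-to-step oscillations that make individual epochs (like epoch 22 above) produce structureless samples.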
