
edm's Introduction

Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
Official PyTorch implementation of the NeurIPS 2022 paper

Teaser image

Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
https://arxiv.org/abs/2206.00364

Abstract: We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after re-training with our proposed improvements to a new SOTA of 1.36.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 1+ high-end NVIDIA GPU for sampling and 8+ GPUs for training. We have done all testing and development using V100 and A100 GPUs.
  • 64-bit Python 3.8 and PyTorch 1.12.0 (or later). See https://pytorch.org for PyTorch install instructions.
  • Python libraries: See environment.yml for exact library dependencies. You can use the following commands with Miniconda3 to create and activate your Python environment:
    • conda env create -f environment.yml -n edm
    • conda activate edm
  • Docker users: use the provided Dockerfile to build an image with the required library dependencies (see the Docker instructions under Getting started below).

Getting started

To reproduce the main results from our paper, simply run:

python example.py

This is a minimal standalone script that loads the best pre-trained model for each dataset and generates a random 8x8 grid of images using the optimal sampler settings. Expected results:

| Dataset | Runtime | Reference image |
| --- | --- | --- |
| CIFAR-10 | ~6 sec | cifar10-32x32.png |
| FFHQ | ~28 sec | ffhq-64x64.png |
| AFHQv2 | ~28 sec | afhqv2-64x64.png |
| ImageNet | ~5 min | imagenet-64x64.png |

The easiest way to explore different sampling strategies is to modify example.py directly. You can also incorporate the pre-trained models and/or our proposed EDM sampler in your own code by simply copy-pasting the relevant bits. Note that the class definitions for the pre-trained models are stored within the pickles themselves and loaded automatically during unpickling via torch_utils.persistence. To use the models in external Python scripts, just make sure that torch_utils and dnnlib are accessible through PYTHONPATH.
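For reference, here is a minimal loading sketch along those lines. It assumes the repository root (and therefore torch_utils and dnnlib) is on PYTHONPATH; the dnnlib.util.open_url helper, the 'ema' key, and attributes such as net.img_resolution follow the pattern used in example.py and generate.py, so treat this as a sketch rather than a definitive recipe.

import pickle
import torch
import dnnlib  # provided by this repository; torch_utils must also be importable

# Any of the pre-trained pickles should work; this URL is taken from the commands below.
network_pkl = 'https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl'
device = torch.device('cuda')

# The network class is stored inside the pickle via torch_utils.persistence,
# so unpickling reconstructs the model without importing training.networks explicitly.
with dnnlib.util.open_url(network_pkl) as f:
    net = pickle.load(f)['ema'].to(device)

# Sample a batch of initial latents at the model's native resolution.
latents = torch.randn([4, net.img_channels, net.img_resolution, net.img_resolution], device=device)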

Docker: You can run the example script using Docker as follows:

# Build the edm:latest image
docker build --tag edm:latest .

# Run the example.py script using Docker:
docker run --gpus all -it --rm --user $(id -u):$(id -g) \
    -v `pwd`:/scratch --workdir /scratch -e HOME=/scratch \
    edm:latest \
    python example.py

Note: The Docker image requires NVIDIA driver release r520 or later.

The docker run invocation may look daunting, so let's unpack its contents here:

  • --gpus all -it --rm --user $(id -u):$(id -g): with all GPUs enabled, run an interactive session with current user's UID/GID to avoid Docker writing files as root.
  • -v `pwd`:/scratch --workdir /scratch: mount current running dir (e.g., the top of this git repo on your host machine) to /scratch in the container and use that as the current working dir.
  • -e HOME=/scratch: specify where to cache temporary files. Note: if you want more fine-grained control, you can instead set DNNLIB_CACHE_DIR (for pre-trained model download cache). You want these cache dirs to reside on persistent volumes so that their contents are retained across multiple docker run invocations.

Pre-trained models

We provide pre-trained models for our proposed training configuration (config F) as well as the baseline configuration (config A); they are available under https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/ and https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/baseline/, respectively (see the commands below for individual file names).

To generate a batch of images using a given model and sampler, run:

# Generate 64 images and save them as out/*.png
python generate.py --outdir=out --seeds=0-63 --batch=64 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

Generating a large number of images can be time-consuming; the workload can be distributed across multiple GPUs by launching the above command using torchrun:

# Generate 1024 images using 2 GPUs
torchrun --standalone --nproc_per_node=2 generate.py --outdir=out --seeds=0-999 --batch=64 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

The sampler settings can be controlled through command-line options; see python generate.py --help for more information. For best results, we recommend using the following settings for each dataset:

# For CIFAR-10 at 32x32, use deterministic sampling with 18 steps (NFE = 35)
python generate.py --outdir=out --steps=18 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

# For FFHQ and AFHQv2 at 64x64, use deterministic sampling with 40 steps (NFE = 79)
python generate.py --outdir=out --steps=40 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-ffhq-64x64-uncond-vp.pkl

# For ImageNet at 64x64, use stochastic sampling with 256 steps (NFE = 511)
python generate.py --outdir=out --steps=256 --S_churn=40 --S_min=0.05 --S_max=50 --S_noise=1.003 \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-imagenet-64x64-cond-adm.pkl
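For intuition about what --steps controls, here is a small sketch of the time-step discretization described in the paper (a sketch only, assuming the paper's defaults sigma_min=0.002, sigma_max=80, rho=7; the settings above override the step count per dataset, and the authoritative implementation lives in generate.py):

import numpy as np

def edm_sigma_steps(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    # sigma_i = (sigma_max^(1/rho) + i/(N-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
    i = np.arange(num_steps)
    sigmas = (sigma_max ** (1 / rho) + i / (num_steps - 1) *
              (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    return np.append(sigmas, 0.0)  # final step denoises all the way to sigma = 0

# 18 steps with the 2nd-order Heun solver corresponds to NFE = 2*18 - 1 = 35.
print(edm_sigma_steps(18))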

Besides our proposed EDM sampler, generate.py can also be used to reproduce the sampler ablations from Section 3 of our paper. For example:

# Figure 2a, "Our reimplementation"
python generate.py --outdir=out --steps=512 --solver=euler --disc=vp --schedule=vp --scaling=vp \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/baseline/baseline-cifar10-32x32-uncond-vp.pkl

# Figure 2a, "+ Heun & our {t_i}"
python generate.py --outdir=out --steps=128 --solver=heun --disc=edm --schedule=vp --scaling=vp \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/baseline/baseline-cifar10-32x32-uncond-vp.pkl

# Figure 2a, "+ Our sigma(t) & s(t)"
python generate.py --outdir=out --steps=18 --solver=heun --disc=edm --schedule=linear --scaling=none \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/baseline/baseline-cifar10-32x32-uncond-vp.pkl

Calculating FID

To compute Fréchet inception distance (FID) for a given model and sampler, first generate 50,000 random images and then compare them against the dataset reference statistics using fid.py:

# Generate 50000 images and save them as fid-tmp/*/*.png
torchrun --standalone --nproc_per_node=1 generate.py --outdir=fid-tmp --seeds=0-49999 --subdirs \
    --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/edm-cifar10-32x32-cond-vp.pkl

# Calculate FID
torchrun --standalone --nproc_per_node=1 fid.py calc --images=fid-tmp \
    --ref=https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/cifar10-32x32.npz

Both of the above commands can be parallelized across multiple GPUs by adjusting --nproc_per_node. The second command typically takes 1-3 minutes in practice, but the first one can sometimes take several hours, depending on the configuration. See python fid.py --help for the full list of options.

Note that the numerical value of FID varies across different random seeds and is highly sensitive to the number of images. By default, fid.py will always use 50,000 generated images; providing fewer images will result in an error, whereas providing more will use a random subset. To reduce the effect of random variation, we recommend repeating the calculation multiple times with different seeds, e.g., --seeds=0-49999, --seeds=50000-99999, and --seeds=100000-149999. In our paper, we calculated each FID three times and reported the minimum.

Also note that it is important to compare the generated images against the same dataset that the model was originally trained on. To facilitate evaluation, we provide the exact reference statistics that correspond to our pre-trained models; they are available under https://nvlabs-fi-cdn.nvidia.com/edm/fid-refs/.

For ImageNet, we provide two sets of reference statistics to enable apples-to-apples comparison: imagenet-64x64.npz should be used when evaluating the EDM model (edm-imagenet-64x64-cond-adm.pkl), whereas imagenet-64x64-baseline.npz should be used when evaluating the baseline model (baseline-imagenet-64x64-cond-adm.pkl); the latter was originally trained by Dhariwal and Nichol using slightly different training data.

You can compute the reference statistics for your own datasets as follows:

python fid.py ref --data=datasets/my-dataset.zip --dest=fid-refs/my-dataset.npz

Preparing datasets

Datasets are stored in the same format as in StyleGAN: uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels. Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information.
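As a quick sanity check of a prepared archive, a sketch like the following lists the images and labels (assuming the dataset.json layout used by the StyleGAN-style dataset_tool.py, i.e. a "labels" list of [filename, class_label] pairs):

import json
import zipfile

# Hypothetical path; substitute an archive produced by dataset_tool.py.
with zipfile.ZipFile('datasets/cifar10-32x32.zip') as z:
    pngs = [name for name in z.namelist() if name.endswith('.png')]
    print(f'{len(pngs)} images, first: {pngs[0]}')
    if 'dataset.json' in z.namelist():
        labels = json.loads(z.read('dataset.json'))['labels']
        print('first label entry:', labels[0])  # assumed [filename, class_label] pair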

CIFAR-10: Download the CIFAR-10 python version and convert to ZIP archive:

python dataset_tool.py --source=downloads/cifar10/cifar-10-python.tar.gz \
    --dest=datasets/cifar10-32x32.zip
python fid.py ref --data=datasets/cifar10-32x32.zip --dest=fid-refs/cifar10-32x32.npz

FFHQ: Download the Flickr-Faces-HQ dataset as 1024x1024 images and convert to ZIP archive at 64x64 resolution:

python dataset_tool.py --source=downloads/ffhq/images1024x1024 \
    --dest=datasets/ffhq-64x64.zip --resolution=64x64
python fid.py ref --data=datasets/ffhq-64x64.zip --dest=fid-refs/ffhq-64x64.npz

AFHQv2: Download the updated Animal Faces-HQ dataset (afhq-v2-dataset) and convert to ZIP archive at 64x64 resolution:

python dataset_tool.py --source=downloads/afhqv2 \
    --dest=datasets/afhqv2-64x64.zip --resolution=64x64
python fid.py ref --data=datasets/afhqv2-64x64.zip --dest=fid-refs/afhqv2-64x64.npz

ImageNet: Download the ImageNet Object Localization Challenge and convert to ZIP archive at 64x64 resolution:

python dataset_tool.py --source=downloads/imagenet/ILSVRC/Data/CLS-LOC/train \
    --dest=datasets/imagenet-64x64.zip --resolution=64x64 --transform=center-crop
python fid.py ref --data=datasets/imagenet-64x64.zip --dest=fid-refs/imagenet-64x64.npz

Training new models

You can train new models using train.py. For example:

# Train DDPM++ model for class-conditional CIFAR-10 using 8 GPUs
torchrun --standalone --nproc_per_node=8 train.py --outdir=training-runs \
    --data=datasets/cifar10-32x32.zip --cond=1 --arch=ddpmpp

The above example uses the default batch size of 512 images (controlled by --batch) that is divided evenly among 8 GPUs (controlled by --nproc_per_node) to yield 64 images per GPU. Training large models may run out of GPU memory; the best way to avoid this is to limit the per-GPU batch size, e.g., --batch-gpu=32. This employs gradient accumulation to yield the same results as using full per-GPU batches. See python train.py --help for the full list of options.
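To make the arithmetic concrete, here is a small illustrative sketch of how --batch, --batch-gpu, and the GPU count interact (illustrative only; the actual bookkeeping happens inside train.py):

def accumulation_rounds(batch, num_gpus, batch_gpu=None):
    # Images each GPU contributes to one optimizer step.
    per_gpu = batch // num_gpus
    # If --batch-gpu caps the per-GPU batch, gradients are accumulated over
    # several forward/backward passes before each optimizer step.
    if batch_gpu is None or batch_gpu >= per_gpu:
        return per_gpu, 1
    assert per_gpu % batch_gpu == 0
    return batch_gpu, per_gpu // batch_gpu

print(accumulation_rounds(batch=512, num_gpus=8))                 # (64, 1): full per-GPU batches
print(accumulation_rounds(batch=512, num_gpus=8, batch_gpu=32))   # (32, 2): two accumulation rounds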

The results of each training run are saved to a newly created directory, for example training-runs/00000-cifar10-cond-ddpmpp-edm-gpus8-batch64-fp32. The training loop exports network snapshots (network-snapshot-*.pkl) and training states (training-state-*.pt) at regular intervals (controlled by --snap and --dump). The network snapshots can be used to generate images with generate.py, and the training states can be used to resume the training later on (--resume). Other useful information is recorded in log.txt and stats.jsonl. To monitor training convergence, we recommend looking at the training loss ("Loss/loss" in stats.jsonl) as well as periodically evaluating FID for network-snapshot-*.pkl using generate.py and fid.py.
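For example, a small monitoring sketch along these lines can track the training loss (assuming each line of stats.jsonl is a JSON object whose "Loss/loss" entry carries a "mean" field, as in the related StyleGAN codebases; adjust the run directory to your own):

import json

run_dir = 'training-runs/00000-cifar10-cond-ddpmpp-edm-gpus8-batch64-fp32'  # example name from above
losses = []
with open(f'{run_dir}/stats.jsonl') as f:
    for line in f:
        entry = json.loads(line)
        if 'Loss/loss' in entry:
            losses.append(entry['Loss/loss']['mean'])

print(f'{len(losses)} ticks logged, latest training loss: {losses[-1]:.4f}')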

The following table lists the exact training configurations that we used to obtain our pre-trained models:

| Model | GPUs | Time | Options |
| --- | --- | --- | --- |
| cifar10-32x32-cond-vp | 8xV100 | ~2 days | --cond=1 --arch=ddpmpp |
| cifar10-32x32-cond-ve | 8xV100 | ~2 days | --cond=1 --arch=ncsnpp |
| cifar10-32x32-uncond-vp | 8xV100 | ~2 days | --cond=0 --arch=ddpmpp |
| cifar10-32x32-uncond-ve | 8xV100 | ~2 days | --cond=0 --arch=ncsnpp |
| ffhq-64x64-uncond-vp | 8xV100 | ~4 days | --cond=0 --arch=ddpmpp --batch=256 --cres=1,2,2,2 --lr=2e-4 --dropout=0.05 --augment=0.15 |
| ffhq-64x64-uncond-ve | 8xV100 | ~4 days | --cond=0 --arch=ncsnpp --batch=256 --cres=1,2,2,2 --lr=2e-4 --dropout=0.05 --augment=0.15 |
| afhqv2-64x64-uncond-vp | 8xV100 | ~4 days | --cond=0 --arch=ddpmpp --batch=256 --cres=1,2,2,2 --lr=2e-4 --dropout=0.25 --augment=0.15 |
| afhqv2-64x64-uncond-ve | 8xV100 | ~4 days | --cond=0 --arch=ncsnpp --batch=256 --cres=1,2,2,2 --lr=2e-4 --dropout=0.25 --augment=0.15 |
| imagenet-64x64-cond-adm | 32xA100 | ~13 days | --cond=1 --arch=adm --duration=2500 --batch=4096 --lr=1e-4 --ema=50 --dropout=0.10 --augment=0 --fp16=1 --ls=100 --tick=200 |

For ImageNet-64, we ran the training on four NVIDIA DGX A100 nodes, each containing 8 Ampere GPUs with 80 GB of memory. To reduce the GPU memory requirements, we recommend either training the model with more GPUs or limiting the per-GPU batch size with --batch-gpu. To set up multi-node training, please consult the torchrun documentation.

License

Copyright © 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

All material, including source code and pre-trained models, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

baseline-cifar10-32x32-uncond-vp.pkl and baseline-cifar10-32x32-uncond-ve.pkl are derived from the pre-trained models by Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. The models were originally shared under the Apache 2.0 license.

baseline-imagenet-64x64-cond-adm.pkl is derived from the pre-trained model by Prafulla Dhariwal and Alex Nichol. The model was originally shared under the MIT license.

imagenet-64x64-baseline.npz is derived from the precomputed reference statistics by Prafulla Dhariwal and Alex Nichol. The statistics were originally shared under the MIT license.

Citation

@inproceedings{Karras2022edm,
  author    = {Tero Karras and Miika Aittala and Timo Aila and Samuli Laine},
  title     = {Elucidating the Design Space of Diffusion-Based Generative Models},
  booktitle = {Proc. NeurIPS},
  year      = {2022}
}

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.

Acknowledgments

We thank Jaakko Lehtinen, Ming-Yu Liu, Tuomas Kynkäänniemi, Axel Sauer, Arash Vahdat, and Janne Hellsten for discussions and comments, and Tero Kuosmanen, Samuel Klenberg, and Janne Hellsten for maintaining our compute infrastructure.


edm's Issues

Cannot reproduce the result in Table 2

I ran the following training command on 8 V100 GPUs:

torchrun --standalone --nproc_per_node=8 train.py --outdir=training-runs \
    --data=datasets/cifar10-32x32.zip --cond=0 --arch=ddpmpp

This gives me an FID of 2.08169, which is still far from the FID reported in the paper (1.97).

I think this may be caused by the random seeds (random init in the code).
Is it possible to share the seed for reproducing the result in Table 2?

Any suggestions?

Zero initialization of convolutions

Hi,
I have observed that the code carefully initializes certain convolutions to zero.
Do you have any reference for this design decision?

Thanks!

Question about the parameterizations of VP-SDE

Hi, I've recently been reading DDPM, score-based SDE (let's call it SDE for short), and your EDM (brilliant work!).
I wanted to check the VP derivations in Song's SDE paper against yours in Table 1, but I got confused:

In the SDE paper, the perturbation kernel (Eq. 33, Appendix C) is:

$$ p(x(t)|x(0)) = \mathcal{N}(\sqrt{e}x(0), (1-e)I) $$

where $e := \exp\left[-\tfrac{1}{2} t^2 \bar\beta_{d} - t \bar\beta_{min}\right]$. The reparameterization trick gives:

$$ x(t) = \sqrt{e}x(0) + \sqrt{1-e} \epsilon $$

If my understanding is correct, in your paper, the perturbation is defined as:

$$ x(t) = s(t) [y+n] = s(t) [x(0) + \sigma(t) \epsilon] $$

Therefore, we should have:

$$ \sigma(t) := \frac{\sqrt{1-e}}{\sqrt{e}} = \sqrt{\frac{1}{e}-1}$$

and

$$ s(t) :=\sqrt{e} $$

However, the VP parameterizations in your Table 1 say $\sigma(t) = \sqrt{\frac{1}{e} - 1}$ and $s(t) = e$.
I wonder why the $s(t)$ formulation differs. Is the derivation above incorrect somewhere? Or is it a mistake by you / Song?

VP checkpoints are trained using VE-scaling?

I downloaded a vp checkpoint

CUDA_VISIBLE_DEVICES=6 python generate.py --outdir=out --seeds=0-63 --batch=64 --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/baseline/baseline-cifar10-32x32-uncond-vp.pkl --batch=4 --solver=euler

But I discovered that the code is actually sampling using VE scaling. When I force it to use VP scaling with

CUDA_VISIBLE_DEVICES=6 python generate.py --outdir=out --seeds=0-63 --batch=64 --network=https://nvlabs-fi-cdn.nvidia.com/edm/pretrained/baseline/baseline-cifar10-32x32-uncond-vp.pkl --batch=4 --solver=euler --scaling=vp

It outputs images like the attached example (image omitted).

Does this mean the VP models you trained actually use VE scaling?

About hyper-parameter selection

I want to train my own SD model at high resolution under the EDM framework. Do you have any experience with how to select the best hyper-parameters at 1024x1024 resolution, such as sigma_min, sigma_max, P_mean, P_std, and so on?


Is it possible to also share the code for Figure 3 in the paper?

I found Figure 3 very helpful for understanding diffusion models in general, and it would be great to be able to play with the code.

I tried something like

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

schedule = 'VESDE'

if schedule == 'VESDE':
    sigma_min = 0.02
    sigma_max = 100
    rho = 0
elif schedule == 'EDM':
    sigma_min = 0.002
    sigma_max = 80
    rho = 7.0

def gauss_norm(x):
    return np.exp(-x**2/2)/np.sqrt(2*np.pi)

def pred_x0_theory(x,sigma_t):
    c = np.array([
        [-1],
        [ 1],
    ])
    nominator = 0
    denominator = 0
    for i in range(c.shape[0]):
        nominator += c[i] * gauss_norm((x-c[i])/sigma_t)
        denominator += gauss_norm((x-c[i])/sigma_t)
    return nominator / (denominator + 0)

# define the ODE
def model(x, t):
    def s_t(t):
        return 1

    def s_t_p(t):
        return 0

    def sigma_t(t):
        # VESDE
        if schedule == 'VESDE':
            return sigma_min**2 * (sigma_max**2/sigma_min**2)**t
        elif schedule == 'EDM':
            rho_inv = 1.0 / rho
            sigmas = sigma_min**rho_inv + t * (
                sigma_max**rho_inv - sigma_min**rho_inv
            )
            sigmas = sigmas**rho
            return sigmas

    def sigma_t_p(t, h=1e-5):
        return (sigma_t(t + h) - sigma_t(t - h)) / (2. * h)

    first_term = s_t_p(t) / s_t(t) * x
    score = (pred_x0_theory(x/s_t(t), sigma_t(t)) - x) / sigma_t(t)**2
    second_term = s_t(t)**2 * sigma_t_p(t) * sigma_t(t) * score
    dxdt = first_term - second_term
    return dxdt

# initial condition
x0 = 1
print(x0)

# time points
t = np.linspace(0.01,0.6,100)

# solve ODE
y = odeint(model, x0, t)

# plot results
plt.plot(t, y)
plt.xlabel('time')
plt.ylabel('x(t)')
plt.show()

but it does not work out well.

Thank you so much in advance for your help.

Question about parameter tuning

  1. The paper outlines the hyperparameters sigma_min and sigma_max during deterministic sampling, as well as P_mean, P_std, and sigma_data during training. Could you kindly elaborate on how these hyperparameters are determined and shed light on their impact on both the training and sampling processes?
  2. If I intend to apply the EDM framework to my own dataset, how would you recommend adjusting the aforementioned hyperparameters based on the characteristics of my dataset? Are there any guidelines or considerations for fine-tuning these parameters for optimal performance?
    Thank you very much for considering my inquiry.

Possible error in the up-sampling function

Hi all.

I have a question regarding the fused resampling in the Conv2d layer. When I use this option, the output shape does not correspond to 2x up-sampling of the input. Here is a short snippet to reproduce the issue:

import torch
from training.networks import Conv2d

layer = Conv2d(64, 128, kernel=3, up=True, fused_resample=True, resample_filter=[1, 1])
x = torch.randn(2, 64, 32, 32)
print(layer(x).shape) # output shape is (2, 128, 60, 60)

Is this the desired behavior or a bug? (This is not compatible with the StyleGAN codebases.)

Thanks a lot in advance.

Deterministic sampling and sampling steps

Hi, I am trying to train the EDM model with a simpler 35.7M-parameter UNet (the one proposed in the original DDPM paper) and compare the results with DDPM/DDIM.
I notice that $S_{churn} = 0$ leads to deterministic sampling, and $\gamma_i = \sqrt{2}-1$ leads to "max" stochastic sampling. So I introduce a parameter $\eta = \frac{S_{churn} / N}{\sqrt{2}-1}$ to control stochasticity by interpolation, that is, $\gamma_i = (\sqrt{2}-1) \cdot \eta$. As in DDIM, $\eta = 0$ means deterministic and $\eta = 1$ means "max" stochastic.
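A tiny sketch of this parameterization (with $\gamma$ clamped at the maximum of $\sqrt{2}-1$ used in the paper):

import math

# eta = (S_churn / N) / (sqrt(2) - 1), so gamma_i = (sqrt(2) - 1) * eta,
# clamped at the maximum value sqrt(2) - 1 used in the paper.
def gamma_from_eta(eta):
    return min((math.sqrt(2) - 1) * eta, math.sqrt(2) - 1)

print(gamma_from_eta(0.0))  # 0.0    -> deterministic sampling
print(gamma_from_eta(0.5))  # ~0.207 -> intermediate stochasticity
print(gamma_from_eta(1.0))  # ~0.414 -> "max" stochastic sampling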

I set different values of $\eta$ and different step counts and observed the FIDs:

| $\eta$ / steps | steps=18 | steps=50 | steps=100 |
| --- | --- | --- | --- |
| $\eta=0.0$ | 3.39 | 3.64 | 3.68 |
| $\eta=0.5$ | 3.10 | 2.95 | 2.93 |
| $\eta=1.0$ | 3.12 | 2.84 | 2.97 |

The FID is supposed to decrease when using more sampling steps, right? So why does the FID get worse for deterministic sampling? It behaves normally at $\eta=0.5$, yet it increases again from 50 steps to 100 steps at $\eta=1.0$. Why is the behavior so unstable and unpredictable?

To confirm it's not a bug, I trained a model with your official codebase under a simpler setting close to DDPM (duration=100, augment=None, xflip=True; channel_mult=[1,2,2,2], num_blocks=2). The results are:

| $\eta$ / steps | steps=18 | steps=50 |
| --- | --- | --- |
| $\eta=0.0$ | 2.94 | 3.09 |
| $\eta=0.5$ | 2.80 | 2.75 |
| $\eta=1.0$ | 2.95 | 2.78 |

For deterministic sampling, the FID still gets worse when using more steps. When $\eta > 0$, the FID gets slightly better as the number of steps increases.
If the hyper-parameter settings and the corresponding performance are not consistently predictable, how can one obtain a good model on different datasets? Only by brute force and grid search?

Could you please provide some explanation and thoughts?
Thanks a lot!

AssertionError: Torch not compiled with CUDA enabled

I am trying to run the example file and I've followed the installation instructions using conda on Windows 10:
conda env create -f environment.yml -n edm
conda activate edm

I always have this error:
Traceback (most recent call last):
  File "C:\Users\user\Documents\edm\example.py", line 92, in <module>
    main()
  File "C:\Users\user\Documents\edm\example.py", line 84, in main
    generate_image_grid(f'{model_root}/edm-cifar10-32x32-cond-vp.pkl', 'cifar10-32x32.png', num_steps=18) # FID = 1.79, NFE = 35
  File "C:\Users\user\Documents\edm\example.py", line 32, in generate_image_grid
    net = pickle.load(f)['ema'].to(device)
  File "C:\Users\user\AppData\Local\anaconda3\envs\edm\lib\site-packages\torch\nn\modules\module.py", line 1152, in to
    return self._apply(convert)
  File "C:\Users\user\AppData\Local\anaconda3\envs\edm\lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\user\AppData\Local\anaconda3\envs\edm\lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\user\AppData\Local\anaconda3\envs\edm\lib\site-packages\torch\nn\modules\module.py", line 825, in _apply
    param_applied = fn(param)
  File "C:\Users\user\AppData\Local\anaconda3\envs\edm\lib\site-packages\torch\nn\modules\module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "C:\Users\user\AppData\Local\anaconda3\envs\edm\lib\site-packages\torch\cuda\__init__.py", line 293, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Any idea how I can fix it?

How to select options shown in Table 2 for training VP/VE models

  • Table 2 shows an ablation study of training components for the VP/VE models in terms of performance.
  • In this table, we can choose among several training configurations such as Adjust hyperparameters, Redistributed capacity, Our preconditioning, Our loss function, and Non-leaky augmentation.
  • However, looking at train.py, the code only provides a fixed set of prefix options (screenshot of the available options omitted).

My question is how to choose the Adjust hyperparameters, Redistributed capacity, Our preconditioning, and Our loss function options using the prefix options in train.py.

DataLoader worker (pid xxxx) is killed by signal

Hi there, thanks for sharing another good code base. Recently I keep receiving a DataLoader worker-killed error when I run experiments on the ImageNet dataset, after a certain number of iterations. I have tried several times and it always happens after roughly the same duration (~15 h).

The code runs well on other datasets but fails only on ImageNet. The ImageNet dataset itself should be fine, since I have run other experiments on it without similar issues. I'm not sure what is happening here and hope it can be resolved.

I provide the error log below.

Traceback (most recent call last):
  File "train.py", line 234, in <module>
    main()
  File "/opt/miniconda/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/miniconda/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/miniconda/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/miniconda/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "train.py", line 229, in main
    training_loop.training_loop(**c)
  File "/root/code/Text2Image-DiT/training/training_loop.py", line 156, in training_loop
    torch.nan_to_num(param.grad, nan=0, posinf=1e5, neginf=-1e5, out=param.grad)
  File "/opt/miniconda/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 1680) is killed by signal: Killed.

Thanks in advance!

Different Phases of Sampling

Hi there,
So I have been working on a protein-ligand pocket diffusion model using your designs for the noising/scaling/sampling, etc. For the model itself I have been using the Equiformer module released by lucidrains. My questions about the model are a different story... but with regard to the core diffusion process I have noticed something I wanted to ask about. During the sampling procedure there seem to be two different 'phases'. The first phase is when sigma > 1 and the atomic coordinates are all 'coming together', as would be expected from the very distributed starting positions. But once sigma crosses 1 on its way to 0.002, the atoms stop in their tracks and then slowly begin stepping back. The dynamics of the Equiformer aside, it seems that once sigma crosses 1 and the noise conditioning value (1/4 ln(sigma)) goes from positive to negative, it almost puts the whole thing in reverse. Is this the intended behavior? Is my understanding of the process incorrect?
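To make the sign flip concrete, here is a tiny sketch of the conditioning value mentioned above (assuming c_noise(sigma) = ln(sigma)/4, as in the EDM preconditioning):

import numpy as np

# The EDM network is conditioned on c_noise(sigma) = ln(sigma) / 4,
# which changes sign exactly at sigma = 1.
for sigma in [80.0, 10.0, 1.0, 0.1, 0.002]:
    print(f'sigma={sigma:7.3f}  c_noise={np.log(sigma) / 4:+.3f}')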

Question about Augmentation

Hello! I've read the Training with Limited Data paper (ADA, adaptive discriminator augmentation), which the EDM paper mentions as helping with performance. In the ADA paper, the augmentation is applied to the GAN discriminator so as to prevent the generator from producing augmented data.

I was wondering: how does the EDM/diffusion model learn not to produce augmented data in this case?

About the Checkpoint

I want to know which of the pre-trained checkpoints corresponds to the model trained with EDMLoss in loss.py. In Table 1 of the paper, the EDM loss is not divided into VP and VE, but among the pre-trained files I can find xxxx_vp.pkl and xxxx_ve.pkl. What is the difference between xxxx_vp.pkl and xxxx_ve.pkl in the pre-trained files (not the baseline files)?
