This project forked from newbeeer/pfgmpp

Code for "PFGM++: Unlocking the Potential of Physics-Inspired Generative Models"

License: Other

PFGM++: Unlocking the Potential of Physics-Inspired Generative Models

PyTorch implementation of the paper PFGM++: Unlocking the Potential of Physics-Inspired Generative Models

by Yilun Xu, Ziming Liu, Yonglong Tian, Shangyuan Tong, Max Tegmark, Tommi S. Jaakkola

😇 Improvements over PFGM / Diffusion Models:

  • No longer requires the large-batch training target used in PFGM, which enables flexible conditional generation!
  • A more general $D \in \mathbb{R}^+$ dimensional augmented variable. PFGM++ subsumes PFGM and Diffusion Models: PFGM corresponds to $D=1$ and Diffusion Models correspond to $D\to \infty$.
  • There is a sweet spot $D^*$ in the middle!
  • Smaller $D$ is more robust than Diffusion Models ( $D\to \infty$ ).
  • Enables adjusting the trade-off between model robustness and rigidity!

Abstract: We present a general framework termed PFGM++ that unifies diffusion models and Poisson Flow Generative Models (PFGM). These models realize generative trajectories for $N$ dimensional data by embedding paths in $N{+}D$ dimensional space while still controlling the progression with a simple scalar norm of the $D$ additional variables. The new models reduce to PFGM when $D{=}1$ and to diffusion models when $D{\to}\infty$. The flexibility of choosing $D$ allows us to trade off robustness against rigidity as increasing $D$ results in more concentrated coupling between the data and the additional variable norms. We dispense with the biased large batch field targets used in PFGM and instead provide an unbiased perturbation-based objective similar to diffusion models. To explore different choices of $D$, we provide a direct alignment method for transferring well-tuned hyperparameters from diffusion models ( $D{\to} \infty$ ) to any finite $D$ values. Our experiments show that models with finite $D$ can be superior to previous state-of-the-art diffusion models on CIFAR-10/FFHQ $64{\times}64$ datasets, with FID scores of $1.91/2.43$ when $D{=}2048/128$. In addition, we demonstrate that models with smaller $D$ exhibit improved robustness against modeling errors.

schematic


Outline

Our implementation is built upon the EDM repo. We first provide guidance on how to quickly transfer hyperparameters from well-tuned diffusion models ( $D\to \infty$ ), such as EDM and DDPM, to the PFGM++ family ( $D\in \mathbb{R}^+$ ) in a task/dataset-agnostic way. More details are in Sec 4 (Transfer hyperparameters to finite $D$) and Appendix C.2 of our paper. We then highlight our modifications to the original EDM command lines for training, sampling and evaluation.

We also provide the original setup instructions from the EDM repo, such as environment requirements and dataset preparation.

Transfer guidance by $r=\sigma\sqrt{D}$ formula

Below we provide guidance on how to quickly transfer well-tuned hyperparameters for diffusion models ( $D\to \infty$ ), such as $\sigma_{\textrm{max}}$ and $p(\sigma)$, to finite $D$. We adopt the $r=\sigma\sqrt{D}$ formula from our paper for the alignment (cf. Section 4). Please use the following guidance as a prototype.
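
As a short worked limit showing why the $r=\sigma\sqrt{D}$ alignment is natural: assuming the PFGM++ perturbation kernel has the form $p_r(x\mid y)\propto(\|x-y\|_2^2+r^2)^{-(N+D)/2}$, substituting $r=\sigma\sqrt{D}$ and letting $D\to\infty$ recovers a Gaussian kernel with standard deviation $\sigma$. This is only a sketch of the reasoning; the paper (Section 4) gives the full argument.

```latex
\begin{aligned}
p_r(x \mid y) &\propto \left(\|x-y\|_2^2 + r^2\right)^{-\frac{N+D}{2}}
  \propto \left(1 + \frac{\|x-y\|_2^2}{r^2}\right)^{-\frac{N+D}{2}} \\
&= \left(1 + \frac{\|x-y\|_2^2}{D\sigma^2}\right)^{-\frac{N+D}{2}}
  \qquad (r = \sigma\sqrt{D}) \\
&\xrightarrow{\;D\to\infty\;} \exp\!\left(-\frac{\|x-y\|_2^2}{2\sigma^2}\right)
\end{aligned}
```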

😀 Please adjust the augmented dimension $D$ according to your task/dataset/model.

Training hyperparameter transfer. The example we provide is a simplified version of loss.py in this repo.

schematic

'''
y: mini-batch clean images
N: data dimension
D: augmented dimension
'''
import numpy as np
import torch

### === Diffusion Model === ###
rnd_normal = torch.randn([y.shape[0], 1, 1, 1], device=y.device)
sigma = (rnd_normal * self.P_std + self.P_mean).exp() # sample sigma from p(\sigma)
n = torch.randn_like(y) * sigma
D_yn = net(y + n, sigma)
loss = (D_yn - y) ** 2
### === Diffusion Model === ###


######## === PFGM++ === #######
rnd_normal = torch.randn(y.shape[0], device=y.device)
sigma = (rnd_normal * self.P_std + self.P_mean).exp() # sample sigma from p(\sigma)
r = sigma.double() * np.sqrt(self.D).astype(np.float64) # r=sigma\sqrt{D} formula

# = sample noise from perturbation kernel p_r = #
# Sampling from inverse-beta distribution
samples_norm = np.random.beta(a=self.N / 2., b=self.D / 2.,
                              size=y.shape[0]).astype(np.double)
inverse_beta = samples_norm / (1 - samples_norm + 1e-8)
inverse_beta = torch.from_numpy(inverse_beta).to(y.device).double()
# Sampling from p_r(R) by change-of-variable (c.f. Appendix B)
samples_norm = (r * torch.sqrt(inverse_beta + 1e-8)).view(len(samples_norm), -1)
# Uniformly sample the angle component
gaussian = torch.randn(y.shape[0], self.N).to(samples_norm.device)
unit_gaussian = gaussian / torch.norm(gaussian, p=2, dim=1, keepdim=True)
# Construct the perturbation
perturbation_x = (unit_gaussian * samples_norm).float()
# = sample noise from perturbation kernel p_r = #

sigma = sigma.reshape((len(sigma), 1, 1, 1))
n = perturbation_x.view_as(y)
D_yn = net(y + n, sigma)
loss = (D_yn - y) ** 2
######## === PFGM++ === #######
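
The Beta draw in the PFGM++ snippet comes from a change of variables on the perturbation norm $R=\|x-y\|_2$. The steps below are a sketch of that reasoning, assuming the radial density $p_r(R)\propto R^{N-1}/(R^2+r^2)^{(N+D)/2}$; the auxiliary variable $u$ is introduced here for illustration, and Appendix B of the paper has the authors' own derivation.

```latex
\begin{aligned}
p_r(R) &\propto \frac{R^{N-1}}{(R^2 + r^2)^{\frac{N+D}{2}}},
  \qquad R = \|x - y\|_2, \\
u &= \frac{R^2}{R^2 + r^2} \in (0, 1)
  \;\Longrightarrow\; R = r\sqrt{\frac{u}{1-u}},
  \qquad \frac{dR}{du} = \frac{r}{2}\, u^{-\frac{1}{2}} (1-u)^{-\frac{3}{2}}, \\
p(u) &\propto u^{\frac{N}{2}-1} (1-u)^{\frac{D}{2}-1}
  \;\Longrightarrow\; u \sim \mathrm{Beta}\!\left(\tfrac{N}{2}, \tfrac{D}{2}\right).
\end{aligned}
```

In the snippet, the first `samples_norm` plays the role of $u$, `inverse_beta` is $u/(1-u)$, and the final radius is $r\sqrt{u/(1-u)}$.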

Sampling hyperparameter transfer. The example we provide is a simplified version of generate.py in this repo. As shown in the figure below, the only modification is the prior sampling process. Hence we only include the comparison of prior sampling for diffusion models / PFGM++ in the code snippet.

schematic

'''
sigma_max: starting condition for diffusion models
N: data dimension
D: augmented dimension
data_size: shape of the batch to generate, (batch_size, ...), with N elements per sample (assumed here)
device: target torch device (assumed here)
'''

### === Diffusion Model === ###
x = torch.randn(data_size, device=device) * sigma_max
### === Diffusion Model === ###


######## === PFGM++ === #######
# Sampling from inverse-beta distribution
r = sigma_max * np.sqrt(self.D) # r=sigma\sqrt{D} formula
samples_norm = np.random.beta(a=self.N / 2., b=self.D / 2.,
                              size=data_size[0]).astype(np.double)
inverse_beta = samples_norm / (1 - samples_norm + 1e-8)
inverse_beta = torch.from_numpy(inverse_beta).to(device).double()
# Sampling from p_r(R) by change-of-variable (c.f. Appendix B)
samples_norm = (r * torch.sqrt(inverse_beta + 1e-8)).view(len(samples_norm), -1)
# Uniformly sample the angle component
gaussian = torch.randn(data_size[0], self.N).to(samples_norm.device)
unit_gaussian = gaussian / torch.norm(gaussian, p=2, dim=1, keepdim=True)
# Construct the prior sample
x = (unit_gaussian * samples_norm).float().view(data_size)
######## === PFGM++ === #######

Please refer to Appendix C.2 for detailed hyperparameter transfer procedures from EDM and DDPM.
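
To make the prior sampling recipe easy to try without a GPU, here is a minimal standard-library sketch of the same steps (Beta draw, change of variables, uniform direction). The function names `sample_pfgmpp_prior` and `mean_squared_norm` are hypothetical and not part of this repo, and `random`/`math` stand in for the NumPy/PyTorch calls used in generate.py. It also estimates $E\|x\|^2$, which for a Gaussian prior ( $D\to\infty$ ) would equal $N\sigma_{\textrm{max}}^2$.

```python
import math
import random


def sample_pfgmpp_prior(sigma_max, N, D, rng):
    """Draw one N-dimensional prior sample with r = sigma_max * sqrt(D)."""
    r = sigma_max * math.sqrt(D)
    # u ~ Beta(N/2, D/2); clamp away from 1 so u / (1 - u) stays finite.
    u = min(rng.betavariate(N / 2.0, D / 2.0), 1.0 - 1e-12)
    radius = r * math.sqrt(u / (1.0 - u))
    # A normalized Gaussian vector is uniformly distributed on the unit sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(N)]
    g_norm = math.sqrt(sum(v * v for v in g))
    return [radius * v / g_norm for v in g]


def mean_squared_norm(sigma_max, N, D, num_samples=2000, seed=0):
    """Monte Carlo estimate of E[||x||^2] under the prior above."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = sample_pfgmpp_prior(sigma_max, N, D, rng)
        total += sum(v * v for v in x)
    return total / num_samples


if __name__ == "__main__":
    sigma_max, N = 1.0, 48  # illustrative values, not tuned settings
    for D in (128, 2048):
        estimate = mean_squared_norm(sigma_max, N, D)
        # Ratio to the Gaussian value N * sigma_max^2; it approaches 1 as D grows.
        print(D, estimate / (N * sigma_max ** 2))
```

Under these assumptions the expected squared norm is $\sigma_{\textrm{max}}^2 N D/(D-2)$ for $D>2$, so the printed ratio should sit slightly above 1 for small $D$ and shrink toward 1 as $D$ increases.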

Training PFGM++

You can train new models using train.py. For example:

torchrun --standalone --nproc_per_node=8 train.py --outdir=training-runs --name exp_name \
--data=datasets/cifar10-32x32.zip --cond=0 --arch=arch \
--pfgmpp=1 --batch 512 \
--aug_dim aug_dim

exp_name: name of experiments
aug_dim: D (additional dimensions)  
arch: model architectures. options: ncsnpp | ddpmpp
pfgmpp: use PFGM++ framework, otherwise diffusion models ( $D\to\infty$ case). options: 0 | 1

The above example uses the default batch size of 512 images (controlled by --batch) that is divided evenly among 8 GPUs (controlled by --nproc_per_node) to yield 64 images per GPU. Training large models may run out of GPU memory; the best way to avoid this is to limit the per-GPU batch size, e.g., --batch-gpu=32. This employs gradient accumulation to yield the same results as using full per-GPU batches. See python train.py --help for the full list of options.
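
The batch arithmetic in the paragraph above can be sketched as follows. This is an illustration of the description only, not the code in train.py; the helper name `accumulation_plan` and its divisibility checks are assumptions.

```python
def accumulation_plan(batch, num_gpus, batch_gpu=None):
    """Return (per-GPU batch per step, number of gradient accumulation rounds)."""
    if batch % num_gpus != 0:
        raise ValueError("--batch must be divisible by the number of GPUs")
    per_gpu_total = batch // num_gpus
    # With no limit (or a limit above the share), one round covers the whole share.
    if batch_gpu is None or batch_gpu > per_gpu_total:
        batch_gpu = per_gpu_total
    if per_gpu_total % batch_gpu != 0:
        raise ValueError("the per-GPU share must be divisible by --batch-gpu")
    return batch_gpu, per_gpu_total // batch_gpu


if __name__ == "__main__":
    print(accumulation_plan(512, 8))      # (64, 1)
    print(accumulation_plan(512, 8, 32))  # (32, 2)
```

With `--batch 512` on 8 GPUs each GPU handles 64 images per step; limiting to `--batch-gpu=32` splits that share into two accumulation rounds of 32.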

The results of each training run are saved to a newly created directory training-runs/exp_name. The training loop exports network snapshots (training-state-*.pt) at regular intervals (controlled by --dump). The network snapshots can be used to generate images with generate.py, and the training states can be used to resume the training later on (--resume). Other useful information is recorded in log.txt and stats.jsonl. To monitor training convergence, we recommend looking at the training loss ("Loss/loss" in stats.jsonl) as well as periodically evaluating FID for training-state-*.pt using generate.py and fid.py.

For the FFHQ dataset, replace --data=datasets/cifar10-32x32.zip with --data=datasets/ffhq-64x64.zip

Sidenote: The original EDM repo provides more datasets: FFHQ, AFHQv2, ImageNet-64. We did not test the performance of PFGM++ on these datasets due to limited computational resources. However, we believe that some finite $D$ (sweet spots) would beat the diffusion models (the $D\to\infty$ case). Please let us know if you have those results 😀

TODO: All checkpoints are provided in this Google drive folder.

Generate & Evaluations

  • Generate 50k samples:

    torchrun --standalone --nproc_per_node=8 generate.py \
    --seeds=0-49999 --outdir=./training-runs/exp_name \
    --pfgmpp=1 --aug_dim=aug_dim
       
    exp_name: name of experiments
    aug_dim: D (additional dimensions)  
    arch: model architectures. options: ncsnpp | ddpmpp
    pfgmpp: use PFGM++ framework, otherwise diffusion models ( $D\to\infty$ case). options: 0 | 1

Note that the numerical value of FID varies across different random seeds and is highly sensitive to the number of images. By default, fid.py will always use 50,000 generated images; providing fewer images will result in an error, whereas providing more will use a random subset. To reduce the effect of random variation, we recommend repeating the calculation multiple times with different seeds, e.g., --seeds=0-49999, --seeds=50000-99999, and --seeds=100000-149999. In the EDM paper, they calculated each FID three times and reported the minimum.
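
The repetition scheme described above can be sketched as a small helper that builds non-overlapping `--seeds` arguments and reports the minimum FID across runs. The helper names `seed_ranges` and `reported_fid` are hypothetical; FID values would come from running fid.py on each generated set.

```python
def seed_ranges(num_runs, images_per_run=50000):
    """Build non-overlapping --seeds arguments, one per FID repetition."""
    ranges = []
    for run in range(num_runs):
        start = run * images_per_run
        ranges.append(f"--seeds={start}-{start + images_per_run - 1}")
    return ranges


def reported_fid(fid_values):
    """Follow the convention described above: report the minimum over runs."""
    if not fid_values:
        raise ValueError("need at least one FID value")
    return min(fid_values)


if __name__ == "__main__":
    for arg in seed_ranges(3):
        print(arg)
```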

For the FID versus controlled $\alpha$/NFE/quantization, please use generate_alpha.py/generate_steps.py/generate_quant.py for generation.

  • FID evaluation

    torchrun --standalone --nproc_per_node=8 fid.py calc --images=training-runs/exp_name --ref=fid-refs/cifar10-32x32.npz --num 50000 
    
    exp_name: name of experiments

The instructions for set-ups from EDM repo

Requirements

  • Python libraries: See environment.yml for exact library dependencies. You can use the following commands with Miniconda3 to create and activate your Python environment:
    • conda env create -f environment.yml -n edm
    • conda activate edm
  • Docker users:

Preparing datasets

Datasets are stored in the same format as in StyleGAN: uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels. Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information.
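
As a quick sanity check on an archive in this format, the sketch below counts PNG entries, verifies that entries are stored uncompressed, and looks for labels in dataset.json. The helper name `summarize_dataset` is hypothetical, and the `"labels"` key is an assumption about the metadata layout; dataset_tool.py remains the authoritative reference.

```python
import io
import json
import zipfile


def summarize_dataset(zip_file):
    """Count PNG entries and read labels from dataset.json, if present."""
    with zipfile.ZipFile(zip_file) as z:
        names = z.namelist()
        num_png = sum(name.lower().endswith(".png") for name in names)
        # Entries are expected to be stored without compression in this format.
        all_stored = all(info.compress_type == zipfile.ZIP_STORED
                         for info in z.infolist())
        labels = None
        if "dataset.json" in names:
            # Assumption: labels live under a top-level "labels" key.
            labels = json.loads(z.read("dataset.json")).get("labels")
    return {"num_png": num_png, "all_stored": all_stored,
            "has_labels": labels is not None}


if __name__ == "__main__":
    # Build a tiny in-memory archive with placeholder bytes instead of real PNG data.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as z:
        z.writestr("00000/img00000000.png", b"placeholder")
        z.writestr("dataset.json",
                   json.dumps({"labels": [["00000/img00000000.png", 3]]}))
    buf.seek(0)
    print(summarize_dataset(buf))
```

The same function accepts a path such as datasets/cifar10-32x32.zip in place of the in-memory buffer.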

CIFAR-10: Download the CIFAR-10 python version and convert to ZIP archive:

python dataset_tool.py --source=downloads/cifar10/cifar-10-python.tar.gz \
    --dest=datasets/cifar10-32x32.zip
python fid.py ref --data=datasets/cifar10-32x32.zip --dest=fid-refs/cifar10-32x32.npz

FFHQ: Download the Flickr-Faces-HQ dataset as 1024x1024 images and convert to ZIP archive at 64x64 resolution:

python dataset_tool.py --source=downloads/ffhq/images1024x1024 \
    --dest=datasets/ffhq-64x64.zip --resolution=64x64
python fid.py ref --data=datasets/ffhq-64x64.zip --dest=fid-refs/ffhq-64x64.npz

AFHQv2: Download the updated Animal Faces-HQ dataset (afhq-v2-dataset) and convert to ZIP archive at 64x64 resolution:

python dataset_tool.py --source=downloads/afhqv2 \
    --dest=datasets/afhqv2-64x64.zip --resolution=64x64
python fid.py ref --data=datasets/afhqv2-64x64.zip --dest=fid-refs/afhqv2-64x64.npz

ImageNet: Download the ImageNet Object Localization Challenge and convert to ZIP archive at 64x64 resolution:

python dataset_tool.py --source=downloads/imagenet/ILSVRC/Data/CLS-LOC/train \
    --dest=datasets/imagenet-64x64.zip --resolution=64x64 --transform=center-crop
python fid.py ref --data=datasets/imagenet-64x64.zip --dest=fid-refs/imagenet-64x64.npz
