
jychoi118 / ilvr_adm

385 stars · 5 watchers · 50 forks · 320.57 MB

ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (ICCV 2021 Oral)

License: MIT License

Language: Python 100.00%
Topics: diffusion, generation, iccv2021

ilvr_adm's People

Contributors: jychoi118

ilvr_adm's Issues

About conditioning range and downsampling factors

Hello,

I'm sorry to bother you, but could you explain in a bit more detail how the conditioning range and the downsampling factor influence the image translation task? What exactly do they affect?
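For context, here is a minimal sketch of the ILVR refinement step the question refers to: at each denoising step, the low-frequency content of the unconditional proposal is replaced with that of the noised reference, where the low-pass filter phi_N downsamples by factor N and upsamples back (larger down_N keeps only coarser aspects of the reference), and conditioning is skipped for the final range_t steps (larger range_t leaves more unconstrained steps at the end). The bilinear filter and helper names below are stand-ins, not the repo's exact Resizer implementation:

import torch.nn.functional as F

def low_pass(x, N):
    # phi_N: downsample by factor N, then upsample back to the original size
    h, w = x.shape[-2:]
    down = F.interpolate(x, size=(h // N, w // N), mode="bilinear", align_corners=False)
    return F.interpolate(down, size=(h, w), mode="bilinear", align_corners=False)

def ilvr_refine(x_proposal, y_ref, t, diffusion, N, range_t):
    # x_proposal: unconditional DDPM sample x'_{t-1}; y_ref: clean reference y_0
    # t: scalar timestep here (a tensor of shape [batch] in the real code)
    if t <= range_t:  # no conditioning during the last range_t steps
        return x_proposal
    y_t = diffusion.q_sample(y_ref, t)  # noise the reference to the current level
    # keep the proposal's high frequencies, take the low frequencies from y_t
    return x_proposal - low_pass(x_proposal, N) + low_pass(y_t, N)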

Questions about the training process

Hi, thank you very much for your great work. I trained a model with class_cond set to true, and sampling directly with that model raises a torch size-mismatch error. Are there any special requirements on the parameter settings when training a diffusion model from scratch? Is it enough to keep class_cond false and the other parameters reasonable?

ilvr_sample hangs at the line "model_kwargs = next(data)"

Hi,

I am using the FFHQ pt file to load the checkpoint and do the sampling.

I just placed one example image in the ref_imgs/face/ folder and ran the ilvr_sample script as described in the README. But the script hangs at the line model_kwargs = next(data).

May I know how to resolve it?

Thanks.
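One common cause worth ruling out (an assumption on my part, not a confirmed diagnosis): the reference-image generator yields nothing when the --base_samples folder path is wrong or contains no readable images, so next(data) blocks forever. A quick sanity check:

import os

ref_dir = "ref_imgs/face"  # the folder passed via --base_samples
imgs = [f for f in os.listdir(ref_dir)
        if f.lower().endswith((".png", ".jpg", ".jpeg"))]
print(f"found {len(imgs)} reference image(s) in {ref_dir}")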

Images generated with ilvr_sample.py are blurry and contain a lot of noise

Hello, I have trained an unconditional DDPM model on my own dataset using the P2-weighting codebase. The model produces clear images; however, when I use it for ILVR, the resulting images are blurry. What could be the reason?
Are there any hyperparameters I should adjust?


Generating images is very slow

I placed some CelebA images into the folder /ref_imgs/celeba and used the model celebahq_p2.pt to generate 100 images. It is very slow. What can I do to speed up the process? Thank you.
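One common speed-up (my suggestion, not the author's reply): sampling cost scales with the number of diffusion steps, and guided-diffusion-based scripts expose --timestep_respacing to run far fewer steps at inference than the 1000 used in training, e.g.:

python ilvr_sample.py --timestep_respacing 100 --model_path models/celebahq_p2.pt --base_samples ref_imgs/celeba --down_N 32 --range_t 20 --save_dir output

with the other model flags as in the README; 100 or 250 steps often costs little quality for a large speed-up.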

GPU requirements for training

Excellent work and thanks for sharing your code.
I'm a novice with diffusion models, and I'm concerned about the GPU resources this kind of model needs.
I wonder whether all the training can be done on a single RTX 3090.

Multiple GPUs

Thank you for your interesting work.
I have a question about how to run image_train.py on multiple GPUs.
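For reference (the author hasn't confirmed it here, but the upstream guided-diffusion README documents it): training scripts derived from guided-diffusion are launched across GPUs with MPI, e.g.:

mpiexec -n 4 python scripts/image_train.py --data_dir path/to/images $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

where -n sets the number of processes/GPUs and the flag variables hold the usual model, diffusion, and training flags.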

Question about Training

Hi!
Thanks for your work; it's really interesting. I am trying to train the diffusion model from scratch; however, the FID seems unstable during training when I use the default settings of guided diffusion. I suspect this may be caused by the fixed learning rate (this repo seems to fix the learning rate at 1e-4 during training), so I am wondering whether you decayed the learning rate during your training process?
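For what it's worth, upstream guided-diffusion supports linear learning-rate annealing through its --lr_anneal_steps flag; a simplified sketch of that schedule (following the upstream TrainLoop, not this repo's defaults):

def anneal_lr(opt, step, lr_anneal_steps, base_lr=1e-4):
    # linearly decay the learning rate to zero over lr_anneal_steps steps
    lr = base_lr * (1 - step / lr_anneal_steps)
    for group in opt.param_groups:
        group["lr"] = lr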

Output size

Hello, may I ask how to make the size of the output image match the original image? My output is currently 256×256. Thanks!
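One straightforward option (my suggestion, not the author's reply): the model always samples at its training resolution (here 256×256), so resize the result back to the reference's original size afterwards, e.g. with Pillow (the file paths below are hypothetical):

from PIL import Image

out = Image.open("output/sample_0.png")                    # hypothetical sample path
orig_w, orig_h = Image.open("ref_imgs/face/ref.png").size  # hypothetical reference
out.resize((orig_w, orig_h), Image.LANCZOS).save("sample_resized.png")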

Question about training in your paper

Hi,
Thanks for sharing the code, I am very interested in your paper.
I just wanted to ask a few questions regarding training:
Your training is the same as guided-diffusion, only the sampling process is modified, right?

How should I adjust the hyperparameters to train a strong DDPM model on CelebA for the human face inpainting task?

I'm very grateful for your work on this GitHub repository! However, I have some questions about the hyperparameters that I want to ask you.
I want to use guided-diffusion to train a DDPM on a CelebA dataset of 120,000 human face images (each 256(height)x256(width) pixels) for the face image synthesis task, and then use the saved model for face image inpainting.

Could you give me some suggestions on how to adjust the hyperparameters used in guided-diffusion to train a DDPM as strong as the models guided-diffusion provides, for face image synthesis and then inpainting?
I also wonder why the models provided by guided-diffusion are so big; in particular, why the 256x256 diffusion model (not class-conditional) is about 2.1 GB, and how it was trained.

I know the models provided by your repository are based on P2-weighting, which adds two extra hyperparameters (p2_gamma and p2_k), but I still want to train my own DDPM with plain guided-diffusion first, because I want to learn DDPM from the basics before moving on. Hence I'm asking how to adjust the other guided-diffusion hyperparameters besides p2_gamma and p2_k.

I hope the trained DDPM learns facial features well enough to be used for face image synthesis and face image inpainting (i.e., recovering the masked parts of a masked face image).

I want to know how to set the hyperparameters in diffusion_defaults() and model_and_diffusion_defaults() of script_util.py, and in create_argparser() of image_train.py, to train a strong denoising model on my CelebA dataset.

I have tried several hyperparameter combinations, but the face inpainting results of the saved checkpoints ema_0.9999_XXX.pt and modelXXX.pt are both bad.
I mainly used RePaint for inpainting sampling. As described in RePaint's README, it uses the pretrained model celeba256_250000.pt (downloaded via download.sh and based on guided-diffusion), which is also about 2.1 GB, and its sampling results are decent. However, I don't know why that model is so big or how it was trained.

In addition, because of my limited GPU memory, I set num_channels to only 64. Does this hyperparameter affect the performance of the trained DDPM? Should I try setting it larger?

In conclusion, I hope you can give me some suggestions on adjusting the guided-diffusion hyperparameters so I can get a strong DDPM for the face image inpainting task.

Thanks a lot for any help!
P.S. I set the hyperparameter values directly in the guided-diffusion code rather than through flags.
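For reference, a roughly 2.1 GB checkpoint is consistent with the large UNet configuration published in the guided-diffusion README for its 256x256 models; to the best of my knowledge the unconditional 256x256 flags there are (treat this as a starting point, not the author's recommendation):

--attention_resolutions 32,16,8 --class_cond False --diffusion_steps 1000 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True

With num_channels 256 versus your 64, the parameter count (and hence capacity and file size) grows roughly quadratically in the channel width, which is why the published checkpoints are so large.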

About requirements

Thank you for your interesting work. I have a question about the requirements: do you have a requirements.txt?
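As far as I can tell there is no official requirements file in the repo; a plausible minimal set, inferred from the upstream guided-diffusion dependencies (an assumption, not an official list):

# requirements.txt (unofficial, inferred from guided-diffusion)
torch
torchvision
numpy
blobfile>=1.0.5
tqdm
mpi4py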

Error in ilvr_sample after loading my trained model

Hi,

I trained a model from scratch using my own dataset. After training I ended up with checkpoint files like ema_0.9999_000010.pt, model000010.pt and opt000010.pt.

flags used for training

python train_model.py --data_dir /data1/ --image_size 256 --num_channels 128 --num_res_blocks 3 --diffusion_steps 4000 --noise_schedule cosine --lr 1e-4 --batch_size 4 --save_dir /data2/

I used the checkpoint file ema_0.9999_000010.pt for ILVR sampling, but it threw the following error.

flags used for sampling

python src/models/ILVR_GuidedDiffusion/ilvr_sample.py  --attention_resolutions 16 --class_cond False --diffusion_steps 4000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule cosine --num_channels 128 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path /data2/ema_0.9999_000010.pt --base_samples ref_imgs/bdd10k --down_N 32 --range_t 20 --save_dir reports/figures/guided

Error

Logging to reports/figures/guided
creating model...
Traceback (most recent call last):
  File "src/models/ILVR_GuidedDiffusion/ilvr_sample.py", line 134, in <module>
    main()
  File "src/models/ILVR_GuidedDiffusion/ilvr_sample.py", line 49, in main
    model.load_state_dict(
  File "/home/vinod/anaconda3/envs/lsgm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNetModel:
        Missing key(s) in state_dict: "input_blocks.2.0.in_layers.0.weight", "input_blocks.2.0.in_layers.0.bias", "input_blocks.2.0.in_layers.2.weight", "input_blocks.2.0.in_layers.2.bias", "input_blocks.2.0.emb_layers.1.weight", "input_blocks.2.0.emb_layers.1.bias", "input_blocks.2.0.out_layers.0.weight", "input_blocks.2.0.out_layers.0.bias", "input_blocks.2.0.out_layers.3.weight", "input_blocks.2.0.out_layers.3.bias", "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias", "input_blocks.4.0.emb_layers.1.weight", "input_blocks.4.0.emb_layers.1.bias", "input_blocks.4.0.out_layers.0.weight", "input_blocks.4.0.out_layers.0.bias", "input_blocks.4.0.out_layers.3.weight", "input_blocks.4.0.out_layers.3.bias", "input_blocks.6.0.in_layers.0.weight", "input_blocks.6.0.in_layers.0.bias", "input_blocks.6.0.in_layers.2.weight", "input_blocks.6.0.in_layers.2.bias", "input_blocks.6.0.emb_layers.1.weight", "input_blocks.6.0.emb_layers.1.bias", "input_blocks.6.0.out_layers.0.weight", "input_blocks.6.0.out_layers.0.bias", "input_blocks.6.0.out_layers.3.weight", "input_blocks.6.0.out_layers.3.bias", "input_blocks.8.0.in_layers.0.weight", "input_blocks.8.0.in_layers.0.bias", "input_blocks.8.0.in_layers.2.weight", "input_blocks.8.0.in_layers.2.bias", "input_blocks.8.0.emb_layers.1.weight", "input_blocks.8.0.emb_layers.1.bias", "input_blocks.8.0.out_layers.0.weight", "input_blocks.8.0.out_layers.0.bias", "input_blocks.8.0.out_layers.3.weight", "input_blocks.8.0.out_layers.3.bias", "input_blocks.10.0.in_layers.0.weight", "input_blocks.10.0.in_layers.0.bias", "input_blocks.10.0.in_layers.2.weight", "input_blocks.10.0.in_layers.2.bias", "input_blocks.10.0.emb_layers.1.weight", "input_blocks.10.0.emb_layers.1.bias", "input_blocks.10.0.out_layers.0.weight", "input_blocks.10.0.out_layers.0.bias", "input_blocks.10.0.out_layers.3.weight", "input_blocks.10.0.out_layers.3.bias", "output_blocks.1.1.in_layers.0.weight", "output_blocks.1.1.in_layers.0.bias", "output_blocks.1.1.in_layers.2.weight", "output_blocks.1.1.in_layers.2.bias", "output_blocks.1.1.emb_layers.1.weight", "output_blocks.1.1.emb_layers.1.bias", "output_blocks.1.1.out_layers.0.weight", "output_blocks.1.1.out_layers.0.bias", "output_blocks.1.1.out_layers.3.weight", "output_blocks.1.1.out_layers.3.bias", "output_blocks.3.2.in_layers.0.weight", "output_blocks.3.2.in_layers.0.bias", "output_blocks.3.2.in_layers.2.weight", "output_blocks.3.2.in_layers.2.bias", "output_blocks.3.2.emb_layers.1.weight", "output_blocks.3.2.emb_layers.1.bias", "output_blocks.3.2.out_layers.0.weight", "output_blocks.3.2.out_layers.0.bias", "output_blocks.3.2.out_layers.3.weight", "output_blocks.3.2.out_layers.3.bias", "output_blocks.5.1.in_layers.0.weight", "output_blocks.5.1.in_layers.0.bias", "output_blocks.5.1.in_layers.2.weight", "output_blocks.5.1.in_layers.2.bias", "output_blocks.5.1.emb_layers.1.weight", "output_blocks.5.1.emb_layers.1.bias", "output_blocks.5.1.out_layers.0.weight", "output_blocks.5.1.out_layers.0.bias", "output_blocks.5.1.out_layers.3.weight", "output_blocks.5.1.out_layers.3.bias", "output_blocks.7.1.in_layers.0.weight", "output_blocks.7.1.in_layers.0.bias", "output_blocks.7.1.in_layers.2.weight", "output_blocks.7.1.in_layers.2.bias", "output_blocks.7.1.emb_layers.1.weight", "output_blocks.7.1.emb_layers.1.bias", "output_blocks.7.1.out_layers.0.weight", "output_blocks.7.1.out_layers.0.bias", "output_blocks.7.1.out_layers.3.weight", 
"output_blocks.7.1.out_layers.3.bias", "output_blocks.9.1.in_layers.0.weight", "output_blocks.9.1.in_layers.0.bias", "output_blocks.9.1.in_layers.2.weight", "output_blocks.9.1.in_layers.2.bias", "output_blocks.9.1.emb_layers.1.weight", "output_blocks.9.1.emb_layers.1.bias", "output_blocks.9.1.out_layers.0.weight", "output_blocks.9.1.out_layers.0.bias", "output_blocks.9.1.out_layers.3.weight", "output_blocks.9.1.out_layers.3.bias". 
        Unexpected key(s) in state_dict: "input_blocks.2.0.op.weight", "input_blocks.2.0.op.bias", "input_blocks.4.0.op.weight", "input_blocks.4.0.op.bias", "input_blocks.6.0.op.weight", "input_blocks.6.0.op.bias", "input_blocks.8.0.op.weight", "input_blocks.8.0.op.bias", "input_blocks.10.0.op.weight", "input_blocks.10.0.op.bias", "input_blocks.11.1.norm.weight", "input_blocks.11.1.norm.bias", "input_blocks.11.1.qkv.weight", "input_blocks.11.1.qkv.bias", "input_blocks.11.1.proj_out.weight", "input_blocks.11.1.proj_out.bias", "output_blocks.0.1.norm.weight", "output_blocks.0.1.norm.bias", "output_blocks.0.1.qkv.weight", "output_blocks.0.1.qkv.bias", "output_blocks.0.1.proj_out.weight", "output_blocks.0.1.proj_out.bias", "output_blocks.1.2.conv.weight", "output_blocks.1.2.conv.bias", "output_blocks.1.1.norm.weight", "output_blocks.1.1.norm.bias", "output_blocks.1.1.qkv.weight", "output_blocks.1.1.qkv.bias", "output_blocks.1.1.proj_out.weight", "output_blocks.1.1.proj_out.bias", "output_blocks.3.2.conv.weight", "output_blocks.3.2.conv.bias", "output_blocks.5.1.conv.weight", "output_blocks.5.1.conv.bias", "output_blocks.7.1.conv.weight", "output_blocks.7.1.conv.bias", "output_blocks.9.1.conv.weight", "output_blocks.9.1.conv.bias". 
        size mismatch for out.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([6, 128, 3, 3]).
        size mismatch for out.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([6]).

May I know how you generated the 256x256 FFHQ.pt and 256x256 AFHQ-dog.pt? I don't have any issues while loading those weights.
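From the logs above (my reading, not the author's answer), the missing/unexpected keys are the classic symptom of building the sampling model with different architecture flags than the training run: training used --num_res_blocks 3 with no --learn_sigma or --resblock_updown, while sampling used --num_res_blocks 1 --learn_sigma True --resblock_updown True. The learn_sigma mismatch alone explains the out.2 shape of 3 versus 6 channels, since with learn_sigma the model predicts a variance alongside the mean. A trivial guard one could add before loading (flag names follow guided-diffusion):

# assert that the flags used to rebuild the model match the training flags
train_flags  = dict(num_channels=128, num_res_blocks=3, learn_sigma=False, resblock_updown=False)
sample_flags = dict(num_channels=128, num_res_blocks=1, learn_sigma=True, resblock_updown=True)
for k, v in train_flags.items():
    assert sample_flags[k] == v, f"flag mismatch on {k}: {v} (train) vs {sample_flags[k]} (sample)"

This check fails fast on num_res_blocks, learn_sigma, and resblock_updown, consistent with the state_dict errors shown.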

Questions about style transfer

Hi team!

Thanks for sharing your code and excellent work! I have tried your sampling code on my GPUs and got great outputs. I'm wondering whether your model can produce outputs that keep the texture and shape of the input but are colorized in the style of the reference picture. I tried a large N (e.g., N=64), which, as you mention in the paper, preserves only coarse aspects (e.g., color scheme) of the reference, but didn't get good results. Could you suggest any ideas for solving this problem?

(What I have done so far: I used image_train.py and domain-A datasets (256x256) to train the model ema_0.9999_010000.pt, and then used some 256x256 images from domain B with a large N to generate outputs.)

Thanks in advance!

Dataset used for training ffhq_10m.pt

Thanks for your amazing work and kind sharing. Your README says that 10M images were used to train ffhq_10m.pt. As far as I know, the FFHQ dataset contains 70,000 images, which is a little confusing. Is there other data I've missed?

Possible error when using q_sample(x, t)

Thanks for your impressive contribution !
You implement y_{t-1} by calling q_sample(y, t); however, the right way seems to be q_sample(y, t-1). Is there an error here?
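For context (standard DDPM notation, not a statement about the author's intent), sampling the noised reference at step $t-1$ directly from the clean image $y_0$ means

$$y_{t-1} = \sqrt{\bar\alpha_{t-1}}\, y_0 + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),$$

which corresponds to q_sample(y, t-1); calling q_sample(y, t) instead yields a sample at noise level $t$, i.e. slightly noisier than the proposal it is mixed with.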

DDIM sampling

Thanks for the awesome work.
Can I use ILVR sampling on top of the DDIM sampling scheme? I tried the DDIM sampling scheme with ILVR, but the results are very blurry.

FID test settings

It's mentioned that 50K real images are used for the FID scores in Table 1. However, there are 70K images in FFHQ and only 1K in MetFaces. Could you clarify how the 50K were sampled and/or duplicated? Thanks.

Question about reproducing your paper

Hi, I am a beginner in machine learning. I enjoyed reading your paper and tried to reproduce some of your results.

However, when running the ilvr_sample code, I keep getting stuck while loading the pre-trained model (there are some size mismatches for the U-Net model).

I have not modified any of your codes on github.

I tried with model_path of models/ffhq_10m.pt and base_samples of ref_imgs/face.

Is this problem due to a mistake on my part?

I am sorry for asking this, but could you please check whether the uploaded code works without any modification?

Thank you for reading.

  • These are the error messages I got.

Logging to ./output
creating model...
Traceback (most recent call last):
File "ilvr_sample.py", line 126, in
main()
File "ilvr_sample.py", line 50, in main
dist_util.load_state_dict(args.model_path, map_location="cpu")
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for UNetModel:
Missing key(s) in state_dict: "input_blocks.3.0.op.weight", "input_blocks.3.0.op.bias", "input_blocks.4.0.skip_connection.weight", "input_blocks.4.0.skip_connection.bias", "input_blocks.6.0.op.weight", "input_blocks.6.0.op.bias", "input_blocks.7.0.skip_connection.weight", "input_blocks.7.0.skip_connection.bias", "input_blocks.7.1.norm.weight", "input_blocks.7.1.norm.bias", "input_blocks.7.1.qkv.weight", "input_blocks.7.1.qkv.bias", "input_blocks.7.1.proj_out.weight", "input_blocks.7.1.proj_out.bias", "input_blocks.8.1.norm.weight", "input_blocks.8.1.norm.bias", "input_blocks.8.1.qkv.weight", "input_blocks.8.1.qkv.bias", "input_blocks.8.1.proj_out.weight", "input_blocks.8.1.proj_out.bias", "input_blocks.9.0.op.weight", "input_blocks.9.0.op.bias", "input_blocks.10.0.skip_connection.weight", "input_blocks.10.0.skip_connection.bias", "input_blocks.10.1.norm.weight", "input_blocks.10.1.norm.bias", "input_blocks.10.1.qkv.weight", "input_blocks.10.1.qkv.bias", "input_blocks.10.1.proj_out.weight", "input_blocks.10.1.proj_out.bias", "input_blocks.11.1.norm.weight", "input_blocks.11.1.norm.bias", "input_blocks.11.1.qkv.weight", "input_blocks.11.1.qkv.bias", "input_blocks.11.1.proj_out.weight", "input_blocks.11.1.proj_out.bias", "output_blocks.0.1.norm.weight", "output_blocks.0.1.norm.bias", "output_blocks.0.1.qkv.weight", "output_blocks.0.1.qkv.bias", "output_blocks.0.1.proj_out.weight", "output_blocks.0.1.proj_out.bias", "output_blocks.1.1.norm.weight", "output_blocks.1.1.norm.bias", "output_blocks.1.1.qkv.weight", "output_blocks.1.1.qkv.bias", "output_blocks.1.1.proj_out.weight", "output_blocks.1.1.proj_out.bias", "output_blocks.2.2.conv.weight", "output_blocks.2.2.conv.bias", "output_blocks.4.1.norm.weight", "output_blocks.4.1.norm.bias", "output_blocks.4.1.qkv.weight", "output_blocks.4.1.qkv.bias", "output_blocks.4.1.proj_out.weight", "output_blocks.4.1.proj_out.bias", "output_blocks.5.1.norm.weight", "output_blocks.5.1.norm.bias", "output_blocks.5.1.qkv.weight", "output_blocks.5.1.qkv.bias", "output_blocks.5.1.proj_out.weight", "output_blocks.5.1.proj_out.bias", "output_blocks.5.2.conv.weight", "output_blocks.5.2.conv.bias", "output_blocks.8.1.conv.weight", "output_blocks.8.1.conv.bias".
Unexpected key(s) in state_dict: "input_blocks.3.0.in_layers.0.weight", "input_blocks.3.0.in_layers.0.bias", "input_blocks.3.0.in_layers.2.weight", "input_blocks.3.0.in_layers.2.bias", "input_blocks.3.0.emb_layers.1.weight", "input_blocks.3.0.emb_layers.1.bias", "input_blocks.3.0.out_layers.0.weight", "input_blocks.3.0.out_layers.0.bias", "input_blocks.3.0.out_layers.3.weight", "input_blocks.3.0.out_layers.3.bias", "input_blocks.5.0.skip_connection.weight", "input_blocks.5.0.skip_connection.bias", "input_blocks.6.0.in_layers.0.weight", "input_blocks.6.0.in_layers.0.bias", "input_blocks.6.0.in_layers.2.weight", "input_blocks.6.0.in_layers.2.bias", "input_blocks.6.0.emb_layers.1.weight", "input_blocks.6.0.emb_layers.1.bias", "input_blocks.6.0.out_layers.0.weight", "input_blocks.6.0.out_layers.0.bias", "input_blocks.6.0.out_layers.3.weight", "input_blocks.6.0.out_layers.3.bias", "input_blocks.9.1.norm.weight", "input_blocks.9.1.norm.bias", "input_blocks.9.1.qkv.weight", "input_blocks.9.1.qkv.bias", "input_blocks.9.1.proj_out.weight", "input_blocks.9.1.proj_out.bias", "input_blocks.9.0.in_layers.0.weight", "input_blocks.9.0.in_layers.0.bias", "input_blocks.9.0.in_layers.2.weight", "input_blocks.9.0.in_layers.2.bias", "input_blocks.9.0.emb_layers.1.weight", "input_blocks.9.0.emb_layers.1.bias", "input_blocks.9.0.out_layers.0.weight", "input_blocks.9.0.out_layers.0.bias", "input_blocks.9.0.out_layers.3.weight", "input_blocks.9.0.out_layers.3.bias", "input_blocks.9.0.skip_connection.weight", "input_blocks.9.0.skip_connection.bias", "output_blocks.1.1.in_layers.0.weight", "output_blocks.1.1.in_layers.0.bias", "output_blocks.1.1.in_layers.2.weight", "output_blocks.1.1.in_layers.2.bias", "output_blocks.1.1.emb_layers.1.weight", "output_blocks.1.1.emb_layers.1.bias", "output_blocks.1.1.out_layers.0.weight", "output_blocks.1.1.out_layers.0.bias", "output_blocks.1.1.out_layers.3.weight", "output_blocks.1.1.out_layers.3.bias", "output_blocks.3.2.in_layers.0.weight", "output_blocks.3.2.in_layers.0.bias", "output_blocks.3.2.in_layers.2.weight", "output_blocks.3.2.in_layers.2.bias", "output_blocks.3.2.emb_layers.1.weight", "output_blocks.3.2.emb_layers.1.bias", "output_blocks.3.2.out_layers.0.weight", "output_blocks.3.2.out_layers.0.bias", "output_blocks.3.2.out_layers.3.weight", "output_blocks.3.2.out_layers.3.bias", "output_blocks.5.1.in_layers.0.weight", "output_blocks.5.1.in_layers.0.bias", "output_blocks.5.1.in_layers.2.weight", "output_blocks.5.1.in_layers.2.bias", "output_blocks.5.1.emb_layers.1.weight", "output_blocks.5.1.emb_layers.1.bias", "output_blocks.5.1.out_layers.0.weight", "output_blocks.5.1.out_layers.0.bias", "output_blocks.5.1.out_layers.3.weight", "output_blocks.5.1.out_layers.3.bias", "output_blocks.7.1.in_layers.0.weight", "output_blocks.7.1.in_layers.0.bias", "output_blocks.7.1.in_layers.2.weight", "output_blocks.7.1.in_layers.2.bias", "output_blocks.7.1.emb_layers.1.weight", "output_blocks.7.1.emb_layers.1.bias", "output_blocks.7.1.out_layers.0.weight", "output_blocks.7.1.out_layers.0.bias", "output_blocks.7.1.out_layers.3.weight", "output_blocks.7.1.out_layers.3.bias", "output_blocks.9.1.in_layers.0.weight", "output_blocks.9.1.in_layers.0.bias", "output_blocks.9.1.in_layers.2.weight", "output_blocks.9.1.in_layers.2.bias", "output_blocks.9.1.emb_layers.1.weight", "output_blocks.9.1.emb_layers.1.bias", "output_blocks.9.1.out_layers.0.weight", "output_blocks.9.1.out_layers.0.bias", "output_blocks.9.1.out_layers.3.weight", "output_blocks.9.1.out_layers.3.bias".
size mismatch for input_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
size mismatch for input_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for input_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for input_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for input_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 256, 3, 3]).
size mismatch for input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
size mismatch for input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]).
size mismatch for input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]).
size mismatch for input_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
size mismatch for input_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for input_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]).
size mismatch for input_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for input_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 384, 3, 3]).
size mismatch for output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([896]).
size mismatch for output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([896]).
size mismatch for output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 896, 3, 3]).
size mismatch for output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 896, 1, 1]).
size mismatch for output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([896]).
size mismatch for output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([896]).
size mismatch for output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([512, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 896, 3, 3]).
size mismatch for output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
size mismatch for output_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for output_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]).
size mismatch for output_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.3.0.skip_connection.weight: copying a param with shape torch.Size([512, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 896, 1, 1]).
size mismatch for output_blocks.3.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.3.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.3.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.3.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([1152, 384, 1]).
size mismatch for output_blocks.3.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([384, 384, 1]).
size mismatch for output_blocks.3.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([256, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 768, 3, 3]).
size mismatch for output_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
size mismatch for output_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for output_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]).
size mismatch for output_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([256, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 768, 1, 1]).
size mismatch for output_blocks.4.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]).
size mismatch for output_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]).
size mismatch for output_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 640, 3, 3]).
size mismatch for output_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
size mismatch for output_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for output_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]).
size mismatch for output_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.5.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 640, 1, 1]).
size mismatch for output_blocks.5.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]).
size mismatch for output_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([640]).
size mismatch for output_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 640, 3, 3]).
size mismatch for output_blocks.6.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 640, 1, 1]).
size mismatch for output_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([256, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]).
size mismatch for output_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([256, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
size mismatch for output_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([128, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 384, 3, 3]).
size mismatch for output_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
size mismatch for output_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for output_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
size mismatch for output_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.8.0.skip_connection.weight: copying a param with shape torch.Size([128, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 384, 1, 1]).
size mismatch for output_blocks.8.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for output_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for output_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 384, 3, 3]).
size mismatch for output_blocks.9.0.skip_connection.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 384, 1, 1]).
size mismatch for out.2.weight: copying a param with shape torch.Size([6, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([3, 128, 3, 3]).
size mismatch for out.2.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([3]).
[565458df4300:00904] *** Process received signal ***
[565458df4300:00904] Signal: Segmentation fault (11)
[565458df4300:00904] Signal code: Address not mapped (1)
[565458df4300:00904] Failing at address: 0x7ff91ee2320d
[565458df4300:00904] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7ff921acd980]
[565458df4300:00904] [ 1] /lib/x86_64-linux-gnu/libc.so.6(getenv+0xa5)[0x7ff92170c8a5]
[565458df4300:00904] [ 2] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(_ZN13TCMallocGuardD1Ev+0x34)[0x7ff921f77e44]
[565458df4300:00904] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xf5)[0x7ff92170d735]
[565458df4300:00904] [ 4] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(+0x13cb3)[0x7ff921f75cb3]
[565458df4300:00904] *** End of error message ***

About image translation

I see you mentioned in the paper that the model is applicable to image translation. How do I implement that?
Thanks

Some issues about your scripts

I ran the following command:

python -m ipdb scripts/ilvr_sample.py \
    --attention_resolutions 16 \
    --class_cond False \
    --diffusion_steps 100 \
    --dropout 0.0 \
    --image_size 256 \
    --learn_sigma True \
    --noise_schedule linear \
    --num_channels 128 \
    --num_head_channels 64 \
    --num_res_blocks 1 \
    --resblock_updown True \
    --use_fp16 False \
    --use_scale_shift_norm True \
    --timestep_respacing 100 \
    --model_path models/ffhq_10m.pt \
    --base_samples ref_imgs/face \
    --down_N 32 \
    --range_t 10 \
    --save_dir output0

However, there was no output after 30 minutes, and no error message either. What's the problem?

Checkpoints cannot be accessed

Hi,

I am trying to download the checkpoints of your model, but the link cannot be visited.

I have found links in the issues, but none of them work.

Could you provide other available access links?

Thanks a lot.

Checkpoints no longer available!

Hi,

I am trying to download the checkpoints of your model but the link is no longer available, as it says that the file does not exist.

Confusion about timesteps

Thanks for the fantastic work.
In the paper, you say the unconditional DDPM trained with the publicly available PyTorch implementation (https://github.com/rosinality/denoising-diffusion-pytorch) uses 1000 timesteps during training, but during inference you directly use 100 steps without respacing. How does that work? Is there a problem with my understanding? I couldn't find a relevant explanation in the paper.
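For background (this describes guided-diffusion-style respacing in general, not necessarily what the paper's rosinality-based model did): shortened sampling picks a subsequence of the trained timesteps and recomputes the betas for that subsequence, roughly:

import numpy as np

# pick 100 of the 1000 training timesteps, evenly spaced
use_timesteps = np.linspace(0, 999, num=100).round().astype(int)

so a model trained with 1000 steps can be sampled in 100 without retraining.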

Why not use L2 distance gradient?

Curious as to why you use
$\phi(y_{t-1}) - \phi(x_{t-1})$ and not
$\nabla_{x_{t-1}}||\phi(y_{t-1}) - \phi(x_{t-1})||_2^2$, similarly to classifier guidance?
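One way to see the connection (my observation, assuming $\phi$ is linear): the L2 gradient is

$$\nabla_{x_{t-1}} \tfrac{1}{2}\big\|\phi(y_{t-1}) - \phi(x_{t-1})\big\|_2^2 = -\phi^\top\big(\phi(y_{t-1}) - \phi(x_{t-1})\big),$$

so if $\phi$ behaves like an orthogonal projection ($\phi^\top = \phi$, $\phi^2 = \phi$), the gradient direction reduces to $\phi(y_{t-1}) - \phi(x_{t-1})$ up to sign and scale; ILVR applies that difference directly, with no guidance-scale hyperparameter.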

Adaptation of pre-trained models

Hi! Thank you for your interesting repo!

I have a question about training using a pre-trained model from guided-diffusion.

Is it okay to put other pre-trained models in the ./models directory?

How to sample results on LSUN dataset?

Hi,
Thanks for your great work on diffusion models!
I am wondering whether I can download a pre-trained LSUN model from improved-diffusion and sample with it in your code. I find there are some mismatches when loading the parameters. Have you modified the code base?
Thank you.

Questions regarding evaluation

Hi,

Thanks for sharing the codebase and the pretrained models! Your work is very impressive :)
I just wanted to ask a few questions regarding quantitative evaluation:

  1. How many images were used for computing the FID scores in Table 1 of the main paper?
  2. How are the reference images that guide the generation process chosen?
  3. What codebase did you use to compute the FID scores?

Regards,
Seth

Checkpoints no longer available

Hi,

I am trying to download the checkpoints of your model but the link is no longer available, as it says that the file does not exist. I attach here the message from Google Drive.

Could you provide a new link or the checkpoints?


Thanks in advance.
