openai / consistency_models
Official repo for consistency models.
License: MIT License
When I use diffusers to sample images with the provided code, I can't get an image that corresponds to the class label at all.
Hi, thanks for the great work.
One question: the function below seems to differ from Algorithm 1 in the paper. Could you clarify how multistep sampling works? Thanks.
https://github.com/openai/consistency_models/blob/main/cm/karras_diffusion.py#L657
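For comparison, here is a minimal NumPy sketch of multistep sampling as we read Algorithm 1: one jump from x_T to a clean estimate, then repeated re-noising to a smaller level tau followed by another jump. The karras_sigma mapping is an assumption about how the integer --ts/--steps values become noise levels (rho = 7, sigma_min = 0.002 and sigma_max = 80 are assumed defaults, not checked against karras_diffusion.py). Under that reading, --ts 0,67,150 with --steps 151 gives one intermediate level, i.e. two network evaluations. All names and the toy consistency function are hypothetical.

```python
import numpy as np


def karras_sigma(index, steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Map an integer index in [0, steps - 1] to a noise level on a Karras-style grid.

    Index 0 gives sigma_max and index steps - 1 gives sigma_min.
    """
    max_r = sigma_max ** (1.0 / rho)
    min_r = sigma_min ** (1.0 / rho)
    return (max_r + index / (steps - 1) * (min_r - max_r)) ** rho


def multistep_consistency_sample(f, shape, taus, T=80.0, eps=0.002, seed=0):
    """Multistep consistency sampling in the style of Algorithm 1.

    f(x, t) is any consistency function; taus is a decreasing list of
    intermediate noise levels tau_1 > ... > tau_{N-1}, all inside (eps, T).
    """
    rng = np.random.default_rng(seed)
    x = f(T * rng.standard_normal(shape), T)          # one-step sample from x_T ~ N(0, T^2 I)
    for tau in taus:
        z = rng.standard_normal(shape)
        x_tau = x + np.sqrt(tau ** 2 - eps ** 2) * z  # re-noise the estimate to level tau
        x = f(x_tau, tau)                             # jump back to the trajectory origin
    return x


def toy_f(x, t):
    """Stand-in consistency function for data concentrated at the single value 0.5."""
    return np.full_like(x, 0.5)


# Hypothetical example: "--ts 0,67,150 --steps 151" read as one intermediate level.
taus = [karras_sigma(67, 151)]
sample = multistep_consistency_sample(toy_f, (2, 3), taus)
```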
Hello! Your work is truly remarkable. I've experimented with standard DDPM models using DDIM sampling, and although they are relatively slow, they exhibit a fascinating reversibility property: besides denoising, you can run the reverse process and recover the original input noise exactly (or nearly so).
I see that two samplers are used here: sample_onestep and stochastic_iterative_sampler.
stochastic_iterative_sampler adds stochastic noise between steps, so it is clearly not reversible. I'm not sure about sample_onestep.
I would be grateful if you could provide some insights into whether this reversibility property is preserved in Consistency models sampling. The examples provided do not explicitly demonstrate this aspect, and I am eager to learn more about it. Thank you for your time and expertise.
I want to use a CM directly to get the whole trajectory x_T -> x_0, like an ODE solver. Is this possible?
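A consistency model maps any point on a trajectory straight to its origin, so by itself it does not return the intermediate states x_t. One way to obtain the whole x_T -> x_0 path is to integrate the probability-flow ODE with a denoiser, such as a diffusion teacher. Because that integration is deterministic, running it in the opposite direction approximately recovers the starting noise, which is the DDIM-style reversibility described above. A minimal NumPy sketch using Euler steps in the EDM form dx/dt = (x - D(x, t)) / t; the Gaussian toy denoiser stands in for a trained network and is an assumption for illustration only.

```python
import numpy as np


def karras_grid(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Decreasing noise levels from sigma_max to sigma_min with EDM-style spacing."""
    r = np.linspace(0.0, 1.0, n)
    max_r, min_r = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    return (max_r + r * (min_r - max_r)) ** rho


def euler_pf_ode(x, sigmas, denoiser):
    """Integrate dx/dt = (x - D(x, t)) / t along `sigmas`; return every visited state."""
    traj = [x]
    for t, t_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoiser(x, t)) / t
        x = x + (t_next - t) * d
        traj.append(x)
    return traj


# Toy denoiser for data ~ N(0, s^2): E[x_0 | x_t] = x_t * s^2 / (s^2 + t^2).
s = 0.5


def toy_denoiser(x, t):
    return x * s ** 2 / (s ** 2 + t ** 2)


sigmas = karras_grid(1000)
noise = 80.0 * np.random.default_rng(0).standard_normal(4)
forward = euler_pf_ode(noise, sigmas, toy_denoiser)               # x_T -> x_eps
backward = euler_pf_ode(forward[-1], sigmas[::-1], toy_denoiser)  # x_eps -> x_T
```

With a real network, the round-trip error depends on the step count and solver order; a sampler that draws fresh noise at every step, such as stochastic_iterative_sampler, cannot be inverted this way.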
It's a problem with flash-attn; I got this error:
Installing build dependencies ... done
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
I tried reinstalling wheel, and nothing changed.
The current flash attention module by Hazy Research is CUDA-only, which makes this repo CUDA-only too. I suggest writing a separate attention module for computers without an NVIDIA GPU; the current module could still be used when an NVIDIA card is present.
A couple of people have raised this issue with Hazy Research, but they said they're focused on CUDA only and are not interested in writing a non-CUDA version.
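A non-CUDA fallback could route attention through ordinary PyTorch ops whenever flash-attn is unusable. A minimal sketch; flash_attn_usable and portable_attention are hypothetical names, and the (batch, heads, seq_len, head_dim) layout is an assumption rather than the layout QKVFlashAttention uses in cm/unet.py.

```python
import importlib.util
import math

import torch


def flash_attn_usable() -> bool:
    """flash-attn ships CUDA kernels only, so require both the package and a CUDA device."""
    return importlib.util.find_spec("flash_attn") is not None and torch.cuda.is_available()


def portable_attention(q, k, v):
    """Scaled dot-product attention in plain PyTorch (runs on CPU, MPS or CUDA).

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim). Memory grows
    with seq_len ** 2, unlike a fused flash-attention kernel.
    """
    scale = 1.0 / math.sqrt(q.shape[-1])
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale
    weights = torch.softmax(scores.float(), dim=-1).to(q.dtype)  # softmax in fp32 for stability
    return torch.einsum("bhqk,bhkd->bhqd", weights, v)
```

On PyTorch 2.0 and later, torch.nn.functional.scaled_dot_product_attention offers a similar operation and could replace the einsum version.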
Would it be possible to release the reference batch for CIFAR10? The reference batches for ImageNet/LSUN are available here, but I could not find a corresponding CIFAR10 batch.
Thanks!
I reduced total_training_steps. Why doesn't training stop when the number of steps is reduced?
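One way a smaller total_training_steps can fail to stop training is a loop guard that ORs several conditions together. We have not confirmed that cm/train_util.py is written this way, so treat the guard below as hypothetical. With --lr_anneal_steps 0 (as in the training commands in this thread), `not lr_anneal_steps` is always true, so an OR-ed guard never becomes false no matter what total_training_steps is. A minimal sketch contrasting that guard with one that treats total_training_steps as a hard bound (all function names are hypothetical):

```python
def should_continue_or(step, lr_anneal_steps, total_training_steps):
    # Any single true clause keeps the loop alive; with lr_anneal_steps == 0,
    # "not lr_anneal_steps" is always True, so this never returns False.
    return (not lr_anneal_steps
            or step < lr_anneal_steps
            or step < total_training_steps)


def should_continue_bounded(step, lr_anneal_steps, total_training_steps):
    # total_training_steps is always an upper bound; lr_anneal_steps can only stop earlier.
    if step >= total_training_steps:
        return False
    return not lr_anneal_steps or step < lr_anneal_steps


def count_steps(guard, lr_anneal_steps, total_training_steps, cap=10_000):
    """Run a dummy training loop and report how many steps it takes (capped for safety)."""
    step = 0
    while guard(step, lr_anneal_steps, total_training_steps) and step < cap:
        step += 1
    return step
```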
Hi,
Thanks for open-sourcing this wonderful project!
However, I notice in CT training, the loss term has the target model denoising for
OS: Ubuntu20.04 x86_64
Python: 3.11.5
flash-attn: 2.3.6/2.1.2.post3
I used pip to install the flash-attn module (version 2.3.6). When I run python cm_train.py to test, Python throws this error. I changed my flash-attn version to 2.1.2.post3, but the error still exists.
The following is my output:
Traceback (most recent call last):
  File "/root/autodl-tmp/conmodel/scripts/cm_train.py", line 178, in <module>
    main()
  File "/root/autodl-tmp/conmodel/scripts/cm_train.py", line 54, in main
    model, diffusion = create_model_and_diffusion(**model_and_diffusion_kwargs)  # create model and diffusion
  File "/root/autodl-tmp/conmodel/scripts/../cm/script_util.py", line 76, in create_model_and_diffusion
    model = create_model(
  File "/root/autodl-tmp/conmodel/scripts/../cm/script_util.py", line 140, in create_model
    return UNetModel(
  File "/root/autodl-tmp/conmodel/scripts/../cm/unet.py", line 612, in __init__
    AttentionBlock(
  File "/root/autodl-tmp/conmodel/scripts/../cm/unet.py", line 293, in __init__
    self.attention = QKVFlashAttention(channels, self.num_heads)
  File "/root/autodl-tmp/conmodel/scripts/../cm/unet.py", line 344, in __init__
    from flash_attn.flash_attention import FlashAttention
ModuleNotFoundError: No module named 'flash_attn.flash_attention'
I guess 'flash_attention' may have moved elsewhere in an update.
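Both versions tried here are 2.x releases, and the traceback shows that flash_attn.flash_attention (with its FlashAttention class) is not available there; flash-attn 2.x exposes functions such as flash_attn_qkvpacked_func instead. A minimal, version-tolerant import sketch (load_flash_attention is a hypothetical helper). The two APIs take different arguments, so QKVFlashAttention would still need adapting for 2.x; pinning a 1.x release of flash-attn is the other option.

```python
import importlib


def load_flash_attention():
    """Return (api_version, attention_callable), or (None, None) if flash-attn is unusable."""
    try:
        module = importlib.import_module("flash_attn.flash_attention")  # flash-attn 1.x layout
        return "1.x", module.FlashAttention
    except (ImportError, AttributeError, OSError):
        pass
    try:
        module = importlib.import_module("flash_attn")  # flash-attn 2.x layout
        return "2.x", module.flash_attn_qkvpacked_func
    except (ImportError, AttributeError, OSError):
        return None, None


version, attention_impl = load_flash_attention()
```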
Really exciting work!
I wonder whether it could be implemented in Stable Diffusion.
Are there plans to release the code for zero-shot image editing?
In particular, I'd be interested in the super-resolution.
Best regards
I tried to train a CT ImageNet model from scratch and have not been able to reproduce the image quality of the provided checkpoint models.
Below is the command I ran:
python cm_train.py --training_mode consistency_training --target_ema_mode adaptive --start_ema 0.95 --scale_mode progressive --start_scales 2 --end_scales 200 --total_training_steps 800000 --loss_norm lpips --lr_anneal_steps 0 --attention_resolutions 32,16,8 --class_cond True --use_scale_shift_norm True --dropout 0.0 --teacher_dropout 0.1 --ema_rate 0.999 --global_batch_size 16 --image_size 64 --lr 0.0001 --num_channels 192 --num_head_channels 64 --num_res_blocks 3 --resblock_updown True --schedule_sampler uniform --use_fp16 True --weight_decay 0.0 --weight_schedule uniform --data_dir /home/ILSVRC2012_img_train
My 3090 can only handle a batch size of 16. Has anyone tried training their own model from scratch?
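When only 16 images fit on a 3090, one common way to train with a larger effective batch is gradient accumulation: run several micro-batches, sum their gradients, then take one optimizer step. A minimal PyTorch sketch; accumulate_gradients is a hypothetical helper, not a function from this repo, and it assumes a loss with mean reduction.

```python
import torch


def accumulate_gradients(model, loss_fn, inputs, targets, micro_batch):
    """Accumulate gradients over micro-batches so they match one full-batch backward pass.

    Each micro-batch loss is weighted by its share of the full batch, so the summed
    gradient equals the gradient of the mean loss over all samples.
    """
    model.zero_grad()
    n = inputs.shape[0]
    for start in range(0, n, micro_batch):
        x = inputs[start:start + micro_batch]
        y = targets[start:start + micro_batch]
        loss = loss_fn(model(x), y) * (x.shape[0] / n)
        loss.backward()  # gradients add up in .grad across micro-batches
```

This enlarges the effective batch but not the per-step cost: k micro-batches take roughly k forward/backward passes before each optimizer step.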
This is the command: python image_sample.py --batch_size 8 --training_mode consistency_distillation --sampler multistep --ts 0,67,150 --steps 151 --model_path E:\Googledownload\ct_bedroom256.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --image_size 256 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --num_samples 500 --resblock_updown True --use_fp16 True --weight_schedule uniform
The result is shown in the figure.
I don't know why this happens. Is there a good way to fix it? Thanks.
mpiexec -n 8 python scripts/image_sample.py --batch_size 32 --training_mode consistency_distillation --sampler multistep --ts 0,62,150 --steps 151 --model_path ./ct_cat256.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --image_size 256 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --num_samples 500 --resblock_updown True --use_fp16 True --weight_schedule uniform
"home/anaconda3/envs/consistency/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2433, in all_gather
work = default_pg.allgather([tensor_list], [tensor])
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.14.3
ncclInternalError: Internal check failed.
Last error:
Cuda failure 'peer access is not supported between these two devices'
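The last line says the two GPUs cannot access each other's memory directly (peer-to-peer). A commonly used workaround is NCCL's NCCL_P2P_DISABLE=1 environment variable, which makes NCCL avoid direct GPU-to-GPU transfers; it has to be set before the process group is initialized, either in the launch environment or at the top of the script. A minimal sketch (disable_nccl_p2p is a hypothetical helper); whether it resolves this particular machine's error is not confirmed.

```python
import os


def disable_nccl_p2p(env=None):
    """Set NCCL_P2P_DISABLE=1 so NCCL avoids direct GPU-to-GPU (peer) transfers.

    Call this before torch.distributed.init_process_group(); NCCL reads the
    variable when its communicators are created. Pass a dict to build an
    environment for a subprocess instead of modifying os.environ.
    """
    env = os.environ if env is None else env
    env["NCCL_P2P_DISABLE"] = "1"
    return env
```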
Hi, I'm currently trying to train the LSUN model (256x256). On an 8xA100 (80G) machine, however, I can only set a global batch size of 64 with the default model architecture, instead of the 256 mentioned in the paper. Am I missing something? Thank you in advance for the help!
I tried to generate samples in Colab, and everything works except that I had to change this line of code in /cm/unet.py to clear out factory_kwargs.
Not sure if this is a bug or I did something wrong. This is how I ran it: https://github.com/JonathanFly/consistency_models_colab_notebook/blob/main/Consistency_Models_Make_Samples.ipynb
import torch.nn as nn


class QKVFlashAttention(nn.Module):
    def __init__(
        self,
        embed_dim,
        num_heads,
        batch_first=True,
        attention_dropout=0.0,
        causal=False,
        device=None,
        dtype=None,
        **kwargs,
    ) -> None:
        from einops import rearrange
        from flash_attn.flash_attention import FlashAttention

        assert batch_first
        # Original line, disabled for this Colab workaround:
        # factory_kwargs = {"device": device, "dtype": dtype}
        factory_kwargs = {}  # workaround: pass no device/dtype arguments
        super().__init__()
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.causal = causal
        # ... remainder of the class omitted
Error: RuntimeError: shape '[-1, 7, 3, 256, 256]' is invalid for input of size 6291456
In consistency_models/cm/karras_diffusion.py, line 806 (commit e32b69e):
mask = th.zeros(*x.shape, device=dist_util.dev())
mask = mask.reshape(-1, 7, 3, image_size, image_size)  # why reshape to 7?
x.shape is [batch_size, 3, 256, 256] in my code.
Is that a bug?
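The element count in the error settles the arithmetic: 6291456 = 32 x 3 x 256 x 256, so the batch held 32 images, and reshape(-1, 7, 3, 256, 256) only succeeds when the batch size is a multiple of 7. The snippet does not say why the code groups images by 7, but under that constraint a batch size of 7, 14, 21, 28, ... should avoid this error. A minimal NumPy sketch of the check; both helpers are hypothetical.

```python
import numpy as np


def batch_from_numel(numel, channels=3, size=256):
    """Recover the batch size from the element count quoted in the error message."""
    per_image = channels * size * size
    if numel % per_image != 0:
        raise ValueError("element count is not a whole number of images")
    return numel // per_image


def group_reshape(x, group=7):
    """View a (B, C, H, W) array as (B // group, group, C, H, W); B must be divisible by group."""
    b = x.shape[0]
    if b % group != 0:
        raise ValueError(f"batch size {b} is not a multiple of {group}")
    return x.reshape(-1, group, *x.shape[1:])
```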
In Algorithm 4 of the paper, A is described as an invertible linear transformation that maps images to the latent space.
I cannot identify any transformation to the latent space in the code.
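In Algorithm 4 the "latent space" need not come from a learned encoder: as we understand it, any invertible linear A works. For colorization, a natural choice is an orthogonal matrix whose first axis is the grayscale direction, so the known grayscale image fixes one coordinate and the model generates the other two. A minimal NumPy sketch of such a matrix, assuming BT.601 luminance weights; we have not verified whether or where the repo builds an equivalent matrix, and luminance_orthogonal_matrix is a hypothetical name.

```python
import numpy as np


def luminance_orthogonal_matrix():
    """Orthogonal 3x3 matrix Q whose first column is the normalized grayscale direction.

    A = Q.T maps an RGB pixel to coordinates (gray, c1, c2): the first is
    proportional to BT.601 luminance, the other two span the color subspace
    orthogonal to it. Because Q is orthogonal, A is inverted by Q itself.
    """
    v = np.array([0.2989, 0.5870, 0.1140])
    m = np.eye(3)
    m[:, 0] = v / np.linalg.norm(v)
    q, _ = np.linalg.qr(m)
    if q[:, 0].sum() < 0:  # QR fixes columns only up to sign; make the gray axis positive
        q = -q
    return q


Q = luminance_orthogonal_matrix()
rgb = np.array([0.2, 0.7, 0.4])
latent = Q.T @ rgb   # (gray, c1, c2)
restored = Q @ latent
```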
Thank you for your wonderful work! Could you please share the number of GPU hours required to train the CD and CT models on ImageNet 64x64?
I only have one GPU, and I want to run the pre-trained model to generate images. What should I do, and where should the code be changed? Please explain in detail, as I am a newcomer to this. Thanks.
We get "RuntimeError: Expected tensor for argument # 1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)"
when we run
import torch
from diffusers import ConsistencyModelPipeline
device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)
# Onestep sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("cd_imagenet64_l2_onestep_sample_penguin.png")
# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original Github repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(num_inference_steps=None, timesteps=[22, 0], class_labels=145).images[0]
image.save("cd_imagenet64_l2_multistep_sample_penguin.png")
Corrected in pull request #43.
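The error says nn.Embedding received int32 class indices where it expects int64 (Long). One workaround, assuming your diffusers version accepts a tensor for class_labels (not verified here), is to build the labels as an int64 tensor yourself. as_class_label_tensor is a hypothetical helper.

```python
import torch


def as_class_label_tensor(labels, device="cpu"):
    """Convert an int or a list of ints into the int64 tensor that nn.Embedding expects."""
    if isinstance(labels, int):
        labels = [labels]
    return torch.as_tensor(labels, dtype=torch.long, device=device)


# With the pipeline from the report (not run here; needs the checkpoint and a GPU):
# image = pipe(num_inference_steps=1, class_labels=as_class_label_tensor(145, "cuda")).images[0]

embedding = torch.nn.Embedding(1000, 8)
out = embedding(as_class_label_tensor(145))  # int64 indices are accepted
```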
Hi, where can we get the ref_batches/imagenet64.npz file for FID evaluation?
I downloaded the LSUN bedroom dataset and ran the "Consistency Training on LSUN-256" command from cm/scripts/launch.sh in a terminal. No additional messages are shown in the window. The screenshot is attached below:
Is this normal? Or do I need to wait a long time until training finishes before any messages show up?
ImportError: cannot import name '_get_cpp_backtrace' from 'torch._C' (/opt/conda/lib/python3.8/site-packages/torch/_C.cpython-38-x86_64-linux-gnu.so)
Hi, first of all, thanks for your interesting work!
I'm unsure which environment to use for running the code. Which versions of Python and PyTorch does the project use?
Thanks very much!
Is it possible, with the released model code, to specify a corresponding image as supervisory information for the generated image during generation? Any relevant information would be appreciated. Thank you!
When setting use_fp16=False, the code works correctly. However, with use_fp16=True an issue arises due to an unexpected type conversion in unet.py (line 435).
At line 435, the tensor a is converted from float16 to float32:
a = a.float()
Before this line, a is float16; after it, a is float32. If we remove or comment out this line, the code raises an error. It seems that keeping a in float16 is essential for use_fp16=True to work correctly, but the current implementation converts it to float32, which leads to issues.
Additionally, I've noticed that the current code has been modified so that flash attention is not used. I also tried running the original version but got similar errors.
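We have not checked what unet.py does with a after line 435. The usual mixed-precision pattern is to upcast only the numerically sensitive operation and then cast the result back to the input dtype, so downstream fp16 layers still receive fp16. A minimal sketch of that pattern for a softmax; softmax_fp32 is a hypothetical helper.

```python
import torch


def softmax_fp32(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax computed in float32, returned in the caller's dtype.

    Upcasting avoids fp16 overflow/underflow inside exp(); casting back keeps
    the following fp16 layers fed with the dtype they expect.
    """
    return torch.softmax(scores.float(), dim=dim).to(scores.dtype)
```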
Could you provide an explanation of lg_loss_scale?
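In guided-diffusion-style fp16 training code, which we assume cm/fp16_util.py follows (not verified), lg_loss_scale is the base-2 logarithm of a dynamic loss scale. The loss is multiplied by 2 ** lg_loss_scale before backward() so that small fp16 gradients do not underflow. If the gradients overflow to inf/NaN, the step is skipped and lg_loss_scale drops by 1; otherwise the gradients are divided by the scale and lg_loss_scale grows slowly. A minimal pure-Python sketch; the class name, the initial value and the growth rate are assumptions.

```python
import math


class DynamicLossScaler:
    """Dynamic loss scaling driven by lg_loss_scale = log2(loss scale)."""

    def __init__(self, initial_lg_loss_scale=20.0, scale_growth=1e-3):
        self.lg_loss_scale = initial_lg_loss_scale
        self.scale_growth = scale_growth

    @property
    def loss_scale(self):
        return 2.0 ** self.lg_loss_scale

    def scale_loss(self, loss):
        # Multiply before backward() so tiny fp16 gradients stay representable.
        return loss * self.loss_scale

    def unscale_and_update(self, scaled_grads):
        """Return unscaled gradients, or None if this step should be skipped."""
        if not all(math.isfinite(g) for g in scaled_grads):
            self.lg_loss_scale -= 1  # overflow: halve the scale and skip the step
            return None
        grads = [g / self.loss_scale for g in scaled_grads]
        self.lg_loss_scale += self.scale_growth  # no overflow: grow the scale slowly
        return grads
```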
Hi,
How can I use the provided repository for image-conditional training and evaluation (super-resolution)?
And which approach is recommended for such a task? consistency-distillation? or consistency-training?
(Assuming the base model for the consistency-distillation originates from: "Image Super-Resolution via Iterative Refinement" )
Thanks!
I've been exploring the Consistency model and am intrigued by its approach to maintaining consistency across the ODE path. The objective function appears to ensure that two specific points along the ODE path are consistent after being processed by f(·), and it also constrains the outcome of the boundary ε samples through f(·). However, I'm curious about how the model ensures that the result of processing any point along the ODE path by f(·) matches the ε samples.
Could anyone provide insights or explain the underlying mechanism that guarantees this level of consistency across the entire ODE path? Any further clarification or pointers to additional resources would be greatly appreciated.
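A sketch of the idea (not the paper's full proof): with the boundary condition f_θ(x, ε) = x and t_1 = ε, a telescoping sum links the endpoint of a discretized trajectory to its origin through adjacent pairs only, so the pairwise loss constrains every point, not just neighbors:

```latex
f_\theta(\mathbf{x}_{t_N}, t_N) - \mathbf{x}_{t_1}
  = \sum_{n=1}^{N-1} \Big[ f_\theta(\mathbf{x}_{t_{n+1}}, t_{n+1}) - f_\theta(\mathbf{x}_{t_n}, t_n) \Big],
\qquad
\big\| f_\theta(\mathbf{x}_{t_N}, t_N) - \mathbf{x}_{t_1} \big\|
  \le \sum_{n=1}^{N-1} \big\| f_\theta(\mathbf{x}_{t_{n+1}}, t_{n+1}) - f_\theta(\mathbf{x}_{t_n}, t_n) \big\|
```

If every adjacent difference is zero along the trajectory, every point maps to x_{t_1}; when the differences are small but nonzero, their errors add up over the N - 1 pairs, which is why the paper's error bound depends on the step size.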
Hi,
I've emailed Yang Song about differences between paper and code, but I thought I'd raise it as an issue so others can see. I'll update this issue if I get a reply via email.
There are some differences between the paper and the code, and I was hoping to know which is the better approach.
c_in
Any advice would be really appreciated, thank you.
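For reference, here are the boundary-condition scalings as we understand the repo's implementation; sigma_data = 0.5 and sigma_min = 0.002 are assumed defaults, so check them against karras_diffusion.py. c_skip and c_out make f(x, ε) = x hold exactly at sigma = sigma_min, while c_in only rescales the network input. boundary_scalings is a hypothetical name.

```python
import math


def boundary_scalings(sigma, sigma_min=0.002, sigma_data=0.5):
    """c_skip, c_out, c_in for f(x, t) = c_skip(t) * x + c_out(t) * F(c_in(t) * x, t)."""
    c_skip = sigma_data ** 2 / ((sigma - sigma_min) ** 2 + sigma_data ** 2)
    c_out = (sigma - sigma_min) * sigma_data / math.sqrt(sigma ** 2 + sigma_data ** 2)
    c_in = 1.0 / math.sqrt(sigma ** 2 + sigma_data ** 2)
    return c_skip, c_out, c_in
```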
Hi,
Does anyone know the GPU requirements for training this network (consistency distillation)? I keep getting CUDA out-of-memory errors:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 39.42 GiB total capacity; 34.41 GiB already allocated; 2.96 GiB free; 34.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I tried decreasing the batch size. The amount in "Tried to allocate 4.00 GiB" does decrease, but the error still occurs as the "already allocated" memory keeps increasing.
It would be great to see how this consistency model performs on face datasets such as CelebA-HQ and FFHQ, since the paper didn't mention the face dataset.
When I run pip install -e . as specified in README.md, I encounter the error below.
Collecting flash-attn (from consistency-models==0.0.0)
Downloading flash_attn-1.0.4.tar.gz (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 8.6 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
Traceback (most recent call last):
File "/workspace/python/consistency_models/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/workspace/python/consistency_models/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/workspace/python/consistency_models/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 487, in run_setup
super(_BuildMetaLegacyBackend,
File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in run_setup
exec(code, locals())
File "<string>", line 6, in <module>
ModuleNotFoundError: No module named 'packaging'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
I tried installing packaging manually, but that did not solve the problem.
I am using a Python venv virtual environment.
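The traceback runs flash-attn's setup.py inside a /tmp/pip-build-env-* directory, which is pip's isolated build environment; packages installed in the venv, such as packaging, are not visible there, which is why installing it manually did not help. The flash-attn project documents installing packaging (and ninja) first and then installing with pip's --no-build-isolation flag; installing flash-attn that way before running pip install -e . should let the dependency resolve, though we have not tested it with this repo. A minimal sketch with hypothetical helper names:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the names from `names` that cannot be imported in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def flash_attn_install_command(no_build_isolation=True):
    """pip command for flash-attn that reuses this environment's build dependencies.

    With build isolation (pip's default) the build runs in a temporary environment,
    so packages installed in the venv, such as packaging or torch, are invisible to it.
    """
    cmd = [sys.executable, "-m", "pip", "install", "flash-attn"]
    if no_build_isolation:
        cmd.append("--no-build-isolation")
    return cmd


# Install anything listed here first, then run the command,
# e.g. subprocess.run(flash_attn_install_command(), check=True).
print(missing_modules(["packaging", "ninja", "torch"]))
```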
Hi,
thank you for your great work on this project!
I have been comparing the noise schedule implementation between this project and EDM, and I noticed some odd differences.
In particular, I was looking at this code snippet: consistency_models/cm/script_util.py, lines 94 to 101 (commit edfe91e).
As the title says.
As the title says, can I use the .pkl files provided in https://github.com/NVlabs/edm as the teacher model to train a consistency model?
ModuleNotFoundError: No module named 'cm'
How can I solve this problem when importing classes from the cm package? Does the project have to be built first?
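The README route is to install the package with pip install -e . from the repository root, after which cm is importable anywhere in that environment. If that install fails (for example on the flash-attn build errors reported in this thread), the traceback paths such as scripts/../cm/script_util.py suggest another option: put the repository root on sys.path before importing cm. A minimal sketch; add_repo_root_to_path is a hypothetical helper meant to be called at the top of a script in scripts/.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_file):
    """Make the top-level `cm` package importable from a script in <repo>/scripts/.

    Assumes the layout <repo>/cm/ and <repo>/scripts/<script>.py.
    """
    repo_root = Path(script_file).resolve().parent.parent
    if str(repo_root) not in sys.path:
        sys.path.insert(0, str(repo_root))
    return repo_root


# Usage at the top of scripts/cm_train.py, before any "from cm import ..." line:
# add_repo_root_to_path(__file__)
```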
Thanks a lot for your outstanding work on consistency models!
We are trying to use the one-step generation of consistency models for some downstream tasks. However, it seems that only a class-conditional model is provided for ImageNet.
Could you please also provide an unconditional ImageNet model?
In launch.sh, the section "Consistency training on class-conditional ImageNet-64, and LSUN 256" contains:
mpiexec -n 8 python cm_train.py --training_mode consistency_training ... --teacher_model_path /path/to/edm_bedroom256_ema.pt ...
So why is a teacher model used in consistency training? In my understanding, consistency training trains the model in isolation, without a teacher.
Is something wrong?
Here is the Consistency Training (CT) algorithm from the paper:
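As we read the CT algorithm in the paper, the two network inputs x + t_{n+1} z and x + t_n z share a single Gaussian sample z, which stands in for the teacher's ODE step, so no pretrained teacher is needed to compute the loss. This sketch does not explain why launch.sh still passes --teacher_model_path for CT; that would need checking in cm_train.py. A minimal NumPy sketch of one loss term with a squared-L2 metric (the paper and the commands in this thread also use LPIPS); ct_loss is a hypothetical name.

```python
import numpy as np


def ct_loss(f_online, f_target, x0, t_n, t_next, rng):
    """One consistency-training loss term with a squared-L2 metric.

    Both noisy inputs reuse the same Gaussian sample z, so no teacher ODE solver
    is needed. f_target plays the role of the EMA network and is treated as a constant.
    """
    z = rng.standard_normal(x0.shape)
    pred = f_online(x0 + t_next * z, t_next)  # online network at the larger noise level
    target = f_target(x0 + t_n * z, t_n)      # target network at the adjacent smaller level
    return float(np.mean((pred - target) ** 2))


# For data concentrated at 0, x_t = t * z, so f(x, t) = x * eps / t is an exact consistency function.
eps = 0.002


def exact_f(x, t):
    return x * eps / t
```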
Nice job! I wonder how to run the code on a single Linux server with multiple GPUs. I can run it with one GPU by not using mpiexec, but what if I want to use multiple GPUs, as with nn.DataParallel?