
consistency_models's People

Contributors

ashutosh1919, ayushtues, brandonjy, discus0434, eltociear, sayakpaul, take2rohit, tmgthb, yang-song

consistency_models's Issues

Sampling with Diffusers

When I use diffusers to sample images with the provided code, I can't get an image that corresponds to the label at all.

Is the sampling reversible? (Image to Noise)

Hello! Your work is truly remarkable. I've had the opportunity to experiment with standard DDPM models using DDIM sampling, and despite being relatively slow, they exhibit a fascinating reversibility property: they support denoising as well as the reverse process, recreating the exact same input noise (or close to it).

I see that there are two samplers being used here: sample_onestep and stochastic_iterative_sampler.

stochastic_iterative_sampler adds stochastic noise between steps, so it is obviously not reversible.
I am not sure about sample_onestep.

I would be grateful if you could provide some insight into whether this reversibility property is preserved in consistency model sampling. The examples provided do not explicitly demonstrate this aspect, and I am eager to learn more about it. Thank you for your time and expertise.
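For context, here is a minimal paraphrase of what sample_onestep amounts to, assuming the implementation in cm/karras_diffusion.py (treat the exact signature as an assumption, not the repo's verbatim code): it is a single forward pass of the consistency function at the maximum noise level, not an iterative ODE solve, so there is no step-by-step trajectory to run backwards the way DDIM inversion does.

def sample_onestep_sketch(distiller, x, sigmas):
    # Hedged sketch: one evaluation of the consistency function f_theta(x_T, T);
    # the "trajectory" is a single jump from noise to sample.
    s_in = x.new_ones([x.shape[0]])
    return distiller(x, sigmas[0] * s_in)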

Problem installing dependencies with pip install -e . on Google Colab

It's a problem with flash-attn; I got this error:

Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

I tried reinstalling wheel, but nothing changed.

Implement a non-CUDA flash attention module

The current flash attention module by Hazy Research is CUDA-only, which limits this repo to CUDA-only as well. I suggest writing a separate flash attention module for machines without an Nvidia video card. The current module can still be used when an Nvidia card is present.

A couple of people have raised this issue with Hazy Research, but they said they're focused on CUDA only and are not interested in writing a non-CUDA version.

Evaluation: Reference Batches for CIFAR10

Would it be possible to release the reference batch for CIFAR10? The reference batches for ImageNet/LSUN are available here, but I could not find a corresponding CIFAR10 batch.

Thanks!

Inconsistent loss term with the paper

Hi,

Thanks for open-sourcing this wonderful project!

However, I notice that in CT training, the loss term has the target model denoising $x_{t_{n+1}}$ instead of $x_{t_n}$, which differs from the loss stated in Algorithm 3 (CT) in the paper, where the target model denoises $x_{t_n}$. Did I miss something, or does this change not matter?

A module error when running cm_train.py

OS: Ubuntu20.04 x86_64
Python: 3.11.5
flash-attn: 2.3.6/2.1.2.post3

I used pip to install the flash-attn module, version 2.3.6. When I run python cm_train.py to test, Python throws the error below. I changed my flash-attn version to 2.1.2.post3, but the error still exists.
The following is my output:
Traceback (most recent call last):
  File "/root/autodl-tmp/conmodel/scripts/cm_train.py", line 178, in <module>
    main()
  File "/root/autodl-tmp/conmodel/scripts/cm_train.py", line 54, in main
    model, diffusion = create_model_and_diffusion(**model_and_diffusion_kwargs)  # create model and diffusion
  File "/root/autodl-tmp/conmodel/scripts/../cm/script_util.py", line 76, in create_model_and_diffusion
    model = create_model(
  File "/root/autodl-tmp/conmodel/scripts/../cm/script_util.py", line 140, in create_model
    return UNetModel(
  File "/root/autodl-tmp/conmodel/scripts/../cm/unet.py", line 612, in __init__
    AttentionBlock(
  File "/root/autodl-tmp/conmodel/scripts/../cm/unet.py", line 293, in __init__
    self.attention = QKVFlashAttention(channels, self.num_heads)
  File "/root/autodl-tmp/conmodel/scripts/../cm/unet.py", line 344, in __init__
    from flash_attn.flash_attention import FlashAttention
ModuleNotFoundError: No module named 'flash_attn.flash_attention'

I guess 'flash_attention' may have been moved elsewhere in an update.
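A hedged sketch of one way to work around this, assuming the only use of the package is the QKVFlashAttention block in cm/unet.py: guard the flash-attn 1.x-style import and fall back to the repo's standard attention path when it is missing (the module layout was reorganized in flash-attn 2.x). Alternatively, installing a 1.x-era flash-attn release restores the old module path.

# Hedged sketch, not the repo's actual code: tolerate the flash-attn 2.x layout
# by falling back to standard attention when the 1.x module path is absent.
try:
    from flash_attn.flash_attention import FlashAttention  # flash-attn 1.x layout
    HAS_FLASH_ATTN = True
except ImportError:  # also covers ModuleNotFoundError
    FlashAttention = None
    HAS_FLASH_ATTN = False  # use the einsum-based attention block instead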

Code for Zero-Shot Image Editing

Is it planned to release the code for the Zero-Shot Image Editing?
In particular, I'd be interested in the super-resolution.

Best regards

Not able to obtain results like the checkpoint models (CT, imagenet64) when trying to train from scratch

I tried to train a CT-imagenet model from scratch, and have not been able to reproduce the same quality of images as the checkpoint models provided.
Below is the command I ran:

python cm_train.py --training_mode consistency_training --target_ema_mode adaptive --start_ema 0.95 --scale_mode progressive --start_scales 2 --end_scales 200 --total_training_steps 800000 --loss_norm lpips --lr_anneal_steps 0 --attention_resolutions 32,16,8 --class_cond True --use_scale_shift_norm True --dropout 0.0 --teacher_dropout 0.1 --ema_rate 0.999 --global_batch_size 16 --image_size 64 --lr 0.0001 --num_channels 192 --num_head_channels 64 --num_res_blocks 3 --resblock_updown True --schedule_sampler uniform --use_fp16 True --weight_decay 0.0 --weight_schedule uniform --data_dir /home/ILSVRC2012_img_train

My 3090 can only handle a batch size of 16. Has anyone tried training their own model from scratch?

I use the pre-trained model to generate pictures, but the results are very bad and I don't know what's going on.

The command is python image_sample.py --batch_size 8 --training_mode consistency_distillation --sampler multistep --ts 0,67,150 --steps 151 --model_path E:\Googledownload\ct_bedroom256.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --image_size 256 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --num_samples 500 --resblock_updown True --use_fp16 True --weight_schedule uniform.
The resulting samples are shown in the attached image (not reproduced here).

I don't know why this happens. Is there any good way to fix it? Thanks.

Multi-GPU error: dist.all_gather(gathered_samples, sample) # gather not supported with NCCL

mpiexec -n 8 python scripts/image_sample.py --batch_size 32 --training_mode consistency_distillation --sampler multistep --ts 0,62,150 --steps 151 --model_path ./ct_cat256.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --image_size 256 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --num_samples 500 --resblock_updown True --use_fp16 True --weight_schedule uniform

"home/anaconda3/envs/consistency/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2433, in all_gather
    work = default_pg.allgather([tensor_list], [tensor])
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.14.3
ncclInternalError: Internal check failed.
Last error:
Cuda failure 'peer access is not supported between these two devices'
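A hedged workaround, under the assumption that the failure really is missing peer-to-peer support between the GPUs on this node (and not a driver or topology misconfiguration): disable NCCL's P2P transfers before the process group is initialized, either by exporting NCCL_P2P_DISABLE=1 in the shell before mpiexec or by setting it early in Python.

import os

# Hedged sketch: must run before torch.distributed initializes the NCCL
# process group; with P2P disabled, NCCL routes traffic through host memory.
os.environ["NCCL_P2P_DISABLE"] = "1"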

Different batch size while training

Hi, I'm currently trying to train the LSUN model (256x256). However, when using an 8xA100 (80 GB) machine, I can only set a global batch size of 64 instead of the 256 mentioned in the paper with the default model architecture. I'm wondering if I'm missing something here. Thank you in advance for the help!

QKVFlashAttention unexpected parameters error, running in Google Colab

I tried to generate samples in Colab and everything works except that I had to change this line of code in /cm/unet.py, clearing out factory_kwargs.

Not sure if this is a bug or I did something wrong. This is how I ran it: https://github.com/JonathanFly/consistency_models_colab_notebook/blob/main/Consistency_Models_Make_Samples.ipynb


class QKVFlashAttention(nn.Module):
    def __init__(
        self,
        embed_dim,
        num_heads,
        batch_first=True,
        attention_dropout=0.0,
        causal=False,
        device=None,
        dtype=None,
        **kwargs,
    ) -> None:
        from einops import rearrange
        from flash_attn.flash_attention import FlashAttention

        assert batch_first
        #factory_kwargs = {"device": device, "dtype": dtype}
        factory_kwargs = {}
        super().__init__()
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.causal = causal

[inpainting] wrong shape?

Error: RuntimeError: shape '[-1, 7, 3, 256, 256]' is invalid for input of size 6291456

The error comes from

mask = th.zeros(*x.shape, device=dist_util.dev())
mask = mask.reshape(-1, 7, 3, image_size, image_size)  # <-- why reshape to 7?

x.shape is [batch_size, 3, 256, 256] in my code.

Is that a bug?
In Algorithm 4 in the paper, A is described as an invertible linear transformation that maps images to the latent space.
I cannot identify any such transformation to the latent space in the code.

RuntimeError in the sample `diffusers` code in README.md

We get "RuntimeError: Expected tensor for argument # 1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)"
when we run

import torch
from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)
# Onestep sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("cd_imagenet64_l2_onestep_sample_penguin.png")

# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original Github repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(num_inference_steps=None, timesteps=[22, 0], class_labels=145).images[0]
image.save("cd_imagenet64_l2_multistep_sample_penguin.png")

Corrected in pull request #43.
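Until that fix is picked up, a hedged workaround (assuming the pipeline accepts a prebuilt tensor for class_labels) is to pass the labels as int64 explicitly, so the class-embedding lookup receives a LongTensor rather than an IntTensor:

# Continues the snippet above; `pipe` and `device` are defined there.
class_labels = torch.tensor([145], dtype=torch.long, device=device)
image = pipe(num_inference_steps=1, class_labels=class_labels).images[0]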

Question about running scripts: timeout initializing process group

I downloaded the LSUN bedroom dataset and ran the "Consistency training on LSUN-256" command from cm/scripts/launch.sh in a terminal. No additional message is shown in the window (screenshot attached, not reproduced here).
Is this normal? Or do I need to wait for a long time until the training process finishes before any messages show up?

torch version

Hi, thanks for your interesting work!
I am unsure about the environment for running the code and would like to know the versions of Python and torch used in your project.

Thanks very much!

Issue with `use_fp16=True` Leading to Type Conversion Error in `unet.py`

When setting use_fp16=False, the code functions correctly. However, an issue arises with use_fp16=True due to an unexpected type conversion in unet.py (line 435).

The problem occurs at line 435, where the tensor a is converted from float16 to float32:

a = a.float()

Prior to this line, a is in float16, but after this line, it is converted to float32. If we remove or comment out this line, the code encounters an error. It seems that maintaining a in float16 is essential for the use_fp16=True setting to work correctly, but the current implementation inadvertently converts it to float32, leading to issues.

Additionally, I've noticed that the current code has been modified to disable flash attention. I also attempted to run the original version, but encountered similar errors.

How is Consistency for Any Point on the ODE Path Achieved in the Consistency Model?

I've been exploring the consistency model and am intrigued by its approach to maintaining consistency across the ODE path. The objective function appears to ensure that two specific points along the ODE path are consistent after being processed by f(·), and it also constrains the output at the boundary $\epsilon$ through f(·). However, I'm curious how the model ensures that the result of processing any point along the ODE path through f(·) matches the sample at $\epsilon$.

Could anyone provide insights or explain the underlying mechanism that guarantees this level of consistency across the entire ODE path? Any further clarification or pointers to additional resources would be greatly appreciated.
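For what it's worth, here is a hedged sketch of how I read the paper's argument (Theorem 1 and the surrounding discussion), not an authoritative answer: the loss only ties adjacent discretization points together, but the boundary parameterization forces $f_\theta(x, \epsilon) = x$, so chaining the pairwise constraints along a trajectory gives, by a telescoping/induction argument,

$f_\theta(x_{t_N}, t_N) \approx f_\theta(x_{t_{N-1}}, t_{N-1}) \approx \cdots \approx f_\theta(x_{t_1}, t_1) = x_{t_1}$, with $t_1 = \epsilon$,

and the paper bounds the accumulated error by the sum of the per-step discrepancies, which vanishes as the discretization is refined (assuming a Lipschitz $f_\theta$).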

Differences between code and paper

Hi,

I've emailed Yang Song about differences between paper and code, but I thought I'd raise it as an issue so others can see. I'll update this issue if I get a reply via email.

There are some differences between the paper and the code, and I was hoping to know which is the better approach.

  • [Solved?] The Karras rho scheduling is a bit different in the paper, whereas the code matches the EDM implementation. I think this is explainable since they are just the reverse of each other; this may be why $t_n$ and $t_{n+1}$ are switched in the code.
  • The input to the denoiser is scaled in the code (i.e. c_in), but not in the paper (see the sketch after this list).
  • The time conditioning is rescaled by a factor of 1000. My first thought is that the temporal embedding in the model (e.g. sinusoidal PE) may prefer larger values, but this is unconfirmed.

Any advice would be really appreciated, thank you.
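For reference, a hedged sketch of the EDM-style preconditioning the second and third bullets refer to, assuming the code follows the EDM parameterization with the repo's default sigma_data (this is a paraphrase, not the repo's exact code):

import torch as th

def edm_scalings_sketch(sigma: th.Tensor, sigma_data: float = 0.5):
    # c_skip / c_out / c_in as in the EDM parameterization; c_in is the input
    # scaling mentioned in the second bullet, which the paper's presentation omits.
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / (sigma**2 + sigma_data**2) ** 0.5
    c_in = 1 / (sigma**2 + sigma_data**2) ** 0.5
    # Time rescaling (third bullet): roughly 1000 * 0.25 * log(sigma), plausibly
    # so the sinusoidal timestep embedding sees values of its usual magnitude.
    rescaled_t = 1000 * 0.25 * th.log(sigma)
    return c_skip, c_out, c_in, rescaled_t

The denoiser output is then assembled as c_skip(σ) * x + c_out(σ) * F_θ(c_in(σ) * x, rescaled_t), which is where the c_in scaling and the 1000x time rescaling enter.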

CUDA out of memory

Hi,

Does anyone know the GPU requirements to train this network (consistency distillation)? I'm always getting CUDA out-of-memory errors:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 39.42 GiB total capacity; 34.41 GiB already allocated; 2.96 GiB free; 34.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I've tried decreasing the batch size; the "Tried to allocate 4.00 GiB" figure does decrease, but the error still occurs because the "already allocated" usage increases.
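Not a full answer, but a hedged sketch of the error message's own suggestion, in case fragmentation is the culprit: cap the caching allocator's split size via PYTORCH_CUDA_ALLOC_CONF before any CUDA allocation happens (the value below is a starting point to tune, not a recommendation from the repo).

import os

# Must be set before the first CUDA allocation (or exported in the shell
# before launching training); reduces fragmentation from large blocks.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"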

pip install -e . error

When I execute pip install -e . as specified in README.md, I encounter the error below.

Collecting flash-attn (from consistency-models==0.0.0)
  Downloading flash_attn-1.0.4.tar.gz (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 8.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "/workspace/python/consistency_models/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/workspace/python/consistency_models/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/workspace/python/consistency_models/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 487, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 6, in <module>
      ModuleNotFoundError: No module named 'packaging'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

I tried installing packaging manually, but that did not solve the problem.
I am using a venv Python virtual environment.
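A hedged workaround that has resolved similar flash-attn build failures (an assumption, not a confirmed fix for this repo): the ModuleNotFoundError happens inside pip's isolated build environment, which cannot see the packaging you installed into the venv. Installing the build prerequisites into the venv and then building flash-attn without build isolation, before retrying the editable install, sidesteps that:

pip install packaging wheel torch
pip install flash-attn --no-build-isolation
pip install -e .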

The parameter sigma_data for KarrasDenoiser.

Hi,
thank you for your great work on this project!
I have been comparing the noise schedule implementation between this project and EDM, and I noticed some odd differences.
In particular, I was looking at this code snippet:

diffusion = KarrasDenoiser(
    sigma_data=0.5,
    sigma_max=sigma_max,
    sigma_min=sigma_min,
    distillation=distillation,
    weight_schedule=weight_schedule,
)
return model, diffusion

The image tensors in both projects are normalized in the range [-1, 1]. Based on my understanding, the parameter "sigma_data" should be set to 1.0, similar to the EDM settings. Could you kindly clarify the reason behind the difference in the "sigma_data" value between the two projects?

The sampling command hangs

The sampling command gets stuck in the command line (screenshot attached, not reproduced here).

Running the python xxxx.py file directly works normally.

Do you know what might be causing this? Please let me know, thank you!

ModuleNotFoundError: No module named 'cm'

ModuleNotFoundError: No module named 'cm'

How do I solve this problem when importing classes from the cm package? Does it mean the project has to be installed first?

Request for an unconditional model on ImageNet 64x64

Thanks a lot for your outstanding work on consistency models!
We are attempting to use the one-step generation of consistency models for some downstream tasks. However, it seems that only a class-conditional model is provided for ImageNet $64 \times 64$.
Could you please also provide an unconditional model on ImageNet $64 \times 64$? It would be very helpful to us.

Why use a teacher model in consistency training?

As written in the launch.sh section "Consistency training on class-conditional ImageNet-64, and LSUN 256":

mpiexec -n 8 python cm_train.py --training_mode consistency_training ... --teacher_model_path /path/to/edm_bedroom256_ema.pt ...

So why use a teacher model in consistency training? In my understanding, consistency training trains the model in isolation, without a teacher.
Is anything wrong with my understanding?
Here is the Consistency Training (CT) algorithm from the paper (screenshot not reproduced here).

How to run on a single Linux server with multiple GPUs

Nice job! I wonder how I can run the code on a single Linux server with multiple GPUs. I can run the code on the server with one GPU by not using mpiexec. But what if I want to use multiple GPUs, as with nn.DataParallel?
