
consistency_models's People

Contributors

ashutosh1919, ayushtues, brandonjy, discus0434, eltociear, sayakpaul, take2rohit, tmgthb, yang-song

consistency_models's Issues

Sampling with Diffusers

When I use diffusers to sample images with the provided code, I can't get an image that corresponds to the label at all.

Is the sampling reversible? (Image to Noise)

Hello! Your work is truly remarkable. I've had the opportunity to experiment with standard DDPM models using DDIM sampling, and despite being relatively slow, they exhibit a fascinating reversibility property: they support denoising as well as the reverse process, recreating the exact same input noise (or close to it).

I see that there are two samplers being used here: sample_onestep and stochastic_iterative_sampler.

stochastic_iterative_sampler adds stochastic noise between steps, so it is obviously not reversible.
I am not sure about sample_onestep.

I would be grateful if you could provide some insight into whether this reversibility property is preserved in consistency model sampling. The examples provided do not explicitly demonstrate this aspect, and I am eager to learn more about it. Thank you for your time and expertise.
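For context, here is a minimal paraphrase of what sample_onestep amounts to, assuming the implementation in cm/karras_diffusion.py (treat the exact signature as an assumption, not the repo's verbatim code): it is a single forward pass of the consistency function at the maximum noise level, not an iterative ODE solve, so there is no step-by-step trajectory to run backwards the way DDIM inversion does.

def sample_onestep_sketch(distiller, x, sigmas):
    # Hedged sketch: one evaluation of the consistency function f_theta(x_T, T);
    # the "trajectory" is a single jump from noise to sample.
    s_in = x.new_ones([x.shape[0]])
    return distiller(x, sigmas[0] * s_in)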

Problem installing dependencies with pip install -e . on Google Colab

It's a problem with flash-attn; I got this error:

Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

I tried reinstalling wheel, but nothing changed.

Implement a non-CUDA flash attention module

The current flash attention module by Hazy Research is CUDA-only, which limits this repo to CUDA-only as well. I suggest writing a separate flash attention module for machines without an Nvidia video card. The current module can still be used when an Nvidia card is present.

A couple of people have raised this issue with Hazy Research, but they said they're focused on CUDA only and are not interested in writing a non-CUDA version.

Evaluation: Reference Batches for CIFAR10

Would it be possible to release the reference batch for CIFAR10? The reference batches for ImageNet/LSUN are available here, but I could not find a corresponding CIFAR10 batch.

Thanks!

Inconsistent loss term with the paper

Hi,

Thanks for open-sourcing this wonderful project!

However, I notice that in CT training, the loss term has the target model denoising $x_{t_{n+1}}$ instead of $x_{t_n}$, which differs from the loss stated in Algorithm 3 (CT) in the paper, where the target model denoises $x_{t_n}$. Did I miss something, or does this change not matter?

A module error when running cm_train.py

OS: Ubuntu20.04 x86_64
Python: 3.11.5
flash-attn: 2.3.6/2.1.2.post3

I used pip to install the flash-attn module, version 2.3.6. When I run python cm_train.py to test, Python throws the error below. I changed my flash-attn version to 2.1.2.post3, but the error still exists.
The following is my output:
Traceback (most recent call last):
  File "/root/autodl-tmp/conmodel/scripts/cm_train.py", line 178, in <module>
    main()
  File "/root/autodl-tmp/conmodel/scripts/cm_train.py", line 54, in main
    model, diffusion = create_model_and_diffusion(**model_and_diffusion_kwargs)  # create model and diffusion
  File "/root/autodl-tmp/conmodel/scripts/../cm/script_util.py", line 76, in create_model_and_diffusion
    model = create_model(
  File "/root/autodl-tmp/conmodel/scripts/../cm/script_util.py", line 140, in create_model
    return UNetModel(
  File "/root/autodl-tmp/conmodel/scripts/../cm/unet.py", line 612, in __init__
    AttentionBlock(
  File "/root/autodl-tmp/conmodel/scripts/../cm/unet.py", line 293, in __init__
    self.attention = QKVFlashAttention(channels, self.num_heads)
  File "/root/autodl-tmp/conmodel/scripts/../cm/unet.py", line 344, in __init__
    from flash_attn.flash_attention import FlashAttention
ModuleNotFoundError: No module named 'flash_attn.flash_attention'

I guess 'flash_attention' may have been moved elsewhere in an update.
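A hedged sketch of one way to work around this, assuming the only use of the package is the QKVFlashAttention block in cm/unet.py: guard the flash-attn 1.x-style import and fall back to the repo's standard attention path when it is missing (the module layout was reorganized in flash-attn 2.x). Alternatively, installing a 1.x-era flash-attn release restores the old module path.

# Hedged sketch, not the repo's actual code: tolerate the flash-attn 2.x layout
# by falling back to standard attention when the 1.x module path is absent.
try:
    from flash_attn.flash_attention import FlashAttention  # flash-attn 1.x layout
    HAS_FLASH_ATTN = True
except ImportError:  # also covers ModuleNotFoundError
    FlashAttention = None
    HAS_FLASH_ATTN = False  # use the einsum-based attention block instead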

Code for Zero-Shot Image Editing

Is it planned to release the code for the Zero-Shot Image Editing?
In particular, I'd be interested in the super-resolution.

Best regards

Not able to obtain results like the checkpoint models (CT, imagenet64) when trying to train from scratch

I tried to train a CT-imagenet model from scratch, and have not been able to reproduce the same quality of images as the checkpoint models provided.
Below is the command I ran:

python cm_train.py --training_mode consistency_training --target_ema_mode adaptive --start_ema 0.95 --scale_mode progressive --start_scales 2 --end_scales 200 --total_training_steps 800000 --loss_norm lpips --lr_anneal_steps 0 --attention_resolutions 32,16,8 --class_cond True --use_scale_shift_norm True --dropout 0.0 --teacher_dropout 0.1 --ema_rate 0.999 --global_batch_size 16 --image_size 64 --lr 0.0001 --num_channels 192 --num_head_channels 64 --num_res_blocks 3 --resblock_updown True --schedule_sampler uniform --use_fp16 True --weight_decay 0.0 --weight_schedule uniform --data_dir /home/ILSVRC2012_img_train

My 3090 can only handle a batch size of 16. Has anyone tried training their own model from scratch?

I use the pre-trained model to generate pictures, but the results are very bad and I don't know what's going on.

The command is python image_sample.py --batch_size 8 --training_mode consistency_distillation --sampler multistep --ts 0,67,150 --steps 151 --model_path E:\Googledownload\ct_bedroom256.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --image_size 256 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --num_samples 500 --resblock_updown True --use_fp16 True --weight_schedule uniform.
The resulting samples are shown in the attached image (not reproduced here).

I don't know why this happens. Is there any good way to fix it? Thanks.

Multi-GPU error: dist.all_gather(gathered_samples, sample) # gather not supported with NCCL

mpiexec -n 8 python scripts/image_sample.py --batch_size 32 --training_mode consistency_distillation --sampler multistep --ts 0,62,150 --steps 151 --model_path ./ct_cat256.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --image_size 256 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --num_samples 500 --resblock_updown True --use_fp16 True --weight_schedule uniform

"home/anaconda3/envs/consistency/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2433, in all_gather
    work = default_pg.allgather([tensor_list], [tensor])
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.14.3
ncclInternalError: Internal check failed.
Last error:
Cuda failure 'peer access is not supported between these two devices'
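A hedged workaround, under the assumption that the failure really is missing peer-to-peer support between the GPUs on this node (and not a driver or topology misconfiguration): disable NCCL's P2P transfers before the process group is initialized, either by exporting NCCL_P2P_DISABLE=1 in the shell before mpiexec or by setting it early in Python.

import os

# Hedged sketch: must run before torch.distributed initializes the NCCL
# process group; with P2P disabled, NCCL routes traffic through host memory.
os.environ["NCCL_P2P_DISABLE"] = "1"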

Different batch size while training

Hi, I'm currently trying to train the LSUN model (256x256). However, when using an 8xA100 (80 GB) machine, I can only set a global batch size of 64 instead of the 256 mentioned in the paper with the default model architecture. I'm wondering if I'm missing something here. Thank you in advance for the help!

QKVFlashAttention unexpected parameters error, running in Google Colab

I tried to generate samples in Colab and everything works except that I had to change this line of code in /cm/unet.py, clearing out factory_kwargs.

Not sure if this is a bug or I did something wrong. This is how I ran it: https://github.com/JonathanFly/consistency_models_colab_notebook/blob/main/Consistency_Models_Make_Samples.ipynb


class QKVFlashAttention(nn.Module):
    def __init__(
        self,
        embed_dim,
        num_heads,
        batch_first=True,
        attention_dropout=0.0,
        causal=False,
        device=None,
        dtype=None,
        **kwargs,
    ) -> None:
        from einops import rearrange
        from flash_attn.flash_attention import FlashAttention

        assert batch_first
        #factory_kwargs = {"device": device, "dtype": dtype}
        factory_kwargs = {}
        super().__init__()
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.causal = causal

[inpainting] wrong shape?

Error: RuntimeError: shape '[-1, 7, 3, 256, 256]' is invalid for input of size 6291456

The error comes from

mask = th.zeros(*x.shape, device=dist_util.dev())
mask = mask.reshape(-1, 7, 3, image_size, image_size)  # <-- why reshape to 7?

x.shape is [batch_size, 3, 256, 256] in my code.

Is that a bug?
In Algorithm 4 in the paper, A is described as an invertible linear transformation that maps images to the latent space.
I cannot identify any such transformation to the latent space in the code.

RuntimeError in the sample `diffusers` code in README.md

We get "RuntimeError: Expected tensor for argument # 1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)"
when we run

import torch
from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)
# Onestep sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("cd_imagenet64_l2_onestep_sample_penguin.png")

# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original Github repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(num_inference_steps=None, timesteps=[22, 0], class_labels=145).images[0]
image.save("cd_imagenet64_l2_multistep_sample_penguin.png")

Corrected in pull request #43.
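Until that fix is picked up, a hedged workaround (assuming the pipeline accepts a prebuilt tensor for class_labels) is to pass the labels as int64 explicitly, so the class-embedding lookup receives a LongTensor rather than an IntTensor:

# Continues the snippet above; `pipe` and `device` are defined there.
class_labels = torch.tensor([145], dtype=torch.long, device=device)
image = pipe(num_inference_steps=1, class_labels=class_labels).images[0]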

Question about running scripts: timeout initializing process group

I downloaded the LSUN bedroom dataset and ran the "Consistency training on LSUN-256" command from cm/scripts/launch.sh in a terminal. No additional message is shown in the window (screenshot attached, not reproduced here).
Is this normal? Or do I need to wait for a long time until the training process finishes before any messages show up?

torch version

Hi, thanks for your interesting work!
I am unsure about the environment for running the code and would like to know the versions of Python and torch used in your project.

Thanks very much!

Issue with `use_fp16=True` Leading to Type Conversion Error in `unet.py`

When setting use_fp16=False, the code functions correctly. However, an issue arises with use_fp16=True due to an unexpected type conversion in unet.py (line 435).

The problem occurs at line 435, where the tensor a is converted from float16 to float32:

a = a.float()

Prior to this line, a is in float16, but after this line, it is converted to float32. If we remove or comment out this line, the code encounters an error. It seems that maintaining a in float16 is essential for the use_fp16=True setting to work correctly, but the current implementation inadvertently converts it to float32, leading to issues.

Additionally, I've noticed that the current code has been modified to disable flash attention. I also attempted to run the original version, but encountered similar errors.

How is Consistency for Any Point on the ODE Path Achieved in the Consistency Model?

I've been exploring the consistency model and am intrigued by its approach to maintaining consistency across the ODE path. The objective function appears to ensure that two specific points along the ODE path are consistent after being processed by f(·), and it also constrains the output at the boundary $\epsilon$ through f(·). However, I'm curious how the model ensures that the result of processing any point along the ODE path through f(·) matches the sample at $\epsilon$.

Could anyone provide insights or explain the underlying mechanism that guarantees this level of consistency across the entire ODE path? Any further clarification or pointers to additional resources would be greatly appreciated.
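For what it's worth, here is a hedged sketch of how I read the paper's argument (Theorem 1 and the surrounding discussion), not an authoritative answer: the loss only ties adjacent discretization points together, but the boundary parameterization forces $f_\theta(x, \epsilon) = x$, so chaining the pairwise constraints along a trajectory gives, by a telescoping/induction argument,

$f_\theta(x_{t_N}, t_N) \approx f_\theta(x_{t_{N-1}}, t_{N-1}) \approx \cdots \approx f_\theta(x_{t_1}, t_1) = x_{t_1}$, with $t_1 = \epsilon$,

and the paper bounds the accumulated error by the sum of the per-step discrepancies, which vanishes as the discretization is refined (assuming a Lipschitz $f_\theta$).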

Differences between code and paper

Hi,

I've emailed Yang Song about differences between paper and code, but I thought I'd raise it as an issue so others can see. I'll update this issue if I get a reply via email.

There are some differences between the paper and the code, and I was hoping to know which is the better approach.

  • [Solved?] The Karras rho scheduling is a bit different in the paper, whereas the code matches the EDM implementation. I think this is explainable since they are just the reverse of each other; this may be why $t_n$ and $t_{n+1}$ are switched in the code.
  • The input to the denoiser is scaled in the code (i.e. c_in), but not in the paper (see the sketch after this list).
  • The time conditioning is rescaled by a factor of 1000. My first thought is that the temporal embedding in the model (e.g. sinusoidal PE) may prefer larger values, but this is unconfirmed.

Any advice would be really appreciated, thank you.
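For reference, a hedged sketch of the EDM-style preconditioning the second and third bullets refer to, assuming the code follows the EDM parameterization with the repo's default sigma_data (this is a paraphrase, not the repo's exact code):

import torch as th

def edm_scalings_sketch(sigma: th.Tensor, sigma_data: float = 0.5):
    # c_skip / c_out / c_in as in the EDM parameterization; c_in is the input
    # scaling mentioned in the second bullet, which the paper's presentation omits.
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / (sigma**2 + sigma_data**2) ** 0.5
    c_in = 1 / (sigma**2 + sigma_data**2) ** 0.5
    # Time rescaling (third bullet): roughly 1000 * 0.25 * log(sigma), plausibly
    # so the sinusoidal timestep embedding sees values of its usual magnitude.
    rescaled_t = 1000 * 0.25 * th.log(sigma)
    return c_skip, c_out, c_in, rescaled_t

The denoiser output is then assembled as c_skip(σ) * x + c_out(σ) * F_θ(c_in(σ) * x, rescaled_t), which is where the c_in scaling and the 1000x time rescaling enter.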

CUDA out of memory

Hi,

Does anyone know the GPU requirements to train this network (consistency distillation)? I'm always getting CUDA out-of-memory errors:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 39.42 GiB total capacity; 34.41 GiB already allocated; 2.96 GiB free; 34.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I've tried decreasing the batch size; the "Tried to allocate 4.00 GiB" figure does decrease, but the error still occurs because the "already allocated" usage increases.
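Not a full answer, but a hedged sketch of the error message's own suggestion, in case fragmentation is the culprit: cap the caching allocator's split size via PYTORCH_CUDA_ALLOC_CONF before any CUDA allocation happens (the value below is a starting point to tune, not a recommendation from the repo).

import os

# Must be set before the first CUDA allocation (or exported in the shell
# before launching training); reduces fragmentation from large blocks.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"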

pip install -e . error

When I execute pip install -e . as specified in README.md, I encounter the error below.

Collecting flash-attn (from consistency-models==0.0.0)
  Downloading flash_attn-1.0.4.tar.gz (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 8.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "/workspace/python/consistency_models/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/workspace/python/consistency_models/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/workspace/python/consistency_models/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 487, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/tmp/pip-build-env-9dhpu9sr/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 6, in <module>
      ModuleNotFoundError: No module named 'packaging'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

I tried installing packaging manually, but that did not solve the problem.
I am using a venv Python virtual environment.
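A hedged workaround that has resolved similar flash-attn build failures (an assumption, not a confirmed fix for this repo): the ModuleNotFoundError happens inside pip's isolated build environment, which cannot see the packaging you installed into the venv. Installing the build prerequisites into the venv and then building flash-attn without build isolation, before retrying the editable install, sidesteps that:

pip install packaging wheel torch
pip install flash-attn --no-build-isolation
pip install -e .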

The parameter sigma_data for KarrasDenoiser.

Hi,
thank you for your great work on this project!
I have been comparing the noise schedule implementation between this project and EDM, and I noticed some odd differences.
In particular, I was looking at this code snippet:

diffusion = KarrasDenoiser(
    sigma_data=0.5,
    sigma_max=sigma_max,
    sigma_min=sigma_min,
    distillation=distillation,
    weight_schedule=weight_schedule,
)
return model, diffusion

The image tensors in both projects are normalized in the range [-1, 1]. Based on my understanding, the parameter "sigma_data" should be set to 1.0, similar to the EDM settings. Could you kindly clarify the reason behind the difference in the "sigma_data" value between the two projects?

The sampling command hangs

The sampling command gets stuck in the command line (screenshot attached, not reproduced here).

Running the python xxxx.py file directly works normally.

Do you know what might be causing this? Please let me know, thank you!

ModuleNotFoundError: No module named 'cm'

ModuleNotFoundError: No module named 'cm'

How do I solve this problem when importing classes from the cm package? Does it mean the project has to be installed first?

Request for an unconditional model on ImageNet 64x64

Thanks a lot for your outstanding work on consistency models!
We are attempting to use the one-step generation of consistency models for some downstream tasks. However, it seems that only a class-conditional model is provided for ImageNet $64 \times 64$.
Could you please also provide an unconditional model on ImageNet $64 \times 64$? It would be very helpful to us.

Why use a teacher model in consistency training?

As written in the launch.sh section "Consistency training on class-conditional ImageNet-64, and LSUN 256":

mpiexec -n 8 python cm_train.py --training_mode consistency_training ... --teacher_model_path /path/to/edm_bedroom256_ema.pt ...

So why use a teacher model in consistency training? In my understanding, consistency training trains the model in isolation, without a teacher.
Is anything wrong with my understanding?
Here is the Consistency Training (CT) algorithm from the paper (screenshot not reproduced here).

How to run on a single Linux server with multiple GPUs

Nice job! I wonder how I can run the code on a single Linux server with multiple GPUs. I can run the code on the server with one GPU by not using mpiexec. But what if I want to use multiple GPUs, as with nn.DataParallel?
