omriav / blended-diffusion

Official implementation for "Blended Diffusion for Text-driven Editing of Natural Images" [CVPR 2022]

Home Page: https://omriavrahami.com/blended-diffusion-page/

License: MIT License

Languages: Python 5.75%, Jupyter Notebook 94.25%
Topics: text-guided-manipulation, multimodal, diffusion, openai-clip, openai, text-to-image, deep-learning, blended-diffusion

blended-diffusion's Introduction

Blended Diffusion for Text-driven Editing of Natural Images [CVPR 2022]


Omri Avrahami, Dani Lischinski, Ohad Fried

Abstract: Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show that adding augmentations to the diffusion process mitigates adversarial results. We compare against several baselines and related methods, both qualitatively and quantitatively, and show that our method outperforms these solutions in terms of overall realism, ability to preserve the background and matching the text. Finally, we show several text-driven editing applications, including adding a new object to an image, removing/replacing/altering existing objects, background replacement, and image extrapolation.
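
For intuition, the blending step described above can be sketched as follows. This is a minimal illustration of the idea, not the repository's actual implementation; the function names and the use of the standard DDPM forward-noising formula are my own assumptions.

import torch

def noise_to_level(x0, alpha_bar_t):
    # Standard DDPM forward process: noise the clean image x0 to level t.
    a = torch.as_tensor(alpha_bar_t, dtype=x0.dtype, device=x0.device)
    eps = torch.randn_like(x0)
    return a.sqrt() * x0 + (1 - a).sqrt() * eps

def blended_step(x_t_edited, source_image, mask, alpha_bar_t):
    # x_t_edited:   the CLIP-guided diffusion latent at noise level t
    # source_image: the original input image (background to preserve)
    # mask:         1 inside the edited region, 0 in the background
    # alpha_bar_t:  cumulative noise-schedule coefficient for step t
    x_t_background = noise_to_level(source_image, alpha_bar_t)
    # Keep the guided content inside the mask and the noised original outside it.
    return mask * x_t_edited + (1 - mask) * x_t_background

Repeating this blend at a progression of noise levels is what fuses the edited region seamlessly with the unchanged parts of the image.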

News

You may be interested in the follow-up project Blended Latent Diffusion, which produces better results with a significant speed-up. The code is available here.

Getting Started

Installation

  1. Create and activate the virtual environment, then install the dependencies:
$ conda create --name blended-diffusion python=3.9
$ conda activate blended-diffusion
$ pip3 install ftfy regex matplotlib lpips kornia opencv-python torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
  2. Create a checkpoints directory and download the pretrained diffusion model from here into that folder. (An optional environment check is sketched below.)
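
Optionally, before running the generation script, you can verify that the pinned PyTorch build sees the GPU. This is a quick sanity check of my own, not part of the repository:

import torch

print("torch:", torch.__version__)               # expected: 1.9.0+cu111
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    print("CUDA runtime:", torch.version.cuda)   # expected: 11.1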

Image generation

An example of text-driven multiple synthesis results:

$ python main.py -p "rock" -i "input_example/img.png" --mask "input_example/mask.png" --output_path "output"

The generated results will be saved in the output/ranked folder, ordered by CLIP similarity rank. To get the best results, generate a large number of samples (at least 64) and keep the best ones.

To generate multiple results in a single diffusion process, we use batch processing. If you get a CUDA out-of-memory error, first try lowering the batch size by setting --batch_size 1.
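
For reference, a ranking similar to the CLIP similarity rank used for ordering the outputs can be reproduced with the openai-clip package. The snippet below is a standalone sketch under my own assumptions; the CLIP variant, file paths, and prompt are placeholders, not necessarily what the repository uses.

import glob

import clip  # pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)  # assumed CLIP variant

prompt = "rock"                                   # placeholder prompt
paths = sorted(glob.glob("output/ranked/*.png"))  # placeholder output folder

with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize([prompt]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    scores = []
    for p in paths:
        image = preprocess(Image.open(p)).unsqueeze(0).to(device)
        image_feat = model.encode_image(image)
        image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
        # Cosine similarity between the image and the text prompt.
        scores.append((p, (image_feat @ text_feat.T).item()))

# Higher similarity = better match to the prompt.
for path, score in sorted(scores, key=lambda s: s[1], reverse=True):
    print(f"{score:.4f}  {path}")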

Applications

Multiple synthesis results for the same prompt

Synthesis results for different prompts

Altering part of an existing object

Background replacement

Scribble-guided editing

Text-guided extrapolation

Composing several applications

Acknowledgments

This code borrows from CLIP, Guided-diffusion and CLIP-Guided Diffusion.

Citation

If you use this code for your research, please cite the following:

@InProceedings{Avrahami_2022_CVPR,
    author    = {Avrahami, Omri and Lischinski, Dani and Fried, Ohad},
    title     = {Blended Diffusion for Text-Driven Editing of Natural Images},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {18208-18218}
}

blended-diffusion's People

Contributors

omriav

blended-diffusion's Issues

RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling `cusolverDnCreate(handle)`

Thanks for your interesting work.
I am working on the environment setup. I installed the packages following your instructions
and ran the command: python main.py -p "rock" -i "input_example/img.png" --mask "input_example/mask.png" --output_path "output"

The run is always interrupted by this error:
RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling cusolverDnCreate(handle)

raised at image_editor.py line 113:
augmented_input = self.image_augmentations(masked_input).add(1).div(2)

This is the conda list info:

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
_openmp_mutex 4.5 2_gnu https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
bzip2 1.0.8 h7f98852_4 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
ca-certificates 2023.7.22 hbcca054_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
contourpy 1.1.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
fonttools 4.43.1 pypi_0 pypi
ftfy 6.1.1 pypi_0 pypi
importlib-resources 6.1.0 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
kornia 0.6.8 pypi_0 pypi
ld_impl_linux-64 2.40 h41732ed_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
libffi 3.4.2 h7f98852_5 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
libgcc-ng 13.2.0 h807b86a_2 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
libgomp 13.2.0 h807b86a_2 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
libnsl 2.0.1 hd590300_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
libsqlite 3.43.2 h2797004_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
libuuid 2.38.1 h0b41bf4_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
libzlib 1.2.13 hd590300_5 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
lpips 0.1.4 pypi_0 pypi
matplotlib 3.8.0 pypi_0 pypi
ncurses 6.4 hcb278e6_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
numpy 1.26.1 pypi_0 pypi
opencv-python 4.8.1.78 pypi_0 pypi
openssl 3.1.3 hd590300_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
packaging 23.2 pypi_0 pypi
pillow 10.1.0 pypi_0 pypi
pip 23.3 pyhd8ed1ab_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
pyparsing 3.1.1 pypi_0 pypi
python 3.9.18 h0755675_0_cpython https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
python-dateutil 2.8.2 pypi_0 pypi
readline 8.2 h8228510_1 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
regex 2023.10.3 pypi_0 pypi
scipy 1.11.3 pypi_0 pypi
setuptools 68.2.2 pyhd8ed1ab_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
six 1.16.0 pypi_0 pypi
tk 8.6.13 h2797004_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
torch 1.9.0+cu111 pypi_0 pypi
torchvision 0.10.0+cu111 pypi_0 pypi
tqdm 4.66.1 pypi_0 pypi
typing-extensions 4.8.0 pypi_0 pypi
tzdata 2023c h71feb2d_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
wcwidth 0.2.8 pypi_0 pypi
wheel 0.41.2 pyhd8ed1ab_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
xz 5.2.6 h166bdaf_0 https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge
zipp 3.17.0 pypi_0 pypi

and the cuda version info:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0

Can you help me figure out what the problem is? Thanks a lot.

a bug

Thank you for your interesting work, but I ran into a problem when running
python main.py -p "rock" -i "input_example/img.png" --mask "input_example/mask.png" --output_path "output"
The error is shown in the screenshot below:
[screenshot]
but my environment is exactly as you describe:
[screenshot]

Failure to generate correctly for very small masks

Hi, thank you for your interesting work!

I wanted to generate a very small object in the image, so I used a very small mask (4-5 pixels wide), but it didn't generate an object matching the prompt. So I have two questions.

  1. Is there something I need to modify (e.g., parameters, pre-processing, etc.) to generate properly even with a very small mask?
  2. I would be interested in your thoughts on why it fails to generate from a very small mask.

Thanks!

Colab

Hello! So hyped for this model. When can a Colab version be expected? Much love.

Purpose of skip_timesteps

Hi, I would like to ask: what is the purpose of skip_timesteps in the code? I can't seem to find any related information about it in the paper.

Questions about generating multiple images at once

Hi, thank you for your interesting work!

In a classification or detection task, the batch size can be used to run inference on multiple images at once. However, in blended diffusion, the batch size controls how many results are generated for a single input image. Is there an argument for processing multiple input images at once, and if not, how should the code be changed?

Thanks!

128*128

Hi, can I use this model on 128×128 images?

How to train the Unet

What an amazing job! This work can help me a lot in my study.

I did not find the training code for the UNet. If I want to train the UNet on my own data, how should I do it? You mention that this work is based on "guided-diffusion". Is the UNet trained with the guided-diffusion code?

weights optimization

Hi, when optimizing the UNet, why are "qkv", "norm" and "proj" chosen to be optimized? Is there any reason behind this? Why not the whole model or other parts of the model? Thanks.

AttributeError: 'PosixPath' object has no attribute 'with_stem'

Thanks for open-sourcing your code. I really appreciate it.

I tried to run your code in Google Colab with a GPU runtime.

I got an error, and I couldn't find any solution despite googling...

The error message is as follows:
AttributeError: 'PosixPath' object has no attribute 'with_stem'

I think it's somehow related to the pathlib module; maybe it's due to the Python 3.x version I'm using.

I hope this error can be solved soon.

----- detailed description -----

I ran the command as follows:
python main.py -p "rock" -i "input_example/img.png" --mask "input_example/mask.png" --output_path "output" --batch_size 1

And I got the following error output:
Using device: cuda:0
tcmalloc: large alloc 2209964032 bytes == 0x89ebe000 @ 0x7fbdefa2cb6b 0x7fbdefa4c379 0x7fbd30b1026e 0x7fbd30b119e2 0x7fbd334aeee1 0x7fbdd598e236 0x7fbdd541ef98 0x593784 0x594731 0x548cc1 0x51566f 0x549e0e 0x593fce 0x5118f8 0x549e0e 0x4bcb19 0x59582d 0x595b69 0x62026d 0x55de15 0x59af67 0x515655 0x549e0e 0x4bca8a 0x5134a6 0x549576 0x593fce 0x548ae9 0x5127f1 0x4bc98a 0x532b86
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /usr/local/lib/python3.7/dist-packages/lpips/weights/v0.1/vgg.pth
Start iterations 0
0% 0/75 [00:00<?, ?it/s]clip_loss - 867.99
range_loss - 0.00

Traceback (most recent call last):
File "main.py", line 8, in <module>
image_editor.edit_image_by_prompt()

File "/content/drive/MyDrive/ws/blended-diffusion/optimization/image_editor.py", line 266, in edit_image_by_prompt
visualization_path = visualization_path.with_stem(

AttributeError: 'PosixPath' object has no attribute 'with_stem'
0% 0/75 [00:01<?, ?it/s]

The code below is where the error occurs. It is from the blended-diffusion/optimization/image_editor.py file.


line 1 - from pathlib import Path
...
line 261 - for b in range(self.args.batch_size):
line 262 - pred_image = sample["pred_xstart"][b]
line 263 - visualization_path = Path(
line 264 - os.path.join(self.args.output_path, self.args.output_file)
line 265 - )
line 266 - visualization_path = visualization_path.with_stem(
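
Note: Path.with_stem() was added in Python 3.9, while the traceback above shows a Python 3.7 runtime (/usr/local/lib/python3.7/...), which would explain the AttributeError. As a possible workaround (my own suggestion, not an official fix from the repository), with_stem can be emulated via with_name on older interpreters:

from pathlib import Path

def with_stem_compat(path: Path, new_stem: str) -> Path:
    # Emulate Path.with_stem() (Python 3.9+): swap the stem, keep the suffix.
    return path.with_name(new_stem + path.suffix)

# Hypothetical usage mirroring the failing line in image_editor.py:
visualization_path = Path("output/output.png")
print(with_stem_compat(visualization_path, "output_rank_1"))  # output/output_rank_1.png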

Question about training

Hi, this is really an impressive work! Two question here.

  1. I would like to ask: does the overall text-guided image editing process use only pre-trained models, without any extra training or fine-tuning?
  2. If it does not require any further fine-tuning or training, what is the purpose of the diffusion-guided loss (which combines the CLIP loss and the background preservation loss)?

Thanks in advance for your clarification!

model_output_size

in image_editor.py:

self.model.load_state_dict(
    torch.load(
        "checkpoints/256x256_diffusion_uncond.pt"
        if self.args.model_output_size == 256
        else "checkpoints/512x512_diffusion.pt",
        map_location="cpu",
    )
)

You mention '512x512_diffusion': is it a conditional or unconditional model? (Could you share its download link?)
It is natural that your method is built on a pretrained unconditional model. If '512x512_diffusion' is a conditional model, what is the condition, for face data for example? Can you help me figure this out?

how clip model works

Hi!

Thanks for your great work first.

I'm wondering how the CLIP model works in your code. As far as I understand from the code, especially image_editor.py, CLIP takes the prompt as input and is only used to compute a distance (loss) against the image. However, I expected the CLIP model to take the prompt as input and generate the corresponding image to be placed in the white part of the mask for the final image generation. I didn't see anything about that; maybe I missed it. Could you please explain?

Thanks!

Scribble-guided editing

Hi! I wonder whether a loss such as MSE or LPIPS is used between the user-provided scribbles and the scribbled regions of $\widehat{x}_0$, in addition to the CLIP loss. I am curious how the shapes and colors stay consistent when only a text prompt with no specific description, e.g., "blanket" in Fig. 9, is given.
