junyi42 / sd-dino

Official Implementation of paper "A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence"

Home Page: https://sd-complements-dino.github.io


sd-dino's Introduction

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

A Tale of Two Features explores the complementary nature of Stable Diffusion (SD) and DINOv2 features for zero-shot semantic correspondence. The results demonstrate that a simple fusion of the two features leads to state-of-the-art performance on the SPair-71k, PF-Pascal, and TSS datasets.
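As a rough sketch of the fusion idea (illustrative only; the function and tensor names below are placeholders, not the repository's exact API): each feature map is L2-normalized along the channel dimension and the two are concatenated with a weighting factor.

import torch
import torch.nn.functional as F

def fuse_features(sd_feat, dino_feat, alpha=0.5):
    # sd_feat, dino_feat: [B, C, H, W] feature maps resized to a common resolution
    sd_n = F.normalize(sd_feat, dim=1)      # unit-norm SD features per pixel
    dino_n = F.normalize(dino_feat, dim=1)  # unit-norm DINOv2 features per pixel
    # weighted channel-wise concatenation of the two normalized feature maps
    return torch.cat([alpha * sd_n, (1 - alpha) * dino_n], dim=1)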

This repository is the official implementation of the paper:

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa F. Polanía, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
NeurIPS 2023

[New!] We have released the code for Telling Left from Right, a follow-up work with improved semantic correspondence.


Visual Results

Dense Correspondence

Object Swapping

Object Swapping (with refinement process)


Environment Setup

To install the required dependencies, use the following commands:

conda create -n sd-dino python=3.9
conda activate sd-dino
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone git@github.com:Junyi42/sd-dino.git
cd sd-dino
pip install -e .
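To sanity-check that PyTorch sees your GPU before moving on, you can run:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"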

(Optional) You may also want to install xformers for a more efficient transformer implementation:

pip install xformers==0.0.16

Get Started

Prepare the data

We provide scripts to download the datasets in the data folder. To download a specific dataset, use the corresponding command:

  • SPair-71k:
bash data/prepare_spair.sh
  • PF-Pascal:
bash data/prepare_pfpascal.sh
  • TSS:
bash data/prepare_tss.sh

Evaluate the PCK Results of SPair-71k

Run the pck_spair_pascal.py file:

python pck_spair_pascal.py --SAMPLE 20

Note that SAMPLE is the number of sampled pairs per category (20 by default). Set it to 0 to use all samples (the setting used in the paper).
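For context, PCK (Percentage of Correct Keypoints) treats a predicted keypoint as correct when it lands within a threshold proportional to the object size. A minimal sketch of the metric (a hypothetical helper, not the repository's exact implementation; SPair-71k commonly uses alpha * max(bbox width, bbox height) with alpha = 0.1):

import numpy as np

def pck(pred_kps, gt_kps, bbox_size, alpha=0.1):
    # pred_kps, gt_kps: [N, 2] arrays of (x, y) keypoint locations
    # bbox_size: max(bbox_width, bbox_height) of the target object
    dists = np.linalg.norm(pred_kps - gt_kps, axis=1)
    return float((dists <= alpha * bbox_size).mean())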

Additional important parameters in pck_spair_pascal.py include:

  • --NOT_FUSE: if set to True, use only the SD feature.
  • --ONLY_DINO: if set to True, use only the DINO feature.
  • --DRAW_DENSE: if set to True, draw the dense correspondence map.
  • --DRAW_SWAP: if set to True, draw the object-swapping result.
  • --DRAW_GIF: if set to True, draw the object-swapping result as a GIF.
  • --TOTAL_SAVE_RESULT: the number of samples for which to save qualitative results; set to 0 to disable saving and accelerate evaluation.

Please refer to the pck_spair_pascal.py file for more details; a combined invocation is sketched below. You may find samples of qualitative results in the results_spair folder.
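For example, to evaluate 20 pairs per category while saving dense-correspondence and swap visualizations for a handful of samples, an invocation along these lines should work (check the argparse definitions in the script for the exact flag syntax):

python pck_spair_pascal.py --SAMPLE 20 --DRAW_DENSE --DRAW_SWAP --TOTAL_SAVE_RESULT 5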

Evaluate the PCK Results of PF-Pascal

Run the pck_spair_pascal.py file with the --PASCAL flag:

python pck_spair_pascal.py --PASCAL

You may find samples of qualitative results in the results_pascal folder.

Evaluate the PCK Results of TSS

Run the pck_tss.py file:

python pck_tss.py

You may find samples of qualitative results in the results_tss folder.

Demo

PCA / K-means Visualization of the Features

To extract the fused features of an input image pair and visualize the correspondence, please check the notebook demo_vis_features.ipynb for more details.
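For a rough idea of how the PCA visualization works (a sketch assuming a per-pixel feature map feat of shape [C, H, W]; the notebook's actual code may differ):

import numpy as np
from sklearn.decomposition import PCA

def pca_to_rgb(feat):
    # feat: [C, H, W] array of per-pixel features
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W).T                # [H*W, C] pixel-feature rows
    rgb = PCA(n_components=3).fit_transform(flat)  # project each pixel to 3 dims
    rgb = (rgb - rgb.min(0)) / (rgb.max(0) - rgb.min(0) + 1e-8)  # scale to [0, 1]
    return rgb.reshape(H, W, 3)                    # viewable as an RGB image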

Quick Try on the Object Swapping

To swap the objects in an input image pair, please check the notebook demo_swap.ipynb for more details.
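Conceptually, the swap is driven by nearest-neighbor matching in the fused feature space: each target pixel takes the appearance of its best-matching source pixel. A minimal sketch under that assumption (hypothetical shapes and names, not the notebook's exact code):

import torch

def swap_by_nn(src_feat, trg_feat, src_pixels):
    # src_feat, trg_feat: [N, C] L2-normalized per-pixel features (flattened)
    # src_pixels: [N, 3] flattened source image pixels
    sim = trg_feat @ src_feat.T   # cosine similarity between all pixel pairs
    nn_idx = sim.argmax(dim=1)    # best source match for each target pixel
    return src_pixels[nn_idx]     # swapped appearance, still flattened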

Refine the Result

TODO

Citation

If you find our work useful, please cite:

@article{zhang2023tale,
  title={{A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence}},
  author={Zhang, Junyi and Herrmann, Charles and Hur, Junhwa and Cabrera, Luisa Polania and Jampani, Varun and Sun, Deqing and Yang, Ming-Hsuan},
  journal={arXiv preprint arXiv:2305.15347},
  year={2023}
}

Acknowledgement

Our code is largely based on the following open-source projects: ODISE, dino-vit-features (official implementation), dino-vit-features (Kamal Gupta's implementation), DenseMatching, and ncnet. Our heartfelt gratitude goes to the developers of these resources!


sd-dino's Issues

Details about how to extract sd features

Hi Junyi,

I am confused about how to extract SD features. The file extractor_sd.py seems to output a feature of shape [1, 1280, 16, 16] without obvious semantic information, and it appears to use model weights from the ODISE project. Could you please provide a script to easily extract and visualize the SD features using publicly available Stable Diffusion model weights? Thanks a lot!


Model parameter mismatch

Hi, thanks for sharing the codes.

I found a problem when running the demo code. I followed the setup in the README without changing anything, but the downloaded pre-trained weights seem to mismatch the model:

[screenshot omitted]

so I got results that are very different from yours:

[screenshot omitted]

This problem also occurs when I run Geoaware-SC. Could you give me some advice on how to solve this?

License?

Hi,

Thanks for this awesome work! 🤩

The DINO and Stable Diffusion projects have MIT licenses. Is your work MIT-licensed as well?

Best,
Iago.

Establish environment

Hello, I am very interested in your work, but I encountered some difficulties when setting up the environment. I followed the steps in the README, but something goes wrong somewhere, and I don't know how to fix it:

[screenshot omitted]

Installation issues for Mask Former

Hello @Junyi42,
Thanks for your contribution. I am facing an installation issue when running the "pip install -e ." command, which gives the following error:

Emitting ninja build file /BS/keytr_neus/work/supplementary/sd-dino/third_party/Mask2Former/build/temp.linux-x86_64-cpython-39/build.ninja...

error: [Errno 2] No such file or directory: '/BS/keytr_neus/work/supplementary/sd-dino/third_party/Mask2Former/build/temp.linux-x86_64-cpython-39/build.ninja'

ERROR: Failed building wheel for mask2former

ERROR: Could not build wheels for mask2former, which is required to install pyproject.toml-based projects

Please help me with this.

get_mask cannot return valid mask

Hi!
When running the demo with:

src_img_path = "data/images/dog_00.jpg"
trg_img_path = "data/images/dog_59.jpg"
result = process_images(src_img_path, trg_img_path)

I found that the get_mask function returns an all-ones matrix instead of a valid mask. Is this a bug?

if DRAW_DENSE:
    if not Anno:
        mask1 = get_mask(model, aug, img1, category[0])
        mask2 = get_mask(model, aug, img2, category[-1])

AttributeError: module 'keras.backend' has no attribute 'is_tensor'

Hello, I'm sorry to bother you again. I've encountered a version issue: my TensorFlow and Keras versions are 2.13.1, and I'm getting this error. Could you please let me know the Keras version this code requires? I couldn't find any helpful answers online, and a global search turned up no occurrences of "is_tensor" in the code.
Thanks!

Colab Demo

Thank you for the amazing work! I am trying to visualize the feature maps for DINO and SD. Do you have a Colab notebook that I can use to run them?

cannot `get_mask` when I vary the cuda device

Hello Junyi, great job! Everything seems to work well when calling get_features in extractor_sd.py on cuda:3, but the inference process fails even after I change inference(model, aug, image, vocab, label_list) from:

demo = StableDiffusionSeg(inference_model, demo_metadata, aug)
pred = demo.predict(np.array(image))

to:

demo = StableDiffusionSeg(inference_model, demo_metadata, aug)
demo.model = demo.model.to(torch.device("cuda:3"))
pred = demo.predict(np.array(image))

I guess the main problem lies in wrongly loading the decoder part of the model, but I'm not sure how to fix it.

Result different from demo_vis_features.ipynb

Hello @Junyi42, thanks for your contribution. I ran demo_vis_features.ipynb on the dog pair given in the default image folder, and my results come out different from yours. Your masked PCA result was:

[screenshot omitted]

while I am getting:

[screenshot omitted]

My clustering result also differs:

[screenshot omitted]

I didn't change anything in the code; I only dumped everything from the ipynb into a .py file, and I am getting these outputs as PNG files in the results_vis folder.

Questions about sd features

Hello, I would like to know whether the layer-2, 5, and 8 features mentioned in the paper refer to the raw layers or to the outputs after processing with the UpSample block. I find the feature extraction in the code a bit challenging to follow. I hope to receive your reply. Thank you!
