junyi42 / sd-dino

Official Implementation of paper "A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence"

Home Page: https://sd-complements-dino.github.io


sd-dino's Introduction

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

A Tale of Two Features explores the complementary nature of Stable Diffusion (SD) and DINOv2 features for zero-shot semantic correspondence. The results demonstrate that a simple fusion of the two features leads to state-of-the-art performance on the SPair-71k, PF-Pascal, and TSS datasets.
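As a rough sketch of the fusion idea (illustrative only; the function and tensor names below are placeholders, not the repository's exact API): each feature map is L2-normalized along the channel dimension and the two are concatenated with a weighting factor.

import torch
import torch.nn.functional as F

def fuse_features(sd_feat, dino_feat, alpha=0.5):
    # sd_feat, dino_feat: [B, C, H, W] feature maps resized to a common resolution
    sd_n = F.normalize(sd_feat, dim=1)      # unit-norm SD features per pixel
    dino_n = F.normalize(dino_feat, dim=1)  # unit-norm DINOv2 features per pixel
    # weighted channel-wise concatenation of the two normalized feature maps
    return torch.cat([alpha * sd_n, (1 - alpha) * dino_n], dim=1)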

This repository is the official implementation of the paper:

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa F. Polanía, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
NeurIPS 2023

[New!] We have released the code for Telling Left from Right, a follow-up work with improved semantic correspondence.


Visual Results

Dense Correspondence

Object Swapping

Object Swapping (with refinement process)


Environment Setup

To install the required dependencies, use the following commands:

conda create -n sd-dino python=3.9
conda activate sd-dino
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone git@github.com:Junyi42/sd-dino.git
cd sd-dino
pip install -e .
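To sanity-check that PyTorch sees your GPU before moving on, you can run:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"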

(Optional) You may also want to install xformers for a more efficient transformer implementation:

pip install xformers==0.0.16

Get Started

Prepare the data

We provide scripts to download the datasets in the data folder. To download a specific dataset, use the corresponding command:

  • SPair-71k:
bash data/prepare_spair.sh
  • PF-Pascal:
bash data/prepare_pfpascal.sh
  • TSS:
bash data/prepare_tss.sh

Evaluate the PCK Results of SPair-71k

Run the pck_spair_pascal.py file:

python pck_spair_pascal.py --SAMPLE 20

Note that SAMPLE is the number of sampled pairs per category (20 by default). Set it to 0 to use all samples (the setting used in the paper).
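For context, PCK (Percentage of Correct Keypoints) treats a predicted keypoint as correct when it lands within a threshold proportional to the object size. A minimal sketch of the metric (a hypothetical helper, not the repository's exact implementation; SPair-71k commonly uses alpha * max(bbox width, bbox height) with alpha = 0.1):

import numpy as np

def pck(pred_kps, gt_kps, bbox_size, alpha=0.1):
    # pred_kps, gt_kps: [N, 2] arrays of (x, y) keypoint locations
    # bbox_size: max(bbox_width, bbox_height) of the target object
    dists = np.linalg.norm(pred_kps - gt_kps, axis=1)
    return float((dists <= alpha * bbox_size).mean())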

Additional important parameters in pck_spair_pascal.py include:

  • --NOT_FUSE: if set to True, use only the SD feature.
  • --ONLY_DINO: if set to True, use only the DINO feature.
  • --DRAW_DENSE: if set to True, draw the dense correspondence map.
  • --DRAW_SWAP: if set to True, draw the object-swapping result.
  • --DRAW_GIF: if set to True, draw the object-swapping result as a GIF.
  • --TOTAL_SAVE_RESULT: the number of samples for which to save qualitative results; set to 0 to disable saving and accelerate evaluation.

Please refer to the pck_spair_pascal.py file for more details; a combined invocation is sketched below. You may find samples of qualitative results in the results_spair folder.
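For example, to evaluate 20 pairs per category while saving dense-correspondence and swap visualizations for a handful of samples, an invocation along these lines should work (check the argparse definitions in the script for the exact flag syntax):

python pck_spair_pascal.py --SAMPLE 20 --DRAW_DENSE --DRAW_SWAP --TOTAL_SAVE_RESULT 5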

Evaluate the PCK Results of PF-Pascal

Run the pck_spair_pascal.py file with the --PASCAL flag:

python pck_spair_pascal.py --PASCAL

You may find samples of qualitative results in the results_pascal folder.

Evaluate the PCK Results of TSS

Run the pck_tss.py file:

python pck_tss.py

You may find samples of qualitative results in the results_tss folder.

Demo

PCA / K-means Visualization of the Features

To extract the fused features of an input image pair and visualize the correspondence, please check the notebook demo_vis_features.ipynb for more details.
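For a rough idea of how the PCA visualization works (a sketch assuming a per-pixel feature map feat of shape [C, H, W]; the notebook's actual code may differ):

import numpy as np
from sklearn.decomposition import PCA

def pca_to_rgb(feat):
    # feat: [C, H, W] array of per-pixel features
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W).T                # [H*W, C] pixel-feature rows
    rgb = PCA(n_components=3).fit_transform(flat)  # project each pixel to 3 dims
    rgb = (rgb - rgb.min(0)) / (rgb.max(0) - rgb.min(0) + 1e-8)  # scale to [0, 1]
    return rgb.reshape(H, W, 3)                    # viewable as an RGB image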

Quick Try on the Object Swapping

To swap the objects in an input image pair, please check the notebook demo_swap.ipynb for more details.
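Conceptually, the swap is driven by nearest-neighbor matching in the fused feature space: each target pixel takes the appearance of its best-matching source pixel. A minimal sketch under that assumption (hypothetical shapes and names, not the notebook's exact code):

import torch

def swap_by_nn(src_feat, trg_feat, src_pixels):
    # src_feat, trg_feat: [N, C] L2-normalized per-pixel features (flattened)
    # src_pixels: [N, 3] flattened source image pixels
    sim = trg_feat @ src_feat.T   # cosine similarity between all pixel pairs
    nn_idx = sim.argmax(dim=1)    # best source match for each target pixel
    return src_pixels[nn_idx]     # swapped appearance, still flattened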

Refine the Result

TODO

Citation

If you find our work useful, please cite:

@article{zhang2023tale,
  title={{A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence}},
  author={Zhang, Junyi and Herrmann, Charles and Hur, Junhwa and Cabrera, Luisa Polania and Jampani, Varun and Sun, Deqing and Yang, Ming-Hsuan},
  journal={arXiv preprint arXiv:2305.15347},
  year={2023}
}

Acknowledgement

Our code is largely based on the following open-source projects: ODISE, dino-vit-features (official implementation), dino-vit-features (Kamal Gupta's implementation), DenseMatching, and ncnet. Our heartfelt gratitude goes to the developers of these resources!


sd-dino's Issues

Details about how to extract sd features

Hi Junyi,

I am confused about how to extract SD features. The file extractor_sd.py seems to output a feature of shape [1, 1280, 16, 16] without obvious semantic information, and it appears to use model weights from the ODISE project. Could you please provide a script to easily extract and visualize the SD features using publicly available Stable Diffusion model weights? Thanks a lot!


Model parameter mismatch

Hi, thanks for sharing the codes.

I found a problem when running the demo code. I followed the setup in the README without changing anything, but the downloaded pre-trained weights seem to mismatch the model:

[screenshot omitted]

so I got results that are very different from yours:

[screenshot omitted]

This problem also occurs when I run Geoaware-SC. Could you give me some advice on how to solve this?

License?

Hi,

Thanks for this awesome work! 🤩

The DINO and Stable Diffusion projects have MIT licenses. Is your work MIT-licensed as well?

Best,
Iago.

Establish environment

Hello, I am very interested in your work, but I encountered some difficulties when setting up the environment. I followed the steps in the README, but something goes wrong somewhere, and I don't know how to fix it:

[screenshot omitted]

Installation issues for Mask Former

Hello @Junyi42,
Thanks for your contribution. I am facing an installation issue when running the "pip install -e ." command, which gives the following error:

Emitting ninja build file /BS/keytr_neus/work/supplementary/sd-dino/third_party/Mask2Former/build/temp.linux-x86_64-cpython-39/build.ninja...

error: [Errno 2] No such file or directory: '/BS/keytr_neus/work/supplementary/sd-dino/third_party/Mask2Former/build/temp.linux-x86_64-cpython-39/build.ninja'

ERROR: Failed building wheel for mask2former

ERROR: Could not build wheels for mask2former, which is required to install pyproject.toml-based projects

Please help me with this.

get_mask cannot return valid mask

Hi!
When running the demo with:

src_img_path = "data/images/dog_00.jpg"
trg_img_path = "data/images/dog_59.jpg"
result = process_images(src_img_path, trg_img_path)

I found that the get_mask function returns an all-ones matrix instead of a valid mask. Is this a bug?

if DRAW_DENSE:
    if not Anno:
        mask1 = get_mask(model, aug, img1, category[0])
        mask2 = get_mask(model, aug, img2, category[-1])

AttributeError: module 'keras.backend' has no attribute 'is_tensor'

Hello, I'm sorry to bother you again. I've encountered a version issue: my TensorFlow and Keras versions are 2.13.1, and I'm getting this error. Could you please let me know the Keras version this code requires? I couldn't find any helpful answers online, and a global search turned up no occurrences of "is_tensor" in the code.
Thanks!

Colab Demo

Thank you for the amazing work! I am trying to visualize the feature maps for DINO and SD. Do you have a Colab notebook that I can use to run them?

cannot `get_mask` when I vary the cuda device

Hello Junyi, great job! Everything seems to work well when calling get_features in extractor_sd.py on cuda:3, but the inference process fails even after I change inference(model, aug, image, vocab, label_list) from:

demo = StableDiffusionSeg(inference_model, demo_metadata, aug)
pred = demo.predict(np.array(image))

to:

demo = StableDiffusionSeg(inference_model, demo_metadata, aug)
demo.model = demo.model.to(torch.device("cuda:3"))
pred = demo.predict(np.array(image))

I guess the main problem lies in wrongly loading the decoder part of the model, but I'm not sure how to fix it.

Result different from demo_vis_features.ipynb

Hello @Junyi42, thanks for your contribution. I ran demo_vis_features.ipynb on the dog pair given in the default image folder, and my results come out different from yours. Your masked PCA result was:

[screenshot omitted]

while I am getting:

[screenshot omitted]

My clustering result also differs:

[screenshot omitted]

I didn't change anything in the code; I only dumped everything from the ipynb into a .py file, and I am getting these outputs as PNG files in the results_vis folder.

Questions about sd features

Hello, I would like to know whether the layer-2, 5, and 8 features mentioned in the paper refer to the raw layers or to the outputs after processing with the UpSample block. I find the feature extraction in the code a bit challenging to follow. I hope to receive your reply. Thank you!
