Sketch_LVM

Official repository of CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not

CVPR 2023

Abstract

In this paper, we leverage CLIP for zero-shot sketch based image retrieval (ZS-SBIR). We are largely inspired by recent advances on foundation models and the unparalleled generalisation ability they seem to offer, but for the first time tailor it to benefit the sketch community. We put forward novel designs on how best to achieve this synergy, for both the category setting and the fine-grained setting ("all"). At the very core of our solution is a prompt learning setup. First we show just via factoring in sketch-specific prompts, we already have a category-level ZS-SBIR system that overshoots all prior arts, by a large margin (24.8%) - a great testimony on studying the CLIP and ZS-SBIR synergy. Moving onto the fine-grained setup is however trickier, and requires a deeper dive into this synergy. For that, we come up with two specific designs to tackle the fine-grained matching nature of the problem: (i) an additional regularisation loss to ensure the relative separation between sketches and photos is uniform across categories, which is not the case for the gold standard standalone triplet loss, and (ii) a clever patch shuffling technique to help establishing instance-level structural correspondences between sketch-photo pairs. With these designs, we again observe significant performance gains in the region of 26.9% over previous state-of-the-art. The take-home message, if any, is the proposed CLIP and prompt learning paradigm carries great promise in tackling other sketch-related tasks (not limited to ZS-SBIR) where data scarcity remains a great challenge.
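The patch shuffling idea in (ii) can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the grid size, and the choice to apply one permutation to both a sketch and its paired photo are assumptions made purely for illustration, and images are represented as plain nested lists to keep the example dependency-free.

```python
import random


def random_permutation(grid, seed=None):
    """Return a random ordering of the grid * grid patch indices (hypothetical helper)."""
    rng = random.Random(seed)
    perm = list(range(grid * grid))
    rng.shuffle(perm)
    return perm


def shuffle_patches(image, grid, perm):
    """Split a 2D image (list of rows) into grid x grid patches and reorder them.

    perm[dst] = src means the patch at position src moves to position dst,
    with patches numbered row by row from the top-left corner.
    """
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by the grid size")
    ph, pw = height // grid, width // grid
    out = [[None] * width for _ in range(height)]
    for dst, src in enumerate(perm):
        sr, sc = divmod(src, grid)
        dr, dc = divmod(dst, grid)
        for y in range(ph):
            for x in range(pw):
                out[dr * ph + y][dc * pw + x] = image[sr * ph + y][sc * pw + x]
    return out


# Toy 4x4 "sketch" and "photo"; both receive the same permutation (an assumption).
sketch = [[r * 4 + c for c in range(4)] for r in range(4)]
photo = [[100 + v for v in row] for row in sketch]
perm = random_permutation(2, seed=0)
shuffled_sketch = shuffle_patches(sketch, 2, perm)
shuffled_photo = shuffle_patches(photo, 2, perm)
```

Under this assumption, a shuffled sketch and its identically shuffled photo still correspond patch for patch, which is the kind of structural correspondence the abstract refers to.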

Architecture

Cross-category FG-ZS-SBIR. A common (photo-sketch) learnable visual prompt shared across categories is trained using CLIP’s image encoder over three losses as shown. CLIP’s text-encoder based classification loss is used during training.
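The abstract names the triplet loss as the standard objective that the additional regularisation builds on. For readers unfamiliar with it, the following is a minimal sketch of the textbook triplet margin loss on plain Python lists. The function names and the margin value of 0.2 are assumptions for illustration; the repository's actual losses operate on CLIP embeddings and may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    anchor   : sketch embedding
    positive : embedding of the matching photo
    negative : embedding of a non-matching photo
    margin   : assumed value, not taken from the repository
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The loss is zero once the matching photo is closer to the sketch than the non-matching photo by at least the margin, and grows linearly otherwise.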

Datasets

For ZS-SBIR:
- Sketchy (extended).
- TUBerlin.
- QuickDraw (a smaller version).
For Fine-grained ZS-SBIR:
- Sketchy (basic) dataset having fine-grained sketch-photo associations.

Code

A workable basic version of the code for CLIP adapted for ZS-SBIR has been uploaded.

- The src folder holds the source files.
- The experiments folder holds the executable wrapper for the model with particular specifications.

An example command to run the code is given below:

$ cd Sketch_LVM
$ python -m experiments.LN_prompt --exp_name=LN_prompt --n_prompts=3 --clip_LN_lr=1e-6 --prompt_lr=1e-4 --batch_size=192 --workers=128 --model_type=two_encoder

Qualitative Results

Qualitative results of ZS-SBIR on Sketchy by a baseline (blue) method vs Ours (green).

Qualitative results of FG-ZS-SBIR on Sketchy by a baseline (blue) method vs Ours (green). The images are arranged in increasing order of rank beside their corresponding sketch query, i.e., the left-most image was retrieved at rank 1 for every category. The true match for a query is marked with a green frame if it appears in the top 5. Numbers denote the rank at which the true match is retrieved for the corresponding sketch query.

Quantitative Results

Quantitative results of our method against several state-of-the-art methods.

The code for cross-category Fine-Grained ZS-SBIR will be uploaded at a later date.

Code for SBIR, mAP and for image generation

To get the retrieval results, run the following command:

python -m SBIR --model_type=two_encoder --model=last.ckpt --output_file=name_of_output_file

This command generates two files in the results folder:

- name_of_output_file.txt with the retrieval results.
- name_of_output_file_images.txt with the images of the retrieval results.

To compute the mAP, run the following command:

python mAP.py

This command reports the mAP for the retrieval results in the results folder.
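For reference, mean average precision can be computed from ranked relevance lists as in the sketch below. This is the standard textbook definition, not a copy of mAP.py, whose exact input format and normalisation are not documented here. The function names are hypothetical, and average precision is normalised by the number of relevant items found in the ranked list, which is an assumption.

```python
def average_precision(relevance):
    """Average precision for one query.

    relevance: list of 0/1 flags for the ranked gallery, best match first,
    where 1 means the retrieved image shares the query's category.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_relevance):
    """Mean of the per-query average precision over all queries."""
    if not ranked_relevance:
        return 0.0
    return sum(average_precision(r) for r in ranked_relevance) / len(ranked_relevance)
```

For example, a query whose ranked list has relevant images at ranks 1 and 3 gets an average precision of (1/1 + 2/3) / 2.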

Finally, to generate the images of the retrieval results, run the following command:

python -m generate_images --output_file=name_of_output_file

This command saves the images of the retrieval results in the save_images folder.

When the dataset follows the Sketchy layout (separate photo and sketch folders), leave --x_train and --x_test empty.
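A quick way to check whether a dataset root follows that layout is sketched below. The helper name is hypothetical and not part of the repository; it only tests for the two top-level folders mentioned above and makes no assumption about what they contain.

```python
from pathlib import Path


def has_sketchy_layout(root):
    """Return True if root contains both a 'photo' and a 'sketch' folder."""
    root = Path(root)
    return (root / "photo").is_dir() and (root / "sketch").is_dir()
```

If the check returns True for your dataset root, the --x_train and --x_test flags can be left empty as described above.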

Bibtex

Please cite our work if you found it useful. Thanks.

@Inproceedings{sain2023clip,
  title={{CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not}},
  author={Aneeshan Sain and Ayan Kumar Bhunia and Pinaki Nath Chowdhury and Subhadeep Koley and Tao Xiang and Yi-Zhe Song},
  booktitle={CVPR},
  year={2023}
}
