chstr / sketch_lvm

This project forked from aneeshan95/sketch_lvm

Project page for the paper 'CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not'

License: MIT License

JavaScript 54.06% Python 31.71% CSS 3.70% HTML 10.53%

Sketch_LVM

Official repository of CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not

CVPR 2023

Abstract

In this paper, we leverage CLIP for zero-shot sketch-based image retrieval (ZS-SBIR). We are largely inspired by recent advances in foundation models and the unparalleled generalisation ability they seem to offer, and for the first time tailor them to benefit the sketch community. We put forward novel designs on how best to achieve this synergy, for both the category setting and the fine-grained setting ("all"). At the very core of our solution is a prompt learning setup. First, we show that simply by factoring in sketch-specific prompts, we already have a category-level ZS-SBIR system that outperforms all prior art by a large margin (24.8%), a strong testimony to the value of studying the CLIP and ZS-SBIR synergy. Moving on to the fine-grained setup is, however, trickier and requires a deeper dive into this synergy. For that, we come up with two specific designs to tackle the fine-grained matching nature of the problem: (i) an additional regularisation loss to ensure the relative separation between sketches and photos is uniform across categories, which is not the case for the gold-standard standalone triplet loss, and (ii) a clever patch shuffling technique to help establish instance-level structural correspondences between sketch-photo pairs. With these designs, we again observe significant performance gains in the region of 26.9% over the previous state of the art. The take-home message, if any, is that the proposed CLIP and prompt learning paradigm carries great promise in tackling other sketch-related tasks (not limited to ZS-SBIR) where data scarcity remains a great challenge.
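
The abstract names patch shuffling but does not spell out its mechanics. The sketch below is one plausible reading, not the authors' implementation: an image is cut into an n x n grid of patches, and the same permutation is applied to a sketch and its paired photo so that their structural correspondence is preserved. The function names, the grid size, and the use of nested lists in place of image tensors are all assumptions made for illustration.

```python
import random


def split_patches(img, n):
    """Split an H x W grid (list of rows) into n*n patches in row-major order.

    Assumes H and W are divisible by n.
    """
    ph, pw = len(img) // n, len(img[0]) // n
    return [
        [row[c * pw:(c + 1) * pw] for row in img[r * ph:(r + 1) * ph]]
        for r in range(n)
        for c in range(n)
    ]


def merge_patches(patches, n):
    """Reassemble n*n row-major patches into a single grid."""
    rows = []
    for r in range(n):
        band = patches[r * n:(r + 1) * n]  # the n patches forming one horizontal band
        for i in range(len(band[0])):
            rows.append([px for patch in band for px in patch[i]])
    return rows


def shuffle_patches(img, n, perm):
    """Place patch perm[k] of the input at grid position k of the output."""
    patches = split_patches(img, n)
    return merge_patches([patches[i] for i in perm], n)


def random_permutation(n, seed=None):
    """Draw a random ordering of the n*n patch indices."""
    perm = list(range(n * n))
    random.Random(seed).shuffle(perm)
    return perm


# Hypothetical 4x4 "sketch" and "photo"; both receive the same permutation.
sketch = [[r * 4 + c for c in range(4)] for r in range(4)]
photo = [[100 + r * 4 + c for c in range(4)] for r in range(4)]
perm = random_permutation(2, seed=0)
shuffled_sketch = shuffle_patches(sketch, 2, perm)
shuffled_photo = shuffle_patches(photo, 2, perm)
```

In a training loop, a sketch and photo shuffled with the same permutation would presumably be treated as a matching pair, while a photo shuffled with a different permutation could serve as a negative. How these pairs enter the loss is not specified here.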

Architecture

Cross-category FG-ZS-SBIR. A common (photo-sketch) learnable visual prompt shared across categories is trained using CLIP’s image encoder over three losses as shown. CLIP’s text-encoder based classification loss is used during training.
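
To make the text-encoder-based classification loss concrete, here is a minimal sketch of the usual CLIP-style formulation: cosine similarities between an image feature and one text feature per class are scaled by a temperature and passed through a softmax cross-entropy. Plain Python lists stand in for CLIP embeddings, and the function names and the temperature of 0.07 are assumptions rather than values taken from this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classification_loss(image_feat, text_feats, label, temperature=0.07):
    """Cross-entropy over temperature-scaled image-to-text similarities.

    image_feat: feature of one sketch or photo (stand-in for the image encoder output).
    text_feats: one feature per class name (stand-in for the text encoder output).
    label:      index of the ground-truth class in text_feats.
    """
    logits = [cosine(image_feat, t) / temperature for t in text_feats]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]


# Hypothetical 2-D features: the image aligns with class 0.
loss_correct = classification_loss([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], label=0)
loss_wrong = classification_loss([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], label=1)
```

The loss is near zero when the image feature aligns with the text feature of its own class and grows as it drifts toward another class.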

Datasets

  • For ZS-SBIR:
  • For Fine-grained ZS-SBIR:
    • Sketchy (basic) dataset having fine-grained sketch-photo associations.

Code

A workable basic version of the code for CLIP adapted for ZS-SBIR has been uploaded.

  • src folder holds the source files.
  • experiments folder holds the executable wrapper for the model with particular specifications.

An example command to run the code is given below:

$ cd Sketch_LVM
$ python -m experiments.LN_prompt --exp_name=LN_prompt --n_prompts=3 --clip_LN_lr=1e-6 --prompt_lr=1e-4 --batch_size=192 --workers=128 --model_type=two_encoder
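
For readers unfamiliar with the flags, the following sketch mirrors the options from the example command in a stand-alone argparse parser. It is a hypothetical illustration, not the repository's actual option parser: the flag names come from the command above, while the types and the choice to use the command's values as defaults are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the example command."""
    parser = argparse.ArgumentParser(description="Illustrative flag mirror")
    parser.add_argument("--exp_name", type=str, default="LN_prompt")
    parser.add_argument("--n_prompts", type=int, default=3)
    parser.add_argument("--clip_LN_lr", type=float, default=1e-6)
    parser.add_argument("--prompt_lr", type=float, default=1e-4)
    parser.add_argument("--batch_size", type=int, default=192)
    parser.add_argument("--workers", type=int, default=128)
    parser.add_argument("--model_type", type=str, default="two_encoder")
    return parser


# Override only the resource-heavy settings; everything else keeps its default.
args = build_parser().parse_args(["--batch_size=64", "--workers=8"])
```

The values 192 for --batch_size and 128 for --workers in the example command suggest a large machine; on smaller hardware they would presumably need to be lowered.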

Qualitative Results

Qualitative results of ZS-SBIR on Sketchy for a baseline method (blue) vs. ours (green).

Qualitative results of FG-ZS-SBIR on Sketchy for a baseline method (blue) vs. ours (green). Images are arranged beside their corresponding sketch query in increasing order of rank, i.e., the leftmost image was retrieved at rank 1 for every category. The true match for each query, if it appears in the top 5, is marked with a green frame. Numbers denote the rank at which the true match was retrieved for the corresponding sketch query.

Quantitative Results

Quantitative results of our method against a few state-of-the-art methods.

The code for cross-category Fine-Grained ZS-SBIR will be uploaded at a later date.

Code for SBIR, mAP, and image generation

To get the retrieval results, run the following command:

python -m SBIR --model_type=two_encoder --model=last.ckpt --output_file=name_of_output_file

This command generates two files in the results folder:

  • name_of_output_file.txt with the retrieval results.
  • name_of_output_file_images.txt with the images of the retrieval results.
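
The retrieval step itself can be pictured as ranking gallery photos by their similarity to a query sketch. The sketch below shows only that ranking step under stated assumptions: features are plain lists, similarity is cosine, and the names rank_gallery and gallery are hypothetical. The distance measure and file format actually used by the SBIR module are not described here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(query_feat, gallery):
    """Return gallery names sorted from most to least similar to the query.

    gallery: dict mapping a photo name to its feature vector.
    """
    scored = [(cosine(query_feat, feat), name) for name, feat in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Hypothetical 2-D features for three gallery photos and one sketch query.
gallery = {"a.jpg": [0.0, 1.0], "b.jpg": [1.0, 0.1], "c.jpg": [1.0, 1.0]}
ranking = rank_gallery([1.0, 0.0], gallery)
```

Each query would yield one such ranked list, which is the kind of information a retrieval results file needs to record.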

To compute the mAP, run:

python mAP.py

This command prints the mAP for the retrieval results in the results folder.
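
As a reference for what the metric measures, here is a minimal sketch of the standard category-level mean average precision: for each query, precision is averaged over the ranks at which a gallery item of the query's category appears, and the result is averaged over queries. This is the textbook definition, not a description of mAP.py, which may truncate the ranking at some k or differ in other details; the function names are hypothetical.

```python
def average_precision(ranked_labels, query_label):
    """Average precision of one ranked list of gallery category labels."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(results):
    """Mean of per-query average precision.

    results: list of (query_label, ranked_gallery_labels) pairs.
    """
    aps = [average_precision(ranked, query) for query, ranked in results]
    return sum(aps) / len(aps) if aps else 0.0


# Two hypothetical queries with their ranked gallery labels.
results = [
    ("cat", ["cat", "dog", "cat"]),
    ("dog", ["cat", "dog"]),
]
score = mean_average_precision(results)
```

For the first query the relevant items sit at ranks 1 and 3, giving (1/1 + 2/3) / 2; for the second the only relevant item sits at rank 2, giving 1/2.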

Finally, to generate the images of the retrieval results, run:

python -m generate_images --output_file=name_of_output_file

This command writes the images of the retrieval results to the save_images folder.

When the dataset follows the Sketchy layout (photo and sketch folders), leave --x_train and --x_test empty.
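
Before running retrieval, it can help to confirm that a dataset follows this layout. The helper below is a hypothetical sanity check, not part of the repository. It assumes that the photo and sketch folders each contain one subfolder per category, which is how Sketchy is commonly organised but is not stated above.

```python
from pathlib import Path


def count_images(root):
    """Count files per category under root/photo and root/sketch.

    Assumes one subfolder per category inside each modality folder.
    Returns {category: {"photo": n_photos, "sketch": n_sketches}}.
    """
    root = Path(root)
    counts = {}
    for modality in ("photo", "sketch"):
        modality_dir = root / modality
        if not modality_dir.is_dir():
            continue  # a missing modality folder leaves its counts at zero
        for class_dir in sorted(modality_dir.iterdir()):
            if class_dir.is_dir():
                n_files = sum(1 for f in class_dir.iterdir() if f.is_file())
                entry = counts.setdefault(class_dir.name, {"photo": 0, "sketch": 0})
                entry[modality] = n_files
    return counts
```

A category that reports zero photos or zero sketches points to a layout mismatch worth fixing before training or evaluation.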

Bibtex

Please cite our work if you find it useful. Thanks.

@Inproceedings{sain2023clip,
  title={{CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not}},
  author={Aneeshan Sain and Ayan Kumar Bhunia and Pinaki Nath Chowdhury and Subhadeep Koley and Tao Xiang and Yi-Zhe Song},
  booktitle={CVPR},
  year={2023}
}
