This project forked from navervision/lincir

Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)

LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)

Welcome to the official Pytorch implementation of LinCIR!

Discover the magic of LinCIR, a ground-breaking approach to Composed Image Retrieval (CIR) that challenges convention and ushers in a new era of AI research. Dive into the limitless possibilities of zero-shot composed image retrieval with us!

Authors:

Geonmo Gu*1, Sanghyuk Chun*2, Wonjae Kim2, Yoohoon Kang1, Sangdoo Yun2

1 NAVER Vision 2 NAVER AI Lab

* First two authors contributed equally.

⭐ Overview

The Composed Image Retrieval (CIR) task, a fusion of image and text, has always been an intriguing challenge for AI researchers. Traditional CIR methods require expensive triplets of query image, query text, and target image for training, limiting scalability.

Enter LinCIR, a revolutionary CIR framework that relies solely on language for training. Our innovative approach leverages self-supervision through self-masking projection (SMP), allowing LinCIR to be trained using text datasets alone.

With LinCIR, we achieve astonishing efficiency and effectiveness. For instance, LinCIR with a CLIP ViT-G backbone is trained in just 48 minutes and outperforms existing methods in zero-shot composed image retrieval on four benchmark datasets: CIRCO, GeneCIS, FashionIQ, and CIRR. In fact, it even surpasses supervised methods on FashionIQ!

🚀 News

  • February 27, 2024 - LinCIR is accepted at CVPR 2024!
  • December 5, 2023 - LinCIR is officially released!

🛠️ Installation

Get started with LinCIR by installing the necessary dependencies:

$ pip install torch transformers diffusers accelerate datasets spacy
$ python -m spacy download en_core_web_sm
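
To confirm that the installation worked, you can check that each package is importable. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes each pip package name above matches its import name.

```python
import importlib.util

# Import names are assumed to match the pip package names listed above.
REQUIRED = ["torch", "transformers", "diffusers", "accelerate", "datasets", "spacy"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All LinCIR dependencies are importable.")
```

The check only looks packages up on the import path. It does not import them, so it stays fast even for heavy packages such as torch.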

🤗 Demo

To run the demo locally, use the commands below.

You can also try the demo directly on the Hugging Face Space.

$ pip install clip-retrieval

$ python demo.py

Demo will be hosted at https://0.0.0.0:8000

📂 Dataset Preparation

No need to worry about downloading training datasets manually. All training datasets are automatically fetched using the Hugging Face datasets library.

Keep in mind that the training datasets are considerably smaller in volume compared to (image, caption) pairs or triplet datasets like FashionIQ and CIRR.

Please refer to here to prepare the benchmark datasets.

📚 How to Train LinCIR

Train LinCIR with ease using the following command:

$ python -m torch.distributed.run --nproc_per_node 8 --nnodes 1 --node_rank 0 \
--master_addr localhost --master_port 5100 train_phi.py \
--batch_size 64 \
--output_dir /path/to/your_experiment \
--cirr_dataset_path /path/to/cir_datasets/CIRR \
--mixed_precision fp16 \
--clip_model_name large \
--validation_steps 1000 \
--checkpointing_steps 1000 \
--seed 12345 \
--lr_scheduler constant_with_warmup --lr_warmup_steps 0 \
--max_train_steps 20000

If you have a machine with 8 GPUs, simply run the above script. On a less powerful machine with a single GPU, set --nproc_per_node to 1 and adjust --batch_size to 256 or 512. Rest assured, the results will be consistent.
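
The single-GPU advice follows from simple arithmetic: in data-parallel training the effective batch size is the per-process batch size times the number of processes, so 8 GPUs at `--batch_size 64` gives 512 samples per step. The sketch below makes that explicit. It assumes `--batch_size` is a per-process value (which the advice above implies) and ignores gradient accumulation; the function names are illustrative only.

```python
def effective_batch_size(nproc_per_node: int, batch_size: int, nnodes: int = 1) -> int:
    """Total samples per optimizer step across all processes."""
    return nproc_per_node * nnodes * batch_size


def per_gpu_batch_size(target_total: int, nproc_per_node: int, nnodes: int = 1) -> int:
    """Per-process batch size needed to reach `target_total` samples per step."""
    world_size = nproc_per_node * nnodes
    if target_total % world_size != 0:
        raise ValueError("target_total must be divisible by the number of processes")
    return target_total // world_size


# The 8-GPU command above: 8 processes x 64 samples each.
print(effective_batch_size(nproc_per_node=8, batch_size=64))   # 512
# Matching that total on a single GPU.
print(per_gpu_batch_size(target_total=512, nproc_per_node=1))  # 512
```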

To use ViT-Large, Huge, or Giga as the CLIP backbone, set --clip_model_name to large, huge, or giga, respectively.
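
If you switch backbones or GPU counts often, it can help to assemble the launch command programmatically. The sketch below only rearranges the flags shown in the training command above. `build_train_command` is a hypothetical helper, not something the repository provides, and the paths are placeholders.

```python
import shlex
import sys

BACKBONES = ("large", "huge", "giga")  # values listed for --clip_model_name


def build_train_command(clip_model_name, nproc_per_node=8, batch_size=64,
                        output_dir="/path/to/your_experiment",
                        cirr_dataset_path="/path/to/cir_datasets/CIRR"):
    """Return the training command as an argument list for subprocess.run."""
    if clip_model_name not in BACKBONES:
        raise ValueError(f"clip_model_name must be one of {BACKBONES}")
    return [
        sys.executable, "-m", "torch.distributed.run",
        "--nproc_per_node", str(nproc_per_node),
        "--nnodes", "1", "--node_rank", "0",
        "--master_addr", "localhost", "--master_port", "5100",
        "train_phi.py",
        "--batch_size", str(batch_size),
        "--output_dir", output_dir,
        "--cirr_dataset_path", cirr_dataset_path,
        "--mixed_precision", "fp16",
        "--clip_model_name", clip_model_name,
        "--validation_steps", "1000",
        "--checkpointing_steps", "1000",
        "--seed", "12345",
        "--lr_scheduler", "constant_with_warmup", "--lr_warmup_steps", "0",
        "--max_train_steps", "20000",
    ]


# Single-GPU run with the ViT-Huge backbone; printed here rather than executed.
cmd = build_train_command("huge", nproc_per_node=1, batch_size=512)
print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch training. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.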

💯 How to Evaluate LinCIR

CIRR (Test Set)

Evaluate LinCIR on the CIRR test set with the following command:

$ python generate_test_submission.py \
--eval-type phi \
--dataset cirr \
--dataset-path /path/to/CIRR \
--phi-checkpoint-name /path/to/trained_your/phi_best.pt \
--clip_model_name large \
--submission-name lincir_results

Retrieved results will be saved as:

  • ./submission/cirr/{submission-name}.json
  • ./submission/cirr/subset_{submission-name}.json

Upload these files here to view the results.
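
Before uploading, it may be worth confirming that both files were actually written. This is a small sketch that assumes the output layout listed above; `submission_paths` is an illustrative helper, not part of the repository.

```python
from pathlib import Path


def submission_paths(dataset: str, submission_name: str, root: str = "./submission"):
    """Return the two result files expected for a dataset and submission name."""
    base = Path(root) / dataset
    return [base / f"{submission_name}.json", base / f"subset_{submission_name}.json"]


for path in submission_paths("cirr", "lincir_results"):
    status = "found" if path.is_file() else "missing"
    print(f"{path}: {status}")
```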

CIRR (Validation Set, Dev)

For the CIRR validation set, use the following command:

$ python validate.py \
--eval-type phi \
--dataset cirr \
--dataset-path /path/to/CIRR \
--phi-checkpoint-name /path/to/trained_your/phi_best.pt \
--clip_model_name large

FashionIQ

To evaluate LinCIR on FashionIQ, run the following command:

$ python validate.py \
--eval-type phi \
--dataset fashioniq \
--dataset-path /path/to/fashioniq \
--phi-checkpoint-name /path/to/trained_your/phi_best.pt \
--clip_model_name large
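
The CIRR and FashionIQ validation commands differ only in --dataset and --dataset-path, so both can be driven from one loop. This is a sketch under that assumption: `validate_command`, `run_all`, and the `DATASET_PATHS` mapping are hypothetical names, and the paths are the placeholders used above.

```python
import subprocess
import sys

# Placeholder paths copied from the commands above; replace with real locations.
DATASET_PATHS = {
    "cirr": "/path/to/CIRR",
    "fashioniq": "/path/to/fashioniq",
}


def validate_command(dataset, phi_checkpoint, clip_model_name="large"):
    """Build the validate.py invocation for one validation dataset."""
    return [
        sys.executable, "validate.py",
        "--eval-type", "phi",
        "--dataset", dataset,
        "--dataset-path", DATASET_PATHS[dataset],
        "--phi-checkpoint-name", phi_checkpoint,
        "--clip_model_name", clip_model_name,
    ]


def run_all(phi_checkpoint, dry_run=True):
    """Print (or, with dry_run=False, execute) the command for each dataset."""
    for dataset in DATASET_PATHS:
        cmd = validate_command(dataset, phi_checkpoint)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_all("/path/to/trained_your/phi_best.pt")
```

The dry-run default only prints the commands, so the loop can be inspected before any GPU time is spent.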

CIRCO

Evaluate LinCIR on the CIRCO dataset with the command below:

$ python generate_test_submission.py \
--eval-type phi \
--dataset circo \
--dataset-path /path/to/cir_datasets/CIRCO \
--phi-checkpoint-name /path/to/trained_your/phi_best.pt \
--clip_model_name large \
--submission-name lincir_results

Retrieved results will be saved as:

  • ./submission/circo/{submission-name}.json
  • ./submission/circo/subset_{submission-name}.json

Upload these files here to view the results.

GeneCIS

Evaluating GeneCIS requires a few additional steps. First, get VG_100K_all and COCO_val2017 at GeneCIS. Then run the following script:

# Assuming you're in the lincir folder.
$ git fetch --all
$ git checkout eval_genecis
$ cd genecis
$ python evaluate.py \
--combiner_mode phi \
--model large \
--combiner_pretrain_path /path/to/lincir_best.pt \
--vg_100k_all_path /path/to/VG_100K_all \
--coco_val2017_path /path/to/val2017
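
Because the GeneCIS evaluation depends on a checkpoint and two image directories, a pre-flight check can save a failed run. The sketch below is a convenience that is not part of the eval_genecis branch. `check_genecis_inputs` is a hypothetical helper that only verifies that the paths passed to the flags above exist.

```python
from pathlib import Path


def check_genecis_inputs(checkpoint, vg_100k_all, coco_val2017):
    """Return a list of problems; an empty list means all inputs exist."""
    problems = []
    if not Path(checkpoint).is_file():
        problems.append(f"checkpoint not found: {checkpoint}")
    for flag, directory in (("--vg_100k_all_path", vg_100k_all),
                            ("--coco_val2017_path", coco_val2017)):
        if not Path(directory).is_dir():
            problems.append(f"{flag} is not a directory: {directory}")
    return problems


# Placeholder paths from the command above; each missing input is reported.
for problem in check_genecis_inputs("/path/to/lincir_best.pt",
                                    "/path/to/VG_100K_all",
                                    "/path/to/val2017"):
    print(problem)
```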

Acknowledgement

We would like to express our special gratitude to the authors of SEARLE for their invaluable contributions, as our code draws significant inspiration from this open-source project.

Citation

@inproceedings{gu2024lincir,
    title={Language-only Efficient Training of Zero-shot Composed Image Retrieval},
    author={Gu, Geonmo and Chun, Sanghyuk and Kim, Wonjae and Kang, Yoohoon and Yun, Sangdoo},
    year={2024},
    booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
}

License

Licensed under CC BY-NC 4.0

LinCIR
Copyright (c) 2023-present NAVER Corp.
CC BY-NC-4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
