anminhhung / composed_image_retrieval

This project forked from google-research/composed_image_retrieval

License: Apache License 2.0

Pic2Word (CVPR2023)

This is an open source implementation of Pic2Word. This is not an officially supported Google product.

Data

Training Data

We train the model on images downloaded from the Conceptual Captions URLs. See open_clip for the process of obtaining the dataset.

The training data must be placed in the root of this repo and structured as follows.

  cc_data
    ├── train ## training image directories.
    └── val ## validation image directories.
  cc
    ├── Train_GCC-training_output.csv ## training data list
    └── Validation_GCC-1.1.0-Validation_output.csv ## validation data list
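
Before starting a long training run, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repo: the `missing_paths` helper and the `EXPECTED_PATHS` list are hypothetical names introduced here, and the paths are copied from the tree above. It only checks that each entry exists; it does not validate the CSV contents or the images.

```python
from pathlib import Path

# Entries taken from the layout above, relative to the repository root.
# EXPECTED_PATHS and missing_paths are illustrative names, not part of the repo.
EXPECTED_PATHS = [
    "cc_data/train",
    "cc_data/val",
    "cc/Train_GCC-training_output.csv",
    "cc/Validation_GCC-1.1.0-Validation_output.csv",
]


def missing_paths(repo_root="."):
    """Return the expected entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Training data layout looks complete.")
```

Run it from the root of the repo; an empty result means all four entries were found.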

Test Data

See README to prepare the test datasets.

Training

Install dependencies

See open_clip for installation details. The same environment should work for this repo. setenv.sh is the script we used to set up the environment in a virtualenv.

Also run the commands below to activate the environment and add the src directory to PYTHONPATH:

. env3/bin/activate
export PYTHONPATH="$PYTHONPATH:$PWD/src"
export PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning'

Pre-trained model

The model is available on Google Drive.

Sample training command:

python -u src/main.py \
    --save-frequency 1 \
    --train-data="cc/Train_GCC-training_output.csv"  \
    --warmup 10000 \
    --batch-size=128 \
    --lr=1e-4 \
    --wd=0.1 \
    --epochs=30 \
    --workers=8 \
    --openai-pretrained \
    --model ViT-L/14

Sample evaluation only:

Evaluation on COCO, ImageNet, or CIRR.

## Replace $data_name with coco, imgnet, or cirr.
python src/eval_retrieval.py \
    --openai-pretrained \
    --resume /path/to/checkpoints \
    --eval-mode $data_name \
    --gpu $gpu_id \
    --model ViT-L/14

Evaluation on Fashion-IQ (shirt, dress, or toptee).

## Replace $cloth_type with shirt, dress, or toptee.
python src/eval_retrieval.py \
    --openai-pretrained \
    --resume /path/to/checkpoints \
    --eval-mode fashion \
    --source $cloth_type \
    --gpu $gpu_id \
    --model ViT-L/14
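
Because the Fashion-IQ evaluation is run once per clothing type, the three invocations can be generated from a small script. This is a rough sketch under stated assumptions: `fashion_eval_command` and `CLOTH_TYPES` are hypothetical names introduced here, and the flags are exactly those shown in the command above. The script only prints the commands, so nothing runs until you choose to execute them.

```python
import shlex

# Clothing types listed in this README for the Fashion-IQ evaluation.
CLOTH_TYPES = ("shirt", "dress", "toptee")


def fashion_eval_command(checkpoint, cloth_type, gpu_id):
    """Build the argument list for one Fashion-IQ evaluation run.

    Hypothetical helper: it only assembles the flags shown in this README.
    """
    if cloth_type not in CLOTH_TYPES:
        raise ValueError(f"unknown cloth type: {cloth_type!r}")
    return [
        "python", "src/eval_retrieval.py",
        "--openai-pretrained",
        "--resume", checkpoint,
        "--eval-mode", "fashion",
        "--source", cloth_type,
        "--gpu", str(gpu_id),
        "--model", "ViT-L/14",
    ]


if __name__ == "__main__":
    for cloth in CLOTH_TYPES:
        # Print each command; to execute it instead, pass the list to
        # subprocess.run(cmd, check=True).
        print(shlex.join(fashion_eval_command("/path/to/checkpoints", cloth, 0)))
```

Building the command as a list rather than a single string avoids quoting problems if the checkpoint path contains spaces.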

Demo:

Demo retrieval on COCO, ImageNet, CIRR, or Fashion-IQ (dress, shirt, toptee).

## --retrieval-data: choose from coco, imgnet, cirr, dress, shirt, toptee.
## --query_file: comma-separated query images.
## --prompts: comma-separated prompts. Use * to indicate the token to be replaced with an image token, e.g., "a sketch of *".
## --demo-out: directory in which to generate the html file and image directory.
python src/demo.py \
    --openai-pretrained \
    --resume /path/to/checkpoints \
    --retrieval-data $data_name \
    --query_file "path_img1,path_img2,path_img3..." \
    --prompts "prompt1,prompt2,..." \
    --demo-out $path_demo \
    --gpu $gpu_id \
    --model ViT-L/14

The demo generates a directory containing an html file and an image directory. Download the directory and open the html file to see the results.
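
The demo takes its query images and prompts as comma-separated strings, which are easy to get wrong by hand. The sketch below shows one way to assemble them from Python lists. `demo_arguments` is a hypothetical helper, not part of the repo, and it rests on two assumptions: that every prompt should contain the `*` placeholder, and that individual paths and prompts must not contain commas, since the comma is the separator.

```python
def demo_arguments(query_images, prompts):
    """Join image paths and prompts into the strings for --query_file and --prompts.

    Hypothetical helper; the checks below encode assumptions, not repo behavior.
    """
    for prompt in prompts:
        # Assumption: each prompt marks the image token position with '*'.
        if "*" not in prompt:
            raise ValueError(f"prompt has no '*' placeholder: {prompt!r}")
    for item in list(query_images) + list(prompts):
        # Assumption: commas are reserved as separators between items.
        if "," in item:
            raise ValueError(f"item must not contain a comma: {item!r}")
    return ",".join(query_images), ",".join(prompts)


if __name__ == "__main__":
    query_file, prompt_arg = demo_arguments(
        ["path_img1", "path_img2"],
        ["a sketch of *", "a painting of *"],
    )
    print("--query_file", repr(query_file))
    print("--prompts", repr(prompt_arg))
```

The two returned strings can then be passed as the values of `--query_file` and `--prompts` in the demo command.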

Citing

If you found this repository useful, please consider citing:

@article{saito2023pic2word,
  title={Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval},
  author={Saito, Kuniaki and Sohn, Kihyuk and Zhang, Xiang and Li, Chun-Liang and Lee, Chen-Yu and Saenko, Kate and Pfister, Tomas},
  journal={CVPR},
  year={2023}
}
