
This project forked from jwehrmann/retrieval.pytorch


Adaptive Cross-Modal Embeddings for Image-Sentence Alignment


Adaptive Cross-modal Embeddings for Image-Text Alignment (ADAPT)

This code implements a novel approach for training image-text alignment models, namely ADAPT.

ADAPT adjusts an intermediate representation of instances from a modality a using an embedding vector of an instance from a modality b. This adaptation filters and enhances important information across internal features, yielding guided vector representations. It resembles the way attention modules work, but is far more computationally efficient. For further information, please read our AAAI 2020 paper.
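The idea of adapting one modality's features with a vector from the other can be illustrated with a small sketch. It assumes the adaptation is a feature-wise scale and shift followed by average pooling; the function name `adapt` and the projection matrices `w_gamma` and `w_beta` are hypothetical, and the formulation in the paper and in this repository may differ.

```python
def adapt(features, query, w_gamma, w_beta):
    """Scale and shift feature vectors of modality a using a vector from modality b.

    features: list of n vectors of length d (e.g. image-region features)
    query:    vector of length k (e.g. a sentence embedding)
    w_gamma, w_beta: d x k projection matrices (hypothetical parameters)
    """
    # Project the query into one scale (gamma) and one shift (beta) per dimension.
    gamma = [sum(w * q for w, q in zip(row, query)) for row in w_gamma]
    beta = [sum(w * q for w, q in zip(row, query)) for row in w_beta]
    # Apply the same scale and shift to every instance.
    adapted = [
        [x * (1.0 + g) + b for x, g, b in zip(vec, gamma, beta)]
        for vec in features
    ]
    # Average-pool the adapted instances into a single guided vector.
    n = len(adapted)
    return [sum(col) / n for col in zip(*adapted)]


features = [[1.0, 2.0], [3.0, 4.0]]
query = [1.0, 0.0]
w_gamma = [[1.0, 0.0], [0.0, 1.0]]
w_beta = [[0.0, 0.0], [0.0, 0.0]]
pooled = adapt(features, query, w_gamma, w_beta)
```

In this sketch the query is projected once and then applied to each of the n instances, so there is no pairwise comparison between every region and every word as in a typical attention module.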


Installation

Python 2 is not supported. We recommend installing Python 3 with Anaconda and then creating an environment.

2. As standalone project

conda create --name adapt python=3
conda activate adapt
git clone https://github.com/jwehrmann/retrieval.pytorch
cd retrieval.pytorch
pip install -r requirements.txt

3. Download datasets

wget https://scanproject.blob.core.windows.net/scan-data/data.zip

Quick start

  • Option 1:
conda activate adapt
export DATA_PATH=/path/to/dataset
  • Option 2:

You can also create a shell alias (a shortcut for a command). For example, add this line to your shell profile:

alias adapt='source activate adapt && export DATA_PATH=/path/to/dataset' 

Then run the alias name to have everything configured:

$ adapt
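Both options export `DATA_PATH` before any script runs. As a minimal sketch of how a script could read that variable and fail early with a clear message, consider the helper below. The function `resolve_data_path` is hypothetical and is not part of this repository, which may handle the variable differently.

```python
import os
from pathlib import Path


def resolve_data_path(env=None):
    """Return the dataset root taken from DATA_PATH, or raise a clear error."""
    env = os.environ if env is None else env
    raw = env.get("DATA_PATH")
    if not raw:
        raise RuntimeError(
            "DATA_PATH is not set; run `export DATA_PATH=/path/to/dataset` first"
        )
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"DATA_PATH points to a missing directory: {path}")
    return path
```

Calling `resolve_data_path()` with no argument reads the real environment; passing a dictionary makes the helper easy to test.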

Training Models

You can reproduce our main results using the following scripts.

  • Training on Flickr30k:
python run.py options/adapt/f30k/t2i.yaml
python test.py options/adapt/f30k/t2i.yaml -data_split test
python run.py options/adapt/f30k/i2t.yaml
python test.py options/adapt/f30k/i2t.yaml -data_split test
  • Training on MS COCO:
python run.py options/adapt/coco/t2i.yaml
python test.py options/adapt/coco/t2i.yaml -data_split test
python run.py options/adapt/coco/i2t.yaml
python test.py options/adapt/coco/i2t.yaml -data_split test
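The eight commands above follow one pattern: a dataset, a direction, then a train step and a test step. The sketch below generates them in the same order. The helper `training_commands` is hypothetical; only the script names, config paths, and the `-data_split test` flag come from the commands listed above.

```python
from itertools import product

DATASETS = ("f30k", "coco")
DIRECTIONS = ("t2i", "i2t")


def training_commands(datasets=DATASETS, directions=DIRECTIONS):
    """Yield the train and test command for each dataset/direction pair."""
    for dataset, direction in product(datasets, directions):
        config = f"options/adapt/{dataset}/{direction}.yaml"
        yield ["python", "run.py", config]
        yield ["python", "test.py", config, "-data_split", "test"]


if __name__ == "__main__":
    # Print the commands; nothing is executed here.
    for cmd in training_commands():
        print(" ".join(cmd))
```

To execute the commands instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.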

To ensemble multiple models (ADAPT-Ens), run:

  • MS COCO models:
python test_ens.py options/adapt/coco/t2i.yaml options/adapt/coco/i2t.yaml -data_split test
  • F30k models:
python test_ens.py options/adapt/f30k/t2i.yaml options/adapt/f30k/i2t.yaml -data_split test
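This README does not say how `test_ens.py` combines the two models. One common way to ensemble retrieval models is to average their similarity matrices before ranking, and the sketch below shows only that generic technique. The function `ensemble_similarities` is hypothetical and should not be read as the script's actual behavior.

```python
def ensemble_similarities(sim_a, sim_b):
    """Average two image-by-caption similarity matrices element-wise."""
    same_rows = len(sim_a) == len(sim_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(sim_a, sim_b)):
        raise ValueError("similarity matrices must have the same shape")
    return [
        [(a + b) / 2.0 for a, b in zip(ra, rb)]
        for ra, rb in zip(sim_a, sim_b)
    ]


# Toy scores standing in for the outputs of a t2i model and an i2t model.
sim_t2i = [[1.0, 0.0], [0.5, 0.5]]
sim_i2t = [[0.5, 0.5], [0.0, 1.0]]
combined = ensemble_similarities(sim_t2i, sim_i2t)
```

In this toy case the second row is tied under the first model alone, and the averaged scores break the tie in favor of the matching pair.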

Pre-trained models

We make available all the main models generated in this research. Each file contains the best model of the run (according to validation results), the last checkpoint generated, all TensorBoard logs (loss and recall curves), result files, and the configuration options used for training.

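| Dataset | Model     | Image Annotation R@1 | Image Retrieval R@1 |
|---------|-----------|----------------------|---------------------|
| F30k    | ADAPT-t2i | 76.4%                | 57.8%               |
| F30k    | ADAPT-i2t | 66.3%                | 53.8%               |
| F30k    | ADAPT-ens | 76.2%                | 60.5%               |
| COCO    | ADAPT-t2i | 75.4%                | 64.0%               |
| COCO    | ADAPT-i2t | 67.2%                | 57.8%               |
| COCO    | ADAPT-ens | 75.3%                | 64.4%               |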
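R@1 in the table is recall at rank 1: the percentage of queries whose correct match is ranked first. The sketch below computes Recall@K from a similarity matrix under a simplifying assumption, namely that query i has exactly one correct candidate, at index i. Flickr30k and MS COCO pair each image with five captions, so the repository's evaluation code necessarily handles a more general case; `recall_at_k` here is a hypothetical illustration.

```python
def recall_at_k(similarity, k=1):
    """Percentage of queries whose ground-truth match ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    The ground truth for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the true match = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


sims = [
    [0.9, 0.1, 0.2, 0.0],
    [0.3, 0.2, 0.8, 0.1],
    [0.1, 0.4, 0.7, 0.2],
    [0.6, 0.1, 0.2, 0.5],
]
r1 = recall_at_k(sims, k=1)
r2 = recall_at_k(sims, k=2)
```

Image annotation uses images as queries and captions as candidates; image retrieval swaps the roles, which corresponds to transposing the matrix.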

Citation

If you find this research or code useful, please consider citing our paper:

@inproceedings{wehrmanna2020daptive,
  title={Adaptive Cross-modal Embeddings for Image-Text Alignment},
  author={Wehrmann, J{\^o}natas and Kolling, Camila and Barros, Rodrigo C},
  booktitle={The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)},
  year={2020}
}
