
huichen24 / imram

90 stars · 29 forks · 309 KB

code for our CVPR2020 paper "IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval"

Python 96.54% Shell 3.46%

imram's People

Contributors

huichen24


imram's Issues

Attention visualization

Hi Hui, thank you for open-sourcing this work and for your efforts.
I was wondering how you visualize the attention weights reported in your paper, since the original data from SCAN has no region location information (x, y, w, h).
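For reference, a minimal sketch of how region attention could be overlaid on an image, assuming the boxes (x, y, w, h) are re-extracted separately (e.g., with the same bottom-up-attention detector that produced the features), since they are not part of the SCAN feature files; the function name here is hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

def show_region_attention(image_path, boxes, weights, top_k=5):
    """Overlay the top-k attended region boxes on the image.

    boxes:   (n_regions, 4) array of (x, y, w, h) -- assumed to be
             re-extracted with the detector that produced the features.
    weights: (n_regions,) attention weights for one query word/step.
    """
    img = Image.open(image_path)
    fig, ax = plt.subplots()
    ax.imshow(img)
    # Draw the most-attended regions, scaling opacity by weight.
    for idx in np.argsort(weights)[::-1][:top_k]:
        x, y, w, h = boxes[idx]
        ax.add_patch(patches.Rectangle(
            (x, y), w, h, linewidth=2, edgecolor='r', facecolor='none',
            alpha=float(weights[idx] / weights.max())))
    ax.axis('off')
    plt.show()
```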

Question about mask

Hi Hui, thanks for open-sourcing this work; I appreciate your efforts on it!
There is a mask in data.py and model.py. What role does the mask play here? Thank you very much.
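For context, in SCAN-style models such a mask typically marks padded positions of variable-length captions so they receive no attention weight; a minimal sketch of that assumed usage (not the repository's exact implementation):

```python
import torch

def masked_attention(scores, lengths):
    """Softmax over caption positions, ignoring padding.

    scores:  (batch, n_regions, max_len) raw attention scores.
    lengths: (batch,) true caption lengths.
    Assumed illustration of a padding mask, not the repo's exact code.
    """
    max_len = scores.size(-1)
    # mask[b, t] is True for positions beyond the caption's true length
    mask = torch.arange(max_len, device=scores.device)[None, :] >= lengths[:, None]
    # Broadcast over regions and suppress padded positions before softmax.
    scores = scores.masked_fill(mask[:, None, :], float('-inf'))
    return torch.softmax(scores, dim=-1)
```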

the npy file

Your work is rather good! But I have a question about 'Image features for training set, validation set and testing set should be merged in order into one .npy file'. How do I merge them in order into one file?
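For reference, one minimal way to do the merge with NumPy, assuming per-split feature files (the filenames below are hypothetical) and the train → validation → test order stated in the README:

```python
import numpy as np

# Hypothetical per-split feature files; each is (n_images, n_regions, feat_dim).
train = np.load('train_ims.npy')
val = np.load('dev_ims.npy')
test = np.load('test_ims.npy')

# Merge in order: training set first, then validation, then testing,
# so downstream index offsets line up with the caption splits.
merged = np.concatenate([train, val, test], axis=0)
np.save('data_ims.npy', merged)
```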

Some training details

Hi Hui, thanks for open-sourcing this work; I appreciate your efforts on it!

I have tried to reproduce the best results you reported on Flickr30K, but the best results I got with your implementation are:
rsum: 477.0
Average i2t Recall: 86.0
Image to text (R@1 / R@5 / R@10 / Med r / Mean r): 70.5 / 91.2 / 96.3 / 1.0 / 2.7
Average t2i Recall: 73.0
Text to image (R@1 / R@5 / R@10 / Med r / Mean r): 53.1 / 79.6 / 86.3 / 1.0 / 9.2

These differ slightly from the paper on the i2t benchmark. I suspect this might be caused by a difference in training hyper-parameters, so I'd like to check with you on that.

Right now I'm training on a 4-GPU machine with batch size 64 and lr = 2e-4 for 30 epochs. Are there any specific parameters for Flickr30K training to reach the reported 74.1 on the i2t task? I appreciate your help very much.

Yours,
Zhiyuan
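As a point of comparison, SCAN-derived codebases (which IMRAM builds on) typically decay the learning rate by 10x partway through training, which can matter as much as the base lr; a minimal sketch of that assumed schedule (parameter names and defaults are illustrative, not confirmed for this repo):

```python
def adjust_learning_rate(optimizer, epoch, base_lr=2e-4, lr_update=15):
    """SCAN-style step decay (assumed, not confirmed for this repo):
    multiply the base learning rate by 0.1 every `lr_update` epochs."""
    lr = base_lr * (0.1 ** (epoch // lr_update))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
```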
