huichen24 / imram
code for our CVPR2020 paper "IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval"
Hi Hui, thanks for open-sourcing the code. I appreciate your efforts on it!
I failed to run the code on Colab with 16 GB of GPU memory, so I would like to know the minimum amount of GPU memory needed to run it.
Hi Hui, I appreciate your open-sourcing the code and your efforts.
I was wondering how you visualize the attention weights as reported in your paper, because the data provided by SCAN has no region label information (x, y, w, h).
Hi Hui, thanks for open-sourcing the code. I appreciate your efforts on it!
There is a mask in data.py and model.py. What role does the mask play here? Thank you very much.
Your work is very good! I have a question about this instruction, though: 'Image features for training set, validation set and testing set should be merged in order into one .npy file'. How should the features be merged in order into one file?
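One possible reading of "merged in order" is a plain concatenation along the image axis, with the training split first, then validation, then testing. The sketch below illustrates that reading only; it is not confirmed by the author. The file names, the split sizes, and the (36, 2048) per-image feature shape are all assumptions made for illustration, and small random arrays stand in for the real features.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the real per-split feature files.
workdir = tempfile.mkdtemp()
split_sizes = {"train": 5, "dev": 2, "test": 3}  # made-up image counts
for name, n in split_sizes.items():
    # Assumed layout: (num_images, num_regions, feature_dim).
    feats = np.random.rand(n, 36, 2048).astype(np.float32)
    np.save(os.path.join(workdir, f"{name}_ims.npy"), feats)

# Load the splits in a fixed order: train, then validation, then test.
order = ["train", "dev", "test"]
parts = [np.load(os.path.join(workdir, f"{name}_ims.npy")) for name in order]

# Concatenate along axis 0 so image indices run through the splits in order.
merged = np.concatenate(parts, axis=0)
np.save(os.path.join(workdir, "all_ims.npy"), merged)

print(merged.shape)
```

With real features, loading every split at once may need a lot of RAM, so memory-mapped loading via `np.load(..., mmap_mode="r")` could be worth considering.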
Hello, I found a problem with the dataset. The link "https://scanproject.blob.core.windows.net/scan-data/data_no_feature.zip" is invalid. Could you provide a working one? Thanks a lot.
Hi,
The website https://scanproject.blob.core.windows.net that you shared in the data preparation section cannot be reached. I found another link for this data, https://iudata.blob.core.windows.net/scan/data_no_feature.zip, on the SCAN project GitHub page, but it does not include the NumPy feature files.
Could you share an updated link that contains the correct data?
Hi Hui, thanks for open-sourcing the code. I appreciate your efforts on it!
I have tried to reproduce your best results on Flickr30K, but the best results I got with your implementation are:
rsum: 477.0
Average i2t Recall: 86.0
Image to text: 70.5 91.2 96.3 1.0 2.7
Average t2i Recall: 73.0
Text to image: 53.1 79.6 86.3 1.0 9.2
These differ slightly from the reported results on the i2t benchmark. I guess this might be caused by a difference in training hyperparameters, so I would like to check with you on that.
Right now I am training on a 4-GPU machine with a batch size of 64 and lr=2e-4 for 30 epochs. Are there any specific parameters for Flickr30K training needed to reach the best i2t result of 74.1? I appreciate your help very much.
Yours,
Zhiyuan
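For readers comparing numbers, the summary lines in the log above can be recomputed from the per-direction recalls. This is a small sketch, assuming the first three values on each "Image to text" / "Text to image" line are R@1, R@5 and R@10 (with the last two being rank statistics), which is the usual layout of SCAN-style evaluation logs but is not stated in the post itself.

```python
# Recall values copied from the log above (assumed to be R@1, R@5, R@10).
i2t = [70.5, 91.2, 96.3]
t2i = [53.1, 79.6, 86.3]

# Average recall per direction is the mean of the three recall values.
avg_i2t = sum(i2t) / len(i2t)
avg_t2i = sum(t2i) / len(t2i)

# rsum is the sum of all six recall values.
rsum = sum(i2t) + sum(t2i)

print(f"rsum: {rsum:.1f}")
print(f"Average i2t Recall: {avg_i2t:.1f}")
print(f"Average t2i Recall: {avg_t2i:.1f}")
```

Under that assumption the recomputed values match the logged rsum of 477.0 and the averages of 86.0 and 73.0, and the gap to the 74.1 mentioned in the post is in the i2t R@1 column (70.5).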