
huichen24 / imram

90 stars · 29 forks · 309 KB

code for our CVPR2020 paper "IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval"

Python 96.54% Shell 3.46%

imram's People

Contributors

huichen24


imram's Issues

Attention visualization

Hi Hui, thank you for open-sourcing this work and for your efforts.
I was wondering how you visualize the attention weights reported in your paper, since the original data from SCAN has no region location information (x, y, w, h).
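For reference, a minimal sketch of how region attention could be overlaid on an image, assuming the boxes (x, y, w, h) are re-extracted separately (e.g., with the same bottom-up-attention detector that produced the features), since they are not part of the SCAN feature files; the function name here is hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

def show_region_attention(image_path, boxes, weights, top_k=5):
    """Overlay the top-k attended region boxes on the image.

    boxes:   (n_regions, 4) array of (x, y, w, h) -- assumed to be
             re-extracted with the detector that produced the features.
    weights: (n_regions,) attention weights for one query word/step.
    """
    img = Image.open(image_path)
    fig, ax = plt.subplots()
    ax.imshow(img)
    # Draw the most-attended regions, scaling opacity by weight.
    for idx in np.argsort(weights)[::-1][:top_k]:
        x, y, w, h = boxes[idx]
        ax.add_patch(patches.Rectangle(
            (x, y), w, h, linewidth=2, edgecolor='r', facecolor='none',
            alpha=float(weights[idx] / weights.max())))
    ax.axis('off')
    plt.show()
```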

Question about mask

Hi Hui, thanks for open-sourcing this work; I appreciate your efforts on it!
There is a mask in data.py and model.py. What role does the mask play here? Thank you very much.
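For context, in SCAN-style models such a mask typically marks padded positions of variable-length captions so they receive no attention weight; a minimal sketch of that assumed usage (not the repository's exact implementation):

```python
import torch

def masked_attention(scores, lengths):
    """Softmax over caption positions, ignoring padding.

    scores:  (batch, n_regions, max_len) raw attention scores.
    lengths: (batch,) true caption lengths.
    Assumed illustration of a padding mask, not the repo's exact code.
    """
    max_len = scores.size(-1)
    # mask[b, t] is True for positions beyond the caption's true length
    mask = torch.arange(max_len, device=scores.device)[None, :] >= lengths[:, None]
    # Broadcast over regions and suppress padded positions before softmax.
    scores = scores.masked_fill(mask[:, None, :], float('-inf'))
    return torch.softmax(scores, dim=-1)
```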

the npy file

Your work is rather good! But I have a question about 'Image features for training set, validation set and testing set should be merged in order into one .npy file'. How do I merge them in order into one file?
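For reference, one minimal way to do the merge with NumPy, assuming per-split feature files (the filenames below are hypothetical) and the train → validation → test order stated in the README:

```python
import numpy as np

# Hypothetical per-split feature files; each is (n_images, n_regions, feat_dim).
train = np.load('train_ims.npy')
val = np.load('dev_ims.npy')
test = np.load('test_ims.npy')

# Merge in order: training set first, then validation, then testing,
# so downstream index offsets line up with the caption splits.
merged = np.concatenate([train, val, test], axis=0)
np.save('data_ims.npy', merged)
```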

Some training details

Hi Hui, thanks for open-sourcing this work; I appreciate your efforts on it!

I have tried to reproduce the best results you reported on Flickr30K, but the best results I got with your implementation are:
rsum: 477.0
Average i2t Recall: 86.0
Image to text (R@1 / R@5 / R@10 / Med r / Mean r): 70.5 / 91.2 / 96.3 / 1.0 / 2.7
Average t2i Recall: 73.0
Text to image (R@1 / R@5 / R@10 / Med r / Mean r): 53.1 / 79.6 / 86.3 / 1.0 / 9.2

These differ slightly from the paper on the i2t benchmark. I suspect this might be caused by a difference in training hyper-parameters, so I'd like to check with you on that.

Right now I'm training on a 4-GPU machine with batch size 64 and lr = 2e-4 for 30 epochs. Are there any specific parameters for Flickr30K training to reach the reported 74.1 on the i2t task? I appreciate your help very much.

Yours,
Zhiyuan
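As a point of comparison, SCAN-derived codebases (which IMRAM builds on) typically decay the learning rate by 10x partway through training, which can matter as much as the base lr; a minimal sketch of that assumed schedule (parameter names and defaults are illustrative, not confirmed for this repo):

```python
def adjust_learning_rate(optimizer, epoch, base_lr=2e-4, lr_update=15):
    """SCAN-style step decay (assumed, not confirmed for this repo):
    multiply the base learning rate by 0.1 every `lr_update` epochs."""
    lr = base_lr * (0.1 ** (epoch // lr_update))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
```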
