jnhwkim / mullowbivqa

Hadamard Product for Low-rank Bilinear Pooling

License: Other

Lua 93.05% JavaScript 6.95%

mlb hadamard-product vqa question-answering iclr2017

mullowbivqa's Introduction

Hadamard Product for Low-rank Bilinear Pooling

Multimodal Low-rank Bilinear Attention Networks (MLB) implement an efficient attention mechanism based on low-rank bilinear pooling for visual question-answering tasks. MLB achieves new state-of-the-art performance while being more parsimonious than previous methods.

The current code achieves 65.07 on Open-Ended and 68.89 on Multiple-Choice on the test-standard split of the VQA dataset. An ensemble model achieves 66.89 and 70.29, respectively.
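The pooling operation named in the title can be illustrated in a few lines. The following is a minimal Python sketch, not the repository's Lua/Torch code, assuming the formulation f = P^T (tanh(U^T x) ∘ tanh(V^T y)) + b from the cited paper, where ∘ is the Hadamard (element-wise) product. The function names, the dimensions N, M, d, c, and the random weights are hypothetical and chosen only for illustration.

```python
import math
import random


def matvec_t(W, v):
    """Return W^T v, where W is a list of len(v) rows."""
    cols = len(W[0])
    return [sum(W[i][j] * v[i] for i in range(len(v))) for j in range(cols)]


def low_rank_bilinear_pool(x, y, U, V, P, b):
    """Compute P^T (tanh(U^T x) * tanh(V^T y)) + b, with * element-wise."""
    hx = [math.tanh(a) for a in matvec_t(U, x)]  # project x to d dimensions
    hy = [math.tanh(a) for a in matvec_t(V, y)]  # project y to d dimensions
    joint = [a * c for a, c in zip(hx, hy)]      # Hadamard product
    return [o + bias for o, bias in zip(matvec_t(P, joint), b)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, M, d, c = 6, 4, 3, 2          # hypothetical input, rank, and output sizes
U = random_matrix(N, d, rng)     # question-side projection
V = random_matrix(M, d, rng)     # image-side projection
P = random_matrix(d, c, rng)     # output projection
b = [0.0] * c
x = [rng.uniform(-1, 1) for _ in range(N)]
y = [rng.uniform(-1, 1) for _ in range(M)]

f = low_rank_bilinear_pool(x, y, U, V, P, b)

# Weight counts (biases excluded): one N x M matrix per output for a full
# bilinear model, versus three thin matrices for the low-rank form.
full_params = N * M * c
low_rank_params = d * (N + M + c)
```

With these toy sizes the low-rank form needs 36 weights where a full bilinear model needs 48; the gap widens quickly as N and M grow, which is the sense in which the low-rank form is more parsimonious.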

Dependencies

You can install the dependencies:

$ luarocks install rnn

Training

Please follow the instructions in VQA_LSTM_CNN for preprocessing. The --split 2 option trains on the train+val set and evaluates on the test-dev or test-standard set. Set --num_ans to 2000 to reproduce the result.

For question features, you need the following:

skip-thoughts
DPPnet (see 003_skipthoughts_porting)
make_lookuptable.lua

For image features, run:

$ th prepro_res.lua -input_json data_train-val_test-dev_2k/data_prepro.json -image_root path_to_image_root -cnn_model path_to_cnn_model

The pretrained ResNet-152 model and related scripts can be found in fb.resnet.torch.

$ th train.lua

With the default parameters, training takes around 2.6 days on a single NVIDIA Titan X GPU and writes the model under model/. To reproduce the results of the paper, use the -seconds option for the answer sampling described in Section 5. The seconds.json file can be obtained using prepro_seconds.lua or from here (updated as default).

Evaluation

$ th eval.lua

References

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@inproceedings{Kim2017,
author = {Kim, Jin-Hwa and On, Kyoung Woon and Lim, Woosang and Kim, Jeonghee and Ha, Jung-Woo and Zhang, Byoung-Tak},
booktitle = {The 5th International Conference on Learning Representations},
title = {{Hadamard Product for Low-rank Bilinear Pooling}},
year = {2017}
}

This code uses the Torch7 rnn package and its TrimZero module for question embeddings. Please also note the following papers:

@article{Leonard2015a,
author = {L{\'{e}}onard, Nicholas and Waghmare, Sagar and Wang, Yang and Kim, Jin-Hwa},
journal = {arXiv preprint arXiv:1511.07889},
title = {{rnn : Recurrent Library for Torch}},
year = {2015}
}
@inproceedings{Kim2016a,
author = {Kim, Jin-Hwa and Kim, Jeonghee and Ha, Jung-Woo and Zhang, Byoung-Tak},
booktitle = {Proceedings of KIIS Spring Conference},
isbn = {2093-4025},
number = {1},
pages = {165--166},
title = {{TrimZero: A Torch Recurrent Module for Efficient Natural Language Processing}},
volume = {26},
year = {2016}
}

License

BSD 3-Clause License

Patent (Pending)

METHOD AND SYSTEM FOR PROCESSING DATA USING ELEMENT-WISE MULTIPLICATION AND MULTIMODAL RESIDUAL LEARNING FOR VISUAL QUESTION-ANSWERING

mullowbivqa's Issues

image root

To replicate the results, have you taken both the training and validation images of MS-COCO as image root or only the training images?

Is the extracted visual features file really large?

Hi, jnhwkim. I extracted visual features using prepro_res.lua, but the h5 file is really large, hundreds of GB. What is the size of your extracted h5 file? Thanks in advance for your help.

Skip thoughts question embeddings

Hi, can you please give more information about the pretrained corpus that you used? The DPP paper only mentions a huge corpus, without being specific.
Moreover, do you have results without the pre-trained model, so that we can have a comparable score?

Thanks in advance

What's the special function of mhdf5.lua

@jnhwkim Hi, I am new to Lua. I see that you define mhdf5.lua to read the image features. I am a little confused about the purpose of mhdf5.lua: why not read the image features directly with h5_file:read()?

Have you tried VGG features?

Hi, nice job!
Compared with ResNet, how many percentage points lower would the result be if I used VGG features?
Have you tried VGG features?

`optim` package issue related to reproducibility

The initial parameter of the RMSProp algorithm in the torch optim package was changed between Jun 8, 2016 and Sep 6, 2016. If you use a commit of the optim package from this period, you may see severe performance degradation due to ill-conditioned hyper-parameters.

Please check the version of the optim package (in ~/torch/install/share/lua/5.1/optim/rmsprop.lua) and update to the latest one.

You can find my struggle with this issue in this thread.

Generating gru.t7

Hello,

I can't figure out how to generate the gru.t7 torch file.
003_skipthoughts_porting generates vqa_uni_gru_word2vec.t7.
However, this seems to be only a torch tensor, without the rnn GRU object.

Can you please let me know how you generate the gru.t7 file?
I am generating skipthoughts for trainval.

My steps:

1. Downloaded the skip-thoughts tables
2. Run make_lookuptable.lua to generate vocab_2k.txt
3. Copy vocab_2k.txt to skipthoughts_porting/
4. Edit 002_writevocab_table_vqa.py line 57 to use vocab_2k.txt
5. Run 002_writevocab_table_vqa.py
6. Run th 004_save_params_in_torch_file_vqa.lua
7. Rerun make_lookuptable.lua to generate lookup_2k.t7
8. Copy and rename vqa_uni_gru_word2vec.t7 to gru.t7
9. Move lookup_2k.t7 and gru.t7 to skipthoughts_model
10. Run th train.lua

Thanks in advance! :)

Have you ever used other optimizers

Have you ever used other optimizers, such as Adam?

Questions about pretrained model

Hi, when I tried to evaluate the pretrained model with default parameters, I got the following error:
bad argument #1 to 'copy' (sizes do not match at /tmp/luarocks_cutorch-scm-1-245/cutorch/lib/THC/THCTensorCopy.cu:31).
And I found that w:size() = 50390702, while the pretrained model's size is 51894822. Could you please help me with this problem? Thanks!

optimizer with vg

May I ask whether the optimization parameters (learning_rate, decay factors, etc.) are the same when training with and without Visual Genome?
Thanks!

How to use multi-gpus? The training is too slow!

Getting accuracy value

Hi, I am new to the VQA dataset. After generating the json file by running eval.lua, how do I get the test accuracy values for the multiple-choice and open-ended questions?

A few questions

I'm sorry to bother you again. I have a few questions:

1. I notice that you use a GRU to process questions in your previous work (Kim et al., 2016b) and in this paper, but you have also implemented an LSTM in this code (elseif opt.rnn_model == 'LSTM' then ...). Why did you choose the GRU in the end? How many percentage points lower would the result be if I used an LSTM?

2. train.lua contains require 'netdef.MRN', but there is no MRN.lua file.

3. Should I use L2-normalization in prepro_res.lua?

Thanks!

Unable to reproduce performance

Hi, I downloaded data_prepro.h5, data_prepro.json and seconds.json from the Google Drive link that you shared, and generated the data_res.h5 file by running prepro_res.lua. However, after re-training the model with train.lua (with the default parameters) and submitting the json file to the challenge server, I get an accuracy of only 50%. The json file generated from the pretrained model achieves the expected accuracy of 65%.

Did you train the model with some different set of hyperparameters or am I making some mistake in training?

decay_factor is wrong

decay_factor should be 0.99999040594147 (not 0.99997592083) if opt.iterPerEpoch = 240000 / opt.batch_size and opt.batch_size = 100.
In the paper, batch_size is 200.
In fact, opt.iterPerEpoch should be 334554 / opt.batch_size, so the kick_interval must be changed too.
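The arithmetic behind the proposed value can be checked with a short Python sketch. It rests on an assumption that is not confirmed by anything quoted on this page: that decay_factor is a per-iteration multiplier chosen so that the learning rate falls to one tenth after a fixed number of iterations. Under that assumption, the proposed 0.99999040594147 corresponds to a tenfold decay over 240000 iterations. The helper names below are hypothetical.

```python
import math


def decay_factor(total_iterations, target_ratio=0.1):
    """Per-iteration multiplier that scales the learning rate by
    target_ratio after total_iterations steps (assumed schedule)."""
    return math.exp(math.log(target_ratio) / total_iterations)


def iterations_to_ratio(factor, target_ratio=0.1):
    """Number of steps a given per-iteration factor needs to reach target_ratio."""
    return math.log(target_ratio) / math.log(factor)


proposed = decay_factor(240000)                      # value argued for in the issue
implied = iterations_to_ratio(0.99997592083)         # horizon implied by the other value

print(proposed)
print(implied)
```

Running the second helper on 0.99997592083 shows the decay horizon that value implies, which can then be compared against the intended number of epochs times opt.iterPerEpoch.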
