jnhwkim / mullowbivqa

Hadamard Product for Low-rank Bilinear Pooling

License: Other

Lua 93.05% JavaScript 6.95%
mlb hadamard-product vqa question-answering iclr2017

mullowbivqa's Introduction

Hadamard Product for Low-rank Bilinear Pooling

Multimodal Low-rank Bilinear Attention Networks (MLB) use low-rank bilinear pooling to build an efficient attention mechanism for visual question-answering tasks. MLB achieves new state-of-the-art performance while being more parsimonious than previous methods.

The current code achieves 65.07 on Open-Ended and 68.89 on Multiple-Choice on the test-standard split of the VQA dataset. An ensemble model achieves 66.89 and 70.29, respectively.
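As a rough illustration of the pooling idea, the sketch below projects two input vectors into a shared space, multiplies them element-wise (the Hadamard product), and projects the result to the output size. It is a minimal pure-Python sketch, not the Torch7 code of this repository: the function names, the list-of-rows matrix layout, and the choice of tanh as the default activation are assumptions made for the example.

```python
import math


def matvec_t(W, v):
    """Return W^T v, where W is a list of len(v) rows with k columns each."""
    cols = len(W[0])
    return [sum(W[i][j] * v[i] for i in range(len(v))) for j in range(cols)]


def low_rank_bilinear_pool(x, y, U, V, P, b, act=math.tanh):
    """Compute P^T (act(U^T x) * act(V^T y)) + b with an element-wise product.

    Shapes (hypothetical names): x has N entries, y has M entries,
    U is N x d, V is M x d, P is d x c, b has c entries.
    """
    # Project both modalities into the same d-dimensional space.
    xu = [act(t) for t in matvec_t(U, x)]
    yv = [act(t) for t in matvec_t(V, y)]
    # The Hadamard product stands in for a full bilinear weight tensor.
    joint = [a * c for a, c in zip(xu, yv)]
    # Project the joint vector to the output size and add the bias.
    return [o + bias for o, bias in zip(matvec_t(P, joint), b)]


if __name__ == "__main__":
    U = [[1.0, 0.0], [0.0, 1.0]]  # N=2, d=2
    V = [[1.0, 2.0]]              # M=1, d=2
    P = [[1.0], [1.0]]            # d=2, c=1
    out = low_rank_bilinear_pool([1.0, 2.0], [3.0], U, V, P, [0.5],
                                 act=lambda t: t)
    print(out)  # [15.5] with the identity activation
```

The parameter count grows with d * (N + M + c) rather than with N * M * c, which is where the saving over a full bilinear layer comes from.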

Dependencies

You can install the dependencies:

luarocks install rnn

Training

Please follow the instructions in VQA_LSTM_CNN for preprocessing. The --split 2 option trains on the train+val set and evaluates on the test-dev or test-standard set. Set --num_ans to 2000 to reproduce the results.

For question features, you need to use this:

For image features:

$ th prepro_res.lua -input_json data_train-val_test-dev_2k/data_prepro.json -image_root path_to_image_root -cnn_model path_to_cnn_model

The pretrained ResNet-152 model and related scripts can be found in fb.resnet.torch.

$ th train.lua

With the default parameters, this will take around 2.6 days on a single NVIDIA Titan X GPU, and will generate the model under model/. To reproduce the results of the paper, use the -seconds option for the answer sampling described in Section 5. The seconds.json file can be obtained using prepro_seconds.lua or from here (updated as default).

Evaluation

$ th eval.lua

References

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@inproceedings{Kim2017,
author = {Kim, Jin-Hwa and On, Kyoung Woon and Lim, Woosang and Kim, Jeonghee and Ha, Jung-Woo and Zhang, Byoung-Tak},
booktitle = {The 5th International Conference on Learning Representations},
title = {{Hadamard Product for Low-rank Bilinear Pooling}},
year = {2017}
}

This code uses the Torch7 rnn package and its TrimZero module for question embeddings. Please also note the following papers:

@article{Leonard2015a,
author = {L{\'{e}}onard, Nicholas and Waghmare, Sagar and Wang, Yang and Kim, Jin-Hwa},
journal = {arXiv preprint arXiv:1511.07889},
title = {{rnn : Recurrent Library for Torch}},
year = {2015}
}
@inproceedings{Kim2016a,
author = {Kim, Jin-Hwa and Kim, Jeonghee and Ha, Jung-Woo and Zhang, Byoung-Tak},
booktitle = {Proceedings of KIIS Spring Conference},
isbn = {2093-4025},
number = {1},
pages = {165--166},
title = {{TrimZero: A Torch Recurrent Module for Efficient Natural Language Processing}},
volume = {26},
year = {2016}
}

License

BSD 3-Clause License

Patent (Pending)

METHOD AND SYSTEM FOR PROCESSING DATA USING ELEMENT-WISE MULTIPLICATION AND MULTIMODAL RESIDUAL LEARNING FOR VISUAL QUESTION-ANSWERING

mullowbivqa's Issues

image root

To replicate the results, have you taken both the training and validation images of MS-COCO as image root or only the training images?

Skip thoughts question embeddings

Hi, can you please give more information about the corpus that you used for pretraining? The DPP paper only mentions a "huge corpus" without being specific.
Moreover, do you have results without the pre-trained model, so that we have a comparable score?

Thanks in advance

What's the special function of mhdf5.lua

@jnhwkim Hi, I am new to Lua. I see that you define mhdf5.lua to read the image features. I am a little confused about its purpose: why not read the image features directly with h5_file:read()?

Have you tried VGG features?

Hi, nice job!
Compared with ResNet, how many percentage points lower would the result be if I used VGG features?
Have you tried VGG features?

`optim` package issue related to reproducibility

The initial parameter of the RMSProp algorithm in the torch optim package was changed during the period from Jun 8, 2016 to Sep 6, 2016. If you use a commit of the optim package from this period, you may see severe performance degradation due to ill-conditioned hyper-parameters.

Please check the version of the optim package (in ~/torch/install/share/lua/5.1/optim/rmsprop.lua) and update to the latest one.

You can find my struggle with this issue in this thread.

Generating gru.t7

Hello,

I can't figure out how to generate the gru.t7 torch file.
003_skipthoughts_porting generates vqa_uni_gru_word2vec.t7.
However, this seems to be only a torch tensor without the rnn package's GRU object.

Can you please let me know how you generate the gru.t7 file?
I am generating skip-thoughts for trainval.

My steps:

  1. Download the skip-thought tables
  2. Run make_lookuptable.lua to generate vocab_2k.txt
  3. Copy vocab_2k.txt to skipthoughts_porting/
  4. Edit 002_writevocab_table_vqa.py line 57 to use vocab_2k.txt
  5. Run 002_writevocab_table_vqa.py
  6. Run th 004_save_params_in_torch_file_vqa.lua
  7. Rerun make_lookuptable.lua to generate lookup_2k.t7
  8. Copy and rename vqa_uni_gru_word2vec.t7 to gru.t7
  9. Move lookup_2k.t7 and gru.t7 to skipthoughts_model
  10. Run th train.lua

Thanks in advance! :)

Questions about pretrained model

Hi, when I tried to evaluate the pretrained model with default parameters, I got the following error:
bad argument #1 to 'copy' (sizes do not match at /tmp/luarocks_cutorch-scm-1-245/cutorch/lib/THC/THCTensorCopy.cu:31).
And I found that w:size() = 50390702, while the pretrained model's size is 51894822. Could you please help me with this problem? Thanks!

optimizer with vg

May I ask whether the optimization hyperparameters (learning_rate, decay factors, etc.) are the same when training with and without Visual Genome?
Thanks!

Getting accuracy value

Hi, I am new to the VQA dataset. After generating the json file by running eval.lua, how do I get the test accuracy values for the multiple-choice and open-ended questions?
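For reference, the accuracy metric published with the VQA dataset counts a predicted answer as fully correct when at least three of the ten human annotators gave it. The sketch below is a simplified version of that formula for checking predictions on a split whose annotations you hold (such as val). The function names are hypothetical, and it skips the answer-string normalization that the official evaluation performs. Test-dev and test-standard annotations are not public, so those scores come from submitting the json file to the evaluation server.

```python
def vqa_accuracy(predicted, human_answers):
    """Simplified VQA accuracy for one question: min(#matching humans / 3, 1)."""
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(matches / 3.0, 1.0)


def mean_vqa_accuracy(predictions, annotations):
    """Average the per-question accuracy over all predicted question ids.

    predictions: dict mapping question_id -> predicted answer string
    annotations: dict mapping question_id -> list of human answer strings
    """
    scores = [vqa_accuracy(predictions[qid], annotations[qid])
              for qid in predictions]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    humans = ["yes"] * 2 + ["no"] * 8
    print(vqa_accuracy("yes", humans))  # two matches out of the three needed
    print(vqa_accuracy("no", humans))   # capped at 1.0
```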

A few questions

I'm sorry to bother you again. I have a few questions:

  1. I notice that you use a GRU to process questions in your previous work (Kim et al., 2016b) and in this paper, although this code also implements an LSTM (elseif opt.rnn_model == 'LSTM' then ...). Why did you choose the GRU in the end? How many percentage points lower would the result be if I used an LSTM?
  2. train.lua contains require 'netdef.MRN', but there is no MRN.lua file.
  3. Should I use L2-normalization in prepro_res.lua?

Thanks!

Unable to reproduce performance

Hi, I downloaded data_prepro.h5, data_prepro.json and seconds.json from the Google Drive link that you shared, and generated the data_res.h5 file by running prepro_res.lua. However, after re-training the model by running train.lua (with the default parameters) and submitting the json file to the challenge server, I get an accuracy of only 50%. The json file generated from the pretrained model achieves the expected accuracy of 65%.

Did you train the model with a different set of hyperparameters, or am I making a mistake in training?

decay_factor is wrong

decay_factor should be 0.99999040594147 (not 0.99997592083) if opt.iterPerEpoch = 240000 / opt.batch_size and opt.batch_size = 100.
In the paper, batch_size is 200.
In fact, opt.iterPerEpoch should be 334554 / opt.batch_size, so kick_interval must be changed too.
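The arithmetic behind the first number can be checked with a short script. This is a sketch under stated assumptions, not a copy of train.lua: it assumes the factor is exp(ln(0.1) / (decay_every * iterPerEpoch)), so that the learning rate falls to one tenth after decay_every epochs, and it assumes decay_every = 100. Both were chosen because they reproduce 0.99999040594147 for 240000 / 100 iterations per epoch. The function and parameter names are hypothetical.

```python
import math


def decay_factor(iter_per_epoch, decay_every_epochs=100, target_ratio=0.1):
    """Per-iteration multiplier that shrinks the learning rate by
    target_ratio after decay_every_epochs epochs (assumed formula)."""
    total_iters = decay_every_epochs * iter_per_epoch
    return math.exp(math.log(target_ratio) / total_iters)


if __name__ == "__main__":
    # Setting quoted in the issue: 240000 samples, batch size 100.
    print(decay_factor(240000 / 100))
    # Variants suggested in the issue: 334554 samples, batch size 100 or 200.
    print(decay_factor(334554 / 100))
    print(decay_factor(334554 / 200))
```

Because the factor depends on the number of iterations per epoch, changing either the sample count or the batch size changes it, which is why the issue asks for kick_interval to be revisited as well.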
