batra-mlp-lab / visdial

[CVPR 2017] Torch code for Visual Dialog

Home Page: https://arxiv.org/abs/1611.08669

License: Other

Python 9.81% Lua 87.32% Shell 0.98% HTML 0.54% JavaScript 1.34%
computer-vision deep-learning natural-language-processing torch

visdial's People

Contributors

abhshkdz, ayshrv, satwikkottur

visdial's Issues

Inconsistent tensor size, training with mn-att-ques-im-hist

Hello,
I am trying to run train.lua with the mn-att-ques-im-hist encoder. I downloaded the data_img_pool5.h file from the Google Drive link you provided in issue #12: https://drive.google.com/open?id=0B-iGspODhEtrUXg5dXV5TlRJUmM

I execute the model on CPU with:
th train.lua -encoder mn-att-ques-im-hist -decoder gen -gpuid -1 -rnnHiddenSize 380 -numEpochs 40 -numLayers 1

and I get the following error:
/home/ubuntu/torch2/install/bin/luajit: /home/ubuntu/torch2/install/share/lua/5.1/nn/CAddTable.lua:16: inconsistent tensor size, expected r_ [400 x 512], t [400 x 512] and src [400 x 380] to have the same number of elements, but got 204800, 204800 and 152000 elements respectively at /home/ubuntu/torch2/pkg/torch/lib/TH/generic/THTensorMath.c:887
stack traceback:
[C]: in function 'add'
/home/ubuntu/torch2/install/share/lua/5.1/nn/CAddTable.lua:16: in function 'func'
.../ubuntu/torch2/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
.../ubuntu/torch2/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
./model.lua:229: in function 'forwardBackward'
./model.lua:74: in function 'trainIteration'
train.lua:72: in main chunk
[C]: in function 'dofile'
...ntu/torch2/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50

Could you please tell me how to solve this?
Regards,
Enid
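
The element counts in the error above can be checked by hand. The following is a minimal Python sketch (not part of the repository, which is written in Lua) that mimics the precondition of an elementwise add: every operand must have the same shape. The helper names `num_elements` and `check_same_shape` are hypothetical. The 380 matches the -rnnHiddenSize 380 passed on the command line; that the 512 comes from another size option left at its default is an assumption, not something the log confirms.

```python
def num_elements(shape):
    """Number of elements in a tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def check_same_shape(shapes):
    """Mimic the precondition of an elementwise add: all shapes must match.

    Returns the shared shape, or raises ValueError listing the element counts.
    """
    first = tuple(shapes[0])
    for shape in shapes[1:]:
        if tuple(shape) != first:
            counts = [num_elements(s) for s in shapes]
            raise ValueError(
                "inconsistent tensor size: shapes %s have %s elements"
                % (list(shapes), counts)
            )
    return first


# Shapes taken from the error message above.
print(num_elements((400, 512)))  # 204800
print(num_elements((400, 380)))  # 152000
```

If that reading is right, the two branches feeding the add only line up when the hidden size equals the other branch's width, which would explain why a run with matching sizes does not hit this error.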

bad argument #2 to 'add' (sizes do not match at /torch-addons/cutorch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:269)

local img_tr = nn.Dropout(0.5)(

This is the error I get when running train.lua with the mn-att-ques-im-hist encoder and the gen decoder. It occurs in the encoder forward pass in the forwardBackward function.
I can get it running by changing params.imgFeatureSize (whose value is 4096) to 512 in the line quoted above.
@abhshkdz

Invalid permutation error reading image feature you gave

Hi, I just want to run the pretrained model, but an error (Invalid permutation) occurs while reading the image features.

I downloaded the preprocessed data listed below.

preprocessed data (data/)
visdial_data_trainval.h5, visdial_params_trainval.json, data_img_vgg16_relu7_trainval.h5

pretrained model (checkpoints/)
lf-att-ques-im-hist-disc-vgg16-24.t7

I ran the command below (test split):
th evaluate.lua -loadPath checkpoints/lf-att-ques-im-hist-disc-vgg16-24.t7 -gpuid 0 -split test

stack trace
{
useGt : false
inputQues : "data/visdial_data_trainval.h5"
batchSize : 30
split : "test"
loadPath : "checkpoints/lf-att-ques-im-hist-disc-vgg16-24.t7"
inputJson : "data/visdial_params_trainval.json"
saveRanks : true
saveRankPath : "models/test.json"
backend : "cudnn"
gpuid : 1
inputImg : "data/data_img_vgg16_relu7_trainval.h5"
}
DataLoader loading json file: data/visdial_params.json
Vocabulary size (with ,): 11403

DataLoader loading h5 file: data/visdial_data.h5
DataLoader loading h5 file: data/data_img.h5
Reading image features..
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/torch/Tensor.lua:543: Invalid permutation
stack traceback:

[C]: in function 'assert'
/root/torch/install/share/lua/5.1/torch/Tensor.lua:543: in function 'permute'
dataloader.lua:71: in function 'initialize'
evaluate.lua:81: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
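
Torch's permute asserts that its arguments form a valid reordering of the tensor's dimensions, so the call fails if the number of indices does not match the number of dimensions of the loaded features. The Python sketch below illustrates that check only; `is_valid_permutation` is a hypothetical helper, and the example permutations are illustrative, not the ones used in dataloader.lua. Whether the relu7 file really holds 2-D features while the loader expects 4-D ones is an assumption based on the file and checkpoint names.

```python
def is_valid_permutation(perm, ndim):
    """Return True if `perm` (1-based, as in Torch) reorders exactly `ndim` dims."""
    return sorted(perm) == list(range(1, ndim + 1))


# Hypothetical 4-D permutation applied to a 4-D tensor (e.g. N x C x H x W): valid.
print(is_valid_permutation((1, 3, 4, 2), 4))  # True

# The same permutation applied to a 2-D tensor (e.g. N x 4096): invalid.
print(is_valid_permutation((1, 3, 4, 2), 2))  # False
```

Checking the number of dimensions of the feature dataset in the .h5 file against what the chosen encoder expects may help narrow this down.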

About sampling

Hi,

In your code in model.lua, shouldn't the decoder first be coupled with the encoder, so that sampling is also conditioned on the image features and the previous conversation?

It seems to me that the sampling is based only on the word from the previous time step, without knowledge of the image features and so on.

Best,
Rui

Models for Android Use?

Hi all,

I was hoping to find out more about using your model in my current standalone Android app. I'm interested in keeping the same domain initially, as I'm only attempting to incorporate this model within Android. Would I just need the data.h5, params.json, and img.h5 files, or would I be able to skip that step since my domain is the same?

Thanks.

THCudaCheck FAIL error=2 : out of memory... MultiGPU training

Hi,

I tried to train the model using the command "th train.lua -encoder hre-ques-hist -decoder gen -gpuid 1", but it fails with THCudaCheck FAIL ...error=2 : out of memory. I was able to train after I reduced the batch size to less than 40.

I would like to know how I can use two GPUs for this training. Kindly advise.

Issue with prepro.py at line 150.

Why is the loop over 'j' at line 150 of the prepro.py script necessary when, at test time, we only need to look at the options in the last round?

Error with command "python prepro.py -download 1 -image_root /path/to/coco/images"

Hi, I have an issue with the prepro.py file.
I ran "python prepro.py -download 1 -image_root /path/to/coco/images".
It returns:

 python prepro.py -download 1 -image_root /path/to/coco/images
/home/ai8503/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
usage: prepro.py [-h] [-download] [-train_split {train,trainval}]
                 [-input_json_train INPUT_JSON_TRAIN]
                 [-input_json_val INPUT_JSON_VAL]
                 [-input_json_test INPUT_JSON_TEST] [-image_root IMAGE_ROOT]
                 [-input_vocab INPUT_VOCAB] [-output_json OUTPUT_JSON]
                 [-output_h5 OUTPUT_H5] [-max_ques_len MAX_QUES_LEN]
                 [-max_ans_len MAX_ANS_LEN] [-max_cap_len MAX_CAP_LEN]
                 [-word_count_threshold WORD_COUNT_THRESHOLD]
prepro.py: error: unrecognized arguments: 1
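
The usage text above lists `[-download]` with no value placeholder, which is how argparse prints a boolean switch. The sketch below reproduces the behaviour with a stand-in parser; it assumes `-download` is declared with `action="store_true"`, which the usage line suggests but the log does not prove, and it defines only two of the script's options.

```python
import argparse

# Stand-in parser, not the real prepro.py argument list.
parser = argparse.ArgumentParser(prog="prepro.py")
# Assumption: -download is a boolean switch, so it consumes no value.
parser.add_argument("-download", action="store_true")
parser.add_argument("-image_root", default="")

# parse_known_args returns leftovers instead of exiting, which makes the
# stray "1" visible; parse_args would exit with "unrecognized arguments: 1".
args, extra = parser.parse_known_args(
    ["-download", "1", "-image_root", "/path/to/coco/images"]
)
print(args.download, extra)  # True ['1']
```

Under that assumption, passing the switch on its own (`-download` without the `1`) is the form the parser accepts.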

So I changed the command to "python prepro.py -download -image_root /path/to/coco/images", and that part worked. But then I hit an error at line 286.

Saving hdf5...
[train2014] Preparing image paths with image_ids...
  0%|                                                                 | 0/82783 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "prepro.py", line 286, in <module>
    out['unique_img_train'] = get_image_ids(data_train, args, 'train')
  File "prepro.py", line 188, in get_image_ids
    image_ids[i] = id2path[image_id]
KeyError: 378466

I think the JSON file may be the problem.
How can I solve this?
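
The KeyError says that image ID 378466 has no entry in `id2path`. One way to investigate is to list every ID that has no matching file before any lookup crashes. The Python sketch below is a diagnostic under stated assumptions: how prepro.py actually builds `id2path` is not shown in the traceback, so the mapping here is built from file names following the COCO 2014 pattern (`COCO_train2014_` plus a 12-digit zero-padded ID), and both helper functions are hypothetical.

```python
import re


def coco_id_from_filename(filename):
    """Extract the integer image ID from a COCO 2014 style file name, or None."""
    match = re.search(r"_(\d{12})\.jpg$", filename)
    return int(match.group(1)) if match else None


def split_known_and_missing(image_ids, id2path):
    """Separate IDs that have a known path from IDs that do not."""
    known = [i for i in image_ids if i in id2path]
    missing = [i for i in image_ids if i not in id2path]
    return known, missing


# Hypothetical directory listing; in practice this would come from os.listdir.
filenames = ["COCO_train2014_000000000009.jpg", "COCO_train2014_000000000025.jpg"]
id2path = {coco_id_from_filename(name): name for name in filenames}

known, missing = split_known_and_missing([9, 25, 378466], id2path)
print(missing)  # [378466]
```

A non-empty `missing` list would point at the image directory (incomplete download, or a different folder layout than the script expects) rather than at the JSON file.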

Error in forwardbackward() function while training the data

After setting up the dependencies, running th train.lua -encoder lf-ques -decoder gen -gpuid 0 raises the error shown in the screenshot below. I tried different PC configurations and the error persists, so it is not a system-specific problem.
[image: screenshot of the error]

feature extraction error

{
imgSize : 224
layerName : "relu7"
cnnModel : "models/vgg16/VGG_ILSVRC_16_layers.caffemodel"
batchSize : 50
outName : "data_img.h5"
inputJson : "visdial_params.json"
gpuid : 3
cnnProto : "models/vgg16/VGG_ILSVRC_16_layers_deploy.prototxt"
backend : "nn"
imageRoot : "/home/tommy/caffe-recurrent/data/coco/tools/images"
}
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Successfully loaded models/vgg16/VGG_ILSVRC_16_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
Processing 82783 images...
/home/tommy/torch/install/bin/lua: ...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/tommy/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #4 to 'v' (weight tensor must be 2D (nOutputPlane,nInputPlane*kH*kW) at /tmp/luarocks_cunn-1.0-0-5194/cunn/lib/THCUNN/SpatialConvolutionMM.cu:13)
stack traceback:
[C]: in function 'v'
/home/tommy/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:79: in function <...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:76>
(tail call): ?
[C]: in function 'xpcall'
...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
prepro_img.lua:94: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
prepro_img.lua:94: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?

prepro.py: TypeError: slice indices must be integers or None or have an __index__ method

Hi VisDial team,

Thank you for sharing the great work!
After I ran the command "python prepro.py -download 1", I got the following output:

Reading json...
train2014
Tokenizing captions...
Tokenizing questions...
Tokenizing answers...
val2014
Tokenizing captions...
Tokenizing questions...
Tokenizing answers...
Building vocabulary...
Words: 8845
Encoding based on vocabulary...
Creating data matrices...
Traceback (most recent call last):
  File "prepro.py", line 161, in <module>
    captions_train, captions_train_len, questions_train, questions_train_len, answers_train, answers_train_len, options_train, options_train_list, options_train_len, answers_train_index, images_train_index, images_train_list = create_data_mats(data_train_toks, ques_train_inds, ans_train_inds, args)
  File "prepro.py", line 94, in create_data_mats
    captions[i][0:caption_len[i]] = data_toks[image_id]['caption_inds'][0:max_cap_len]
TypeError: slice indices must be integers or None or have an __index__ method

Do you know how to fix this?

Thank you!

Best,
Rui
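
Python raises this TypeError when a slice bound is a float, even one with a whole-number value such as 3.0. The sketch below reproduces the message with a plain list and shows one possible workaround. It assumes that `caption_len[i]` (or `max_cap_len`) ends up as a float in line 94, for example because the length array was created with a floating-point dtype; the traceback does not confirm which operand is at fault. `to_slice_bound` is a hypothetical helper.

```python
def to_slice_bound(length):
    """Convert a whole-number length (possibly stored as a float) to an int."""
    as_int = int(length)
    if as_int != length:
        raise ValueError("length %r is not a whole number" % (length,))
    return as_int


tokens = [4, 8, 15, 16, 23]
length = 3.0  # hypothetical: a length that was stored as a float

try:
    tokens[0:length]  # float slice bound
except TypeError as err:
    print(err)

print(tokens[0:to_slice_bound(length)])  # [4, 8, 15]
```

If the lengths live in a NumPy array, creating that array with an integer dtype would be an alternative to casting at every use.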

Why is the dimension of input image features in attention case 14x14x4096?

imgFeats = imgFeats:view(-1, self.params.imgSpatialSize, self.params.imgSpatialSize, self.params.imgFeatureSize)

Shouldn't the dimensions be 14x14x512 when the image features are used for the attention-based encoders?
I am facing this issue (in the encoder forward pass of the forwardBackward function) while running the mn-att-ques-im-hist encoder and the gen decoder.
The same problem is also mentioned in #24.
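
The arithmetic behind this question can be sketched in Python. A view with -1 as its leading dimension divides the total number of elements by the product of the remaining dimensions, so the wrong feature size either fails or, worse, succeeds with the wrong batch size. The sketch assumes the stored features are 14x14x512 per image, as the question suggests; `infer_batch_dim` is a hypothetical helper and the batch sizes are illustrative.

```python
def infer_batch_dim(total_elements, spatial_size, feature_size):
    """Leading dimension that view(-1, S, S, F) would infer, or raise if none fits."""
    per_image = spatial_size * spatial_size * feature_size
    if total_elements % per_image != 0:
        raise ValueError(
            "%d elements cannot be viewed as (-1, %d, %d, %d)"
            % (total_elements, spatial_size, spatial_size, feature_size)
        )
    return total_elements // per_image


# Hypothetical batch of 40 images with 14 x 14 x 512 features each.
total = 40 * 14 * 14 * 512

print(infer_batch_dim(total, 14, 512))   # 40
print(infer_batch_dim(total, 14, 4096))  # 5
```

With a feature size of 4096 the batch of 40 is silently reinterpreted as 5, because 4096 is 8 times 512; a batch size that is not a multiple of 8 would fail the view outright instead.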

Dimension doesn't match during evaluation

In function Model:predict in model.lua (line 215):
ranks[{{startId, nextStartId - 1}, {}}] = self:retrieveBatch(batch):view(nextStartId - startId, -1, self.params.numOptions);
The output of self:retrieveBatch(batch) has size 300x1, but it is being reshaped to 10x300.
This raises the error "The number of covered elements is not a multiple of all elements."

Beam search with length normalized log likelihood?

Hi guys,

About the beam search here: if I understand it correctly, this is standard beam search without length-normalized log likelihood, so it should tend to prefer shorter sequences, right? Did you also try beam search with length-normalized log likelihood?

Thank you!

Best,
Rui
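
The effect being asked about can be illustrated with a small Python sketch. It is not the repository's beam search; it only compares two ways of scoring finished hypotheses, using made-up per-token log probabilities. The function names and the two example hypotheses are hypothetical.

```python
def sequence_score(token_logprobs, length_normalize=False):
    """Sum of token log probabilities, optionally divided by sequence length."""
    total = sum(token_logprobs)
    if length_normalize:
        return total / len(token_logprobs)
    return total


def best_hypothesis(hypotheses, length_normalize=False):
    """Return the name of the highest-scoring (name, token_logprobs) pair."""
    best = max(hypotheses, key=lambda h: sequence_score(h[1], length_normalize))
    return best[0]


# Made-up hypotheses: the long one has better per-token log probabilities.
hypotheses = [
    ("short", [-0.5, -0.5]),
    ("long", [-0.3, -0.3, -0.3, -0.3]),
]

print(best_hypothesis(hypotheses))                         # short
print(best_hypothesis(hypotheses, length_normalize=True))  # long
```

Because every extra token adds a negative term to the sum, the unnormalized score favours the shorter hypothesis here even though each of its tokens is less likely; dividing by length removes that bias in this example.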

Evaluation of pretrained late fusion model fails

th evaluate.lua -loadPath checkpoints/lf-qih-d.t7 -gpuid 0

Using the pre-trained models and the preprocessed data available for download results in the following error:

Setting up model..==== 104200/104242 =========>.]  ETA: 0ms | Step: 0ms         
Encoder:        lf-ques-im-hist
Decoder:        gen
Evaluating..
numThreads      40504
/home/user/torch/install/bin/lua: ...me/user/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 3 module of nn.Sequential:
...ome/user/torch/install/share/lua/5.1/rnn/SeqLSTM.lua:99: nn.SeqLSTM expecting previous call to setZeroMask(zeroMask) with maskzero=true

Issue on extracting image feature

Hi, I'm having a problem extracting image features.
I ran this command: "th prepro_img_vgg16.lua -imageRoot ~/Desktop/2014/ -gpuid 0"
It returns:

/home/ai8503/torch/install/bin/lua: ...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: ...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: ...home/ai8503/torch/install/share/lua/5.1/hdf5/ffi.lua:56: expected align(#) on line 579
stack traceback:
	[C]: in function 'error'
	...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
	prepro_img_vgg16.lua:3: in main chunk
	[C]: in function 'dofile'
	.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: ?

How can I fix this?

PyTorch starter code

Hi,

Do you plan to release starter code in PyTorch for the challenge? visdial-rl does provide some insights, but it is tailored more to Visual Dialog agents, as described in the paper "Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning".
