batra-mlp-lab / visdial
[CVPR 2017] Torch code for Visual Dialog
Home Page: https://arxiv.org/abs/1611.08669
License: Other
The visual feature extraction code processes 80,000 images, but the actual number of MSCOCO images is 82,783. Is 80,000 the number of unique images?
Hello,
I am trying to execute train.lua with the mn-att-ques-im-hist encoder. I downloaded the data_img_pool5.h5 file from the Google Drive link you provided in issue #12: https://drive.google.com/open?id=0B-iGspODhEtrUXg5dXV5TlRJUmM
I execute the model on CPU with:
th train.lua -encoder mn-att-ques-im-hist -decoder gen -gpuid -1 -rnnHiddenSize 380 -numEpochs 40 -numLayers 1
and I get the below error:
/home/ubuntu/torch2/install/bin/luajit: /home/ubuntu/torch2/install/share/lua/5.1/nn/CAddTable.lua:16: inconsistent tensor size, expected r_ [400 x 512], t [400 x 512] and src [400 x 380] to have the same number of elements, but got 204800, 204800 and 152000 elements respectively at /home/ubuntu/torch2/pkg/torch/lib/TH/generic/THTensorMath.c:887
stack traceback:
[C]: in function 'add'
/home/ubuntu/torch2/install/share/lua/5.1/nn/CAddTable.lua:16: in function 'func'
.../ubuntu/torch2/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
.../ubuntu/torch2/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
./model.lua:229: in function 'forwardBackward'
./model.lua:74: in function 'trainIteration'
train.lua:72: in main chunk
[C]: in function 'dofile'
...ntu/torch2/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
Please can you tell how to solve this?
Regards,
Enid
visdial/encoders/mn-att-ques-im-hist.lua
Line 74 in 3ffb761
This is the error I get when running train.lua with the mn-att-ques-im-hist encoder and gen decoder; it corresponds to the encoder forward pass in the forwardBackward function.
I was able to get it running by changing params.imgFeatureSize (whose value is 4096) in the above-mentioned line to 512.
@abhshkdz
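For reference, the shape mismatch in the error above is the generic failure mode of element-wise addition: nn.CAddTable requires all of its inputs to have identical shapes. A minimal NumPy sketch of the same situation, with sizes taken from the error message (the projection matrix is a hypothetical illustration, not the repo's code):

```python
import numpy as np

# Two batches mirroring the error: [400 x 512] vs [400 x 380].
# Element-wise addition (what nn.CAddTable does) needs identical shapes.
img_embed = np.zeros((400, 512))   # image-feature branch
ques_embed = np.zeros((400, 380))  # rnnHiddenSize=380 branch

try:
    _ = img_embed + ques_embed     # shapes disagree -> ValueError
except ValueError as e:
    print("shape mismatch:", e)

# Projecting one branch to the other's width fixes it, which is why making
# both sides 512-dimensional (via rnnHiddenSize or imgFeatureSize) works.
proj = np.zeros((380, 512))        # hypothetical linear projection
ok = img_embed + ques_embed @ proj # both branches now [400 x 512]
print(ok.shape)
```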
Hi, I just want to run the pretrained model, but an error occurs while reading image features (Invalid permutation).
I downloaded preprocessed data below.
preprocessed data (data/): visdial_data_trainval.h5, visdial_params_trainval.json, data_img_vgg16_relu7_trainval.h5
pretrained model (checkpoints/): lf-att-ques-im-hist-disc-vgg16-24.t7
I ran the command below (test split):
th evaluate.lua -loadPath checkpoints/lf-att-ques-im-hist-disc-vgg16-24.t7 -gpuid 0 -split test
stack trace
{
useGt : false
inputQues : "data/visdial_data_trainval.h5"
batchSize : 30
split : "test"
loadPath : "checkpoints/lf-att-ques-im-hist-disc-vgg16-24.t7"
inputJson : "data/visdial_params_trainval.json"
saveRanks : true
saveRankPath : "models/test.json"
backend : "cudnn"
gpuid : 1
inputImg : "data/data_img_vgg16_relu7_trainval.h5"
}
DataLoader loading json file: data/visdial_params.json
Vocabulary size (with <START>, <END>): 11403
DataLoader loading h5 file: data/visdial_data.h5
DataLoader loading h5 file: data/data_img.h5
Reading image features..
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/torch/Tensor.lua:543: Invalid permutation
stack traceback:
[C]: in function 'assert'
/root/torch/install/share/lua/5.1/torch/Tensor.lua:543: in function 'permute'
dataloader.lua:71: in function 'initialize'
evaluate.lua:81: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
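For context, the assert at Tensor.lua:543 fires when the argument to permute is not a valid permutation of the tensor's dimension indices; one common way to hit it is feeding features with an unexpected number of dimensions (e.g. 4-D pool5 features where 2-D relu7 features are expected, or vice versa). A NumPy analogue, with hypothetical feature shapes:

```python
import numpy as np

# Hypothetical layouts: relu7 features are 2-D (num_images, 4096), while
# pool5 features are 4-D (num_images, 512, 7, 7). Code written for one
# layout fails when handed the other.
feats_4d = np.zeros((10, 512, 7, 7))

# A valid permutation must list every axis exactly once.
ok = np.transpose(feats_4d, (0, 2, 3, 1))
print(ok.shape)  # (10, 7, 7, 512)

try:
    np.transpose(feats_4d, (0, 2, 1))  # wrong number of axes -> error
except ValueError as e:
    print("invalid permutation:", e)
```

So it is worth checking that the image-feature h5 file matches the layer the checkpoint was trained on.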
Hi,
In your code here, model.lua, shouldn't the decoder first be coupled with the encoder, so that sampling is also conditioned on the image features and the previous conversations?
It seems to me that sampling is based only on the word from the previous time step, without knowledge of the image features and history.
Best,
Rui
Line 163 in e7ffec3
Hi all,
I was hoping to find out more about using your model in my standalone Android app. I'm initially keeping the same domain, since I'm only attempting to incorporate this model within Android. Would I just need the data.h5, params.json, and img.h5 files, or could I skip that step since my domain is the same?
Thanks.
Hi,
I tried to train the model with the command "th train.lua -encoder hre-ques-hist -decoder gen -gpuid 1", but it fails with THCudaCheck FAIL ... error=2 : out of memory. I was able to train after reducing the batch size below 40.
How can I utilize two GPUs for this training? Kindly advise.
Why is the loop over 'j' at line 150 of the prepro.py script necessary, when at test time we only need to look at the options in the last round?
@abhshkdz Hi, my code could not find the val questions in visdial_data_trainval.h5,
and I found the val set in the train h5.
Could you please check whether the filename is wrong in VisDial v1.0?
Hi, I have an issue with the prepro.py file.
I ran "python prepro.py -download 1 -image_root /path/to/coco/images"
and it returns:
python prepro.py -download 1 -image_root /path/to/coco/images
/home/ai8503/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
usage: prepro.py [-h] [-download] [-train_split {train,trainval}]
[-input_json_train INPUT_JSON_TRAIN]
[-input_json_val INPUT_JSON_VAL]
[-input_json_test INPUT_JSON_TEST] [-image_root IMAGE_ROOT]
[-input_vocab INPUT_VOCAB] [-output_json OUTPUT_JSON]
[-output_h5 OUTPUT_H5] [-max_ques_len MAX_QUES_LEN]
[-max_ans_len MAX_ANS_LEN] [-max_cap_len MAX_CAP_LEN]
[-word_count_threshold WORD_COUNT_THRESHOLD]
prepro.py: error: unrecognized arguments: 1
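This usage error is what argparse prints when -download is declared as a boolean flag (presumably with action='store_true', given that the bare flag works below): the flag consumes no value, so the trailing 1 is rejected as an unrecognized argument. A minimal sketch of that behavior:

```python
import argparse

parser = argparse.ArgumentParser()
# A store_true flag takes no value; "-download 1" leaves "1" unconsumed.
parser.add_argument('-download', action='store_true')

args = parser.parse_args(['-download'])
print(args.download)  # True

# parse_args(['-download', '1']) would exit with
# "error: unrecognized arguments: 1", matching the output above.
```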
So I changed the command to "python prepro.py -download -image_root /path/to/coco/images", and it worked well. But then I hit an issue at line 286.
Saving hdf5...
[train2014] Preparing image paths with image_ids...
0%| | 0/82783 [00:00<?, ?it/s]
Traceback (most recent call last):
File "prepro.py", line 286, in <module>
out['unique_img_train'] = get_image_ids(data_train, args, 'train')
File "prepro.py", line 188, in get_image_ids
image_ids[i] = id2path[image_id]
KeyError: 378466
I think the json file is the problem.
How can I solve this?
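The KeyError means image_id 378466 from the dialog json has no entry in the id2path map built from the images on disk, which usually indicates the json and the local COCO image set are out of sync (e.g. an incomplete download). A defensive check, with hypothetical stand-ins for prepro.py's variables, surfaces every missing image at once instead of crashing on the first lookup:

```python
# Hypothetical stand-ins for prepro.py's id2path map and image-id list.
id2path = {123: 'train2014/COCO_train2014_000000000123.jpg'}
image_ids = [123, 378466]

# Collect every id whose image file was not found on disk.
missing = [i for i in image_ids if i not in id2path]
if missing:
    print('missing %d image files, e.g. %s' % (len(missing), missing[:5]))
```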
Hi Visdial Team,
I have a question regarding combining the history and question features in the memory network encoder. Why do you use element-wise addition instead of simply concatenating, as you did in the late fusion encoder? Are there any advantages to this?
Thank you in advance!
Best,
Rui
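For context on the question: element-wise addition fuses two d-dimensional vectors into a single d-dimensional vector with no extra parameters, while concatenation yields a 2d-dimensional vector that typically needs a learned projection back down. A NumPy sketch of the two fusion shapes (an illustration, not the repo's actual code):

```python
import numpy as np

d = 512
hist = np.random.randn(d)   # history feature
ques = np.random.randn(d)   # question feature

added = hist + ques                    # stays d-dimensional, no parameters
concat = np.concatenate([hist, ques])  # becomes 2d-dimensional

print(added.shape, concat.shape)       # (512,) (1024,)

# Concatenation keeps both inputs separate but doubles the width, so a
# projection W of shape (2d, d) is usually learned to bring it back down.
W = np.zeros((2 * d, d))
fused = concat @ W
print(fused.shape)                     # (512,)
```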
Hi. The new data_img.h5 file generated from prepro_img_pool5.lua seems to be missing. Could you please upload it? I have limited computational resources and cannot generate it myself.
Thank you in advance.
Lori
{
imgSize : 224
layerName : "relu7"
cnnModel : "models/vgg16/VGG_ILSVRC_16_layers.caffemodel"
batchSize : 50
outName : "data_img.h5"
inputJson : "visdial_params.json"
gpuid : 3
cnnProto : "models/vgg16/VGG_ILSVRC_16_layers_deploy.prototxt"
backend : "nn"
imageRoot : "/home/tommy/caffe-recurrent/data/coco/tools/images"
}
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Successfully loaded models/vgg16/VGG_ILSVRC_16_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
Processing 82783 images...
/home/tommy/torch/install/bin/lua: ...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/tommy/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #4 to 'v' (weight tensor must be 2D (nOutputPlane,nInputPlane*kH*kW) at /tmp/luarocks_cunn-1.0-0-5194/cunn/lib/THCUNN/SpatialConvolutionMM.cu:13)
stack traceback:
[C]: in function 'v'
/home/tommy/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:79: in function <...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:76>
(tail call): ?
[C]: in function 'xpcall'
...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
prepro_img.lua:94: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
prepro_img.lua:94: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?
Hi VisDial team,
Thank you for sharing the great work!
After I ran the command "python prepro.py -download 1", I have the following output:
Reading json...
train2014
Tokenizing captions...
Tokenizing questions...
Tokenizing answers...
val2014
Tokenizing captions...
Tokenizing questions...
Tokenizing answers...
Building vocabulary...
Words: 8845
Encoding based on vocabulary...
Creating data matrices...
Traceback (most recent call last):
File "prepro.py", line 161, in <module>
captions_train, captions_train_len, questions_train, questions_train_len, answers_train, answers_train_len, options_train, options_train_list, options_train_len, answers_train_index, images_train_index, images_train_list = create_data_mats(data_train_toks, ques_train_inds, ans_train_inds, args)
File "prepro.py", line 94, in create_data_mats
captions[i][0:caption_len[i]] = data_toks[image_id]['caption_inds'][0:max_cap_len]
TypeError: slice indices must be integers or None or have an index method
Do you know how to fix this?
Thank you!
Best,
Rui
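A note on the TypeError above: this is the classic Python 3 symptom of a float reaching a slice index (in Python 3, / is true division and returns a float even for two integers). Casting the length to int before slicing sidesteps it, sketched here with hypothetical stand-ins for the captions matrix and token list:

```python
caption = [0] * 40           # stand-in for a row of the captions matrix
toks = [5, 9, 2, 7]          # stand-in for caption_inds

cap_len = float(len(toks))   # a float length, as true division produces
try:
    caption[0:cap_len] = toks        # float slice index -> TypeError
except TypeError as e:
    print(e)

caption[0:int(cap_len)] = toks       # casting to int fixes the slice
print(caption[:4])                   # [5, 9, 2, 7]
```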
Hi Visdial Team,
Thank for the great code!
I am a little lost. Could you please point me to where the code concatenates the t rounds of history questions and answers for the late fusion encoder?
Thank you!
Best,
Rui
Line 265 in 3ffb761
### function Model:predict in model.lua, line 215
ranks[{{startId, nextStartId - 1}, {}}] = self:retrieveBatch(batch):view(nextStartId - startId, -1, self.params.numOptions);
The size of self:retrieveBatch(batch) is 300x1, while it is going to be reshaped to 10x300.
So there is an error that the number of covered elements is not a multiple of all elements.
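The view failure described above is the general reshape invariant: the requested shape must cover exactly the same number of elements as the tensor holds, and 300 elements cannot fill a 10x300 view. A NumPy analogue (the 3x100 target is a hypothetical consistent shape, not the repo's actual numOptions):

```python
import numpy as np

ranks = np.zeros((300, 1))   # what retrieveBatch returns in this report

try:
    ranks.reshape(10, 300)   # 10*300 = 3000 != 300 elements -> error
except ValueError as e:
    print(e)

# A reshape only succeeds when the element counts agree, e.g.:
ok = ranks.reshape(3, 100)   # hypothetical: 3 images x 100 options
print(ok.shape)              # (3, 100)
```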
https://github.com/batra-mlp-lab/visdial/blob/master/dataloader.lua#L104
Should self.maxAnsLen = self[dtype..'_ans']:size(2) be self.maxAnsLen = self[dtype..'_ans']:size(3)?
Hi guys,
About the beam search here: if I understand it correctly, this is normal beam search without length-normalized log-likelihood, so it should tend to find shorter sequences, right? Did you also try beam search with length-normalized log-likelihood?
Thank you!
Best,
Rui
th evaluate.lua -loadPath checkpoints/lf-qih-d.t7 -gpuid 0
Using the pre-trained models and the preprocessed data available for download results in the following error:
Setting up model..==== 104200/104242 =========>.] ETA: 0ms | Step: 0ms
Encoder: lf-ques-im-hist
Decoder: gen
Evaluating..
numThreads 40504
/home/user/torch/install/bin/lua: ...me/user/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 3 module of nn.Sequential:
...ome/user/torch/install/share/lua/5.1/rnn/SeqLSTM.lua:99: nn.SeqLSTM expecting previous call to setZeroMask(zeroMask) with maskzero=true
Hi, I'm getting a problem extracting image features.
I ran this command, "th prepro_img_vgg16.lua -imageRoot ~/Desktop/2014/ -gpuid 0",
and it returns:
/home/ai8503/torch/install/bin/lua: ...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: ...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: ...home/ai8503/torch/install/share/lua/5.1/hdf5/ffi.lua:56: expected align(#) on line 579
stack traceback:
[C]: in function 'error'
...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
prepro_img_vgg16.lua:3: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?
How can I fix this?
Hi,
Do you guys plan to release starter code in PyTorch for the challenge? visdial-rl does provide some insights, but it is tailored more for Visual Dialog agents as described in the paper "Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning".