batra-mlp-lab / visdial
[CVPR 2017] Torch code for Visual Dialog
Home Page: https://arxiv.org/abs/1611.08669
License: Other
The visual feature extraction code processes 80,000 images, but the actual number of MSCOCO images is 82,783. Is 80,000 the number of unique images?
Hello,
I am trying to execute train.lua with the mn-att-ques-im-hist encoder. I downloaded the data_img_pool5.h5 file from the Google Drive link you provided in issue #12: https://drive.google.com/open?id=0B-iGspODhEtrUXg5dXV5TlRJUmM
I execute the model on CPU with:
th train.lua -encoder mn-att-ques-im-hist -decoder gen -gpuid -1 -rnnHiddenSize 380 -numEpochs 40 -numLayers 1
and I get the below error:
/home/ubuntu/torch2/install/bin/luajit: /home/ubuntu/torch2/install/share/lua/5.1/nn/CAddTable.lua:16: inconsistent tensor size, expected r_ [400 x 512], t [400 x 512] and src [400 x 380] to have the same number of elements, but got 204800, 204800 and 152000 elements respectively at /home/ubuntu/torch2/pkg/torch/lib/TH/generic/THTensorMath.c:887
stack traceback:
[C]: in function 'add'
/home/ubuntu/torch2/install/share/lua/5.1/nn/CAddTable.lua:16: in function 'func'
.../ubuntu/torch2/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
.../ubuntu/torch2/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
./model.lua:229: in function 'forwardBackward'
./model.lua:74: in function 'trainIteration'
train.lua:72: in main chunk
[C]: in function 'dofile'
...ntu/torch2/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
Please can you tell how to solve this?
Regards,
Enid
visdial/encoders/mn-att-ques-im-hist.lua
Line 74 in 3ffb761
This is the error I get when running train.lua with the mn-att-ques-im-hist encoder and gen decoder; it corresponds to the encoder forward pass in the forwardBackward function.
I was able to get it running by changing params.imgFeatureSize (whose value is 4096) in the above-mentioned line to 512.
@abhshkdz
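For reference, the shape mismatch in the error above is the generic failure mode of element-wise addition: nn.CAddTable requires all of its inputs to have identical shapes. A minimal NumPy sketch of the same situation, with sizes taken from the error message (the projection matrix is a hypothetical illustration, not the repo's code):

```python
import numpy as np

# Two batches mirroring the error: [400 x 512] vs [400 x 380].
# Element-wise addition (what nn.CAddTable does) needs identical shapes.
img_embed = np.zeros((400, 512))   # image-feature branch
ques_embed = np.zeros((400, 380))  # rnnHiddenSize=380 branch

try:
    _ = img_embed + ques_embed     # shapes disagree -> ValueError
except ValueError as e:
    print("shape mismatch:", e)

# Projecting one branch to the other's width fixes it, which is why making
# both sides 512-dimensional (via rnnHiddenSize or imgFeatureSize) works.
proj = np.zeros((380, 512))        # hypothetical linear projection
ok = img_embed + ques_embed @ proj # both branches now [400 x 512]
print(ok.shape)
```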
Hi, I just want to run the pretrained model, but an error occurs while reading image features (Invalid permutation).
I downloaded preprocessed data below.
preprocessed data (data/): visdial_data_trainval.h5, visdial_params_trainval.json, data_img_vgg16_relu7_trainval.h5
pretrained model (checkpoints/): lf-att-ques-im-hist-disc-vgg16-24.t7
I ran the command below (test split):
th evaluate.lua -loadPath checkpoints/lf-att-ques-im-hist-disc-vgg16-24.t7 -gpuid 0 -split test
stack trace
{
useGt : false
inputQues : "data/visdial_data_trainval.h5"
batchSize : 30
split : "test"
loadPath : "checkpoints/lf-att-ques-im-hist-disc-vgg16-24.t7"
inputJson : "data/visdial_params_trainval.json"
saveRanks : true
saveRankPath : "models/test.json"
backend : "cudnn"
gpuid : 1
inputImg : "data/data_img_vgg16_relu7_trainval.h5"
}
DataLoader loading json file: data/visdial_params.json
Vocabulary size (with <START>, <END>): 11403
DataLoader loading h5 file: data/visdial_data.h5
DataLoader loading h5 file: data/data_img.h5
Reading image features..
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/torch/Tensor.lua:543: Invalid permutation
stack traceback:
[C]: in function 'assert'
/root/torch/install/share/lua/5.1/torch/Tensor.lua:543: in function 'permute'
dataloader.lua:71: in function 'initialize'
evaluate.lua:81: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670
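For context, the assert at Tensor.lua:543 fires when the argument to permute is not a valid permutation of the tensor's dimension indices; one common way to hit it is feeding features with an unexpected number of dimensions (e.g. 4-D pool5 features where 2-D relu7 features are expected, or vice versa). A NumPy analogue, with hypothetical feature shapes:

```python
import numpy as np

# Hypothetical layouts: relu7 features are 2-D (num_images, 4096), while
# pool5 features are 4-D (num_images, 512, 7, 7). Code written for one
# layout fails when handed the other.
feats_4d = np.zeros((10, 512, 7, 7))

# A valid permutation must list every axis exactly once.
ok = np.transpose(feats_4d, (0, 2, 3, 1))
print(ok.shape)  # (10, 7, 7, 512)

try:
    np.transpose(feats_4d, (0, 2, 1))  # wrong number of axes -> error
except ValueError as e:
    print("invalid permutation:", e)
```

So it is worth checking that the image-feature h5 file matches the layer the checkpoint was trained on.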
Hi,
In your code here, model.lua, shouldn't the decoder first be coupled with the encoder, so that sampling is also conditioned on the image features and the previous conversations?
It seems to me that sampling is based only on the word from the previous time step, without knowledge of the image features and history.
Best,
Rui
Line 163 in e7ffec3
Hi all,
I was hoping to find out more about using your model in my standalone Android app. I'm initially keeping the same domain, since I'm only attempting to incorporate this model within Android. Would I just need the data.h5, params.json, and img.h5 files, or could I skip that step since my domain is the same?
Thanks.
Hi,
I tried to train the model with the command "th train.lua -encoder hre-ques-hist -decoder gen -gpuid 1", but it fails with THCudaCheck FAIL ... error=2 : out of memory. I was able to train after reducing the batch size below 40.
How can I utilize two GPUs for this training? Kindly advise.
Why is the loop over 'j' at line 150 of the prepro.py script necessary, when at test time we only need to look at the options in the last round?
@abhshkdz Hi, my code could not find the val questions in visdial_data_trainval.h5,
and I found the val set in the train h5.
Could you please check whether the filename is wrong in VisDial v1.0?
Hi, I have an issue with the prepro.py file.
I ran "python prepro.py -download 1 -image_root /path/to/coco/images"
and it returns:
python prepro.py -download 1 -image_root /path/to/coco/images
/home/ai8503/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
usage: prepro.py [-h] [-download] [-train_split {train,trainval}]
[-input_json_train INPUT_JSON_TRAIN]
[-input_json_val INPUT_JSON_VAL]
[-input_json_test INPUT_JSON_TEST] [-image_root IMAGE_ROOT]
[-input_vocab INPUT_VOCAB] [-output_json OUTPUT_JSON]
[-output_h5 OUTPUT_H5] [-max_ques_len MAX_QUES_LEN]
[-max_ans_len MAX_ANS_LEN] [-max_cap_len MAX_CAP_LEN]
[-word_count_threshold WORD_COUNT_THRESHOLD]
prepro.py: error: unrecognized arguments: 1
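This usage error is what argparse prints when -download is declared as a boolean flag (presumably with action='store_true', given that the bare flag works below): the flag consumes no value, so the trailing 1 is rejected as an unrecognized argument. A minimal sketch of that behavior:

```python
import argparse

parser = argparse.ArgumentParser()
# A store_true flag takes no value; "-download 1" leaves "1" unconsumed.
parser.add_argument('-download', action='store_true')

args = parser.parse_args(['-download'])
print(args.download)  # True

# parse_args(['-download', '1']) would exit with
# "error: unrecognized arguments: 1", matching the output above.
```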
So I changed the command to "python prepro.py -download -image_root /path/to/coco/images", and it worked well. But then I hit an issue at line 286.
Saving hdf5...
[train2014] Preparing image paths with image_ids...
0%| | 0/82783 [00:00<?, ?it/s]
Traceback (most recent call last):
File "prepro.py", line 286, in <module>
out['unique_img_train'] = get_image_ids(data_train, args, 'train')
File "prepro.py", line 188, in get_image_ids
image_ids[i] = id2path[image_id]
KeyError: 378466
I think the json file is the problem.
How can I solve this?
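The KeyError means image_id 378466 from the dialog json has no entry in the id2path map built from the images on disk, which usually indicates the json and the local COCO image set are out of sync (e.g. an incomplete download). A defensive check, with hypothetical stand-ins for prepro.py's variables, surfaces every missing image at once instead of crashing on the first lookup:

```python
# Hypothetical stand-ins for prepro.py's id2path map and image-id list.
id2path = {123: 'train2014/COCO_train2014_000000000123.jpg'}
image_ids = [123, 378466]

# Collect every id whose image file was not found on disk.
missing = [i for i in image_ids if i not in id2path]
if missing:
    print('missing %d image files, e.g. %s' % (len(missing), missing[:5]))
```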
Hi Visdial Team,
I have a question regarding combining the history and question features in the memory network encoder. Why do you use element-wise addition instead of simply concatenating, as you did in the late fusion encoder? Are there any advantages to this?
Thank you in advance!
Best,
Rui
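For context on the question: element-wise addition fuses two d-dimensional vectors into a single d-dimensional vector with no extra parameters, while concatenation yields a 2d-dimensional vector that typically needs a learned projection back down. A NumPy sketch of the two fusion shapes (an illustration, not the repo's actual code):

```python
import numpy as np

d = 512
hist = np.random.randn(d)   # history feature
ques = np.random.randn(d)   # question feature

added = hist + ques                    # stays d-dimensional, no parameters
concat = np.concatenate([hist, ques])  # becomes 2d-dimensional

print(added.shape, concat.shape)       # (512,) (1024,)

# Concatenation keeps both inputs separate but doubles the width, so a
# projection W of shape (2d, d) is usually learned to bring it back down.
W = np.zeros((2 * d, d))
fused = concat @ W
print(fused.shape)                     # (512,)
```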
Hi. The new data_img.h5 file generated from prepro_img_pool5.lua seems to be missing. Could you please upload it? I have limited computational resources and cannot generate it myself.
Thank you in advance.
Lori
{
imgSize : 224
layerName : "relu7"
cnnModel : "models/vgg16/VGG_ILSVRC_16_layers.caffemodel"
batchSize : 50
outName : "data_img.h5"
inputJson : "visdial_params.json"
gpuid : 3
cnnProto : "models/vgg16/VGG_ILSVRC_16_layers_deploy.prototxt"
backend : "nn"
imageRoot : "/home/tommy/caffe-recurrent/data/coco/tools/images"
}
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Successfully loaded models/vgg16/VGG_ILSVRC_16_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
Processing 82783 images...
/home/tommy/torch/install/bin/lua: ...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/tommy/torch/install/share/lua/5.1/nn/THNN.lua:110: bad argument #4 to 'v' (weight tensor must be 2D (nOutputPlane,nInputPlane*kH*kW) at /tmp/luarocks_cunn-1.0-0-5194/cunn/lib/THCUNN/SpatialConvolutionMM.cu:13)
stack traceback:
[C]: in function 'v'
/home/tommy/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'SpatialConvolutionMM_updateOutput'
...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:79: in function <...orch/install/share/lua/5.1/nn/SpatialConvolution.lua:76>
(tail call): ?
[C]: in function 'xpcall'
...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:58: in function 'rethrowErrors'
.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
prepro_img.lua:94: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
...e/tommy/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../tommy/torch/install/share/lua/5.1/nn/Sequential.lua:41>
(tail call): ?
prepro_img.lua:94: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?
Hi VisDial team,
Thank you for sharing the great work!
After I ran the command "python prepro.py -download 1", I have the following output:
Reading json...
train2014
Tokenizing captions...
Tokenizing questions...
Tokenizing answers...
val2014
Tokenizing captions...
Tokenizing questions...
Tokenizing answers...
Building vocabulary...
Words: 8845
Encoding based on vocabulary...
Creating data matrices...
Traceback (most recent call last):
File "prepro.py", line 161, in <module>
captions_train, captions_train_len, questions_train, questions_train_len, answers_train, answers_train_len, options_train, options_train_list, options_train_len, answers_train_index, images_train_index, images_train_list = create_data_mats(data_train_toks, ques_train_inds, ans_train_inds, args)
File "prepro.py", line 94, in create_data_mats
captions[i][0:caption_len[i]] = data_toks[image_id]['caption_inds'][0:max_cap_len]
TypeError: slice indices must be integers or None or have an index method
Do you know how to fix this?
Thank you!
Best,
Rui
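A note on the TypeError above: this is the classic Python 3 symptom of a float reaching a slice index (in Python 3, / is true division and returns a float even for two integers). Casting the length to int before slicing sidesteps it, sketched here with hypothetical stand-ins for the captions matrix and token list:

```python
caption = [0] * 40           # stand-in for a row of the captions matrix
toks = [5, 9, 2, 7]          # stand-in for caption_inds

cap_len = float(len(toks))   # a float length, as true division produces
try:
    caption[0:cap_len] = toks        # float slice index -> TypeError
except TypeError as e:
    print(e)

caption[0:int(cap_len)] = toks       # casting to int fixes the slice
print(caption[:4])                   # [5, 9, 2, 7]
```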
Hi Visdial Team,
Thank for the great code!
I am a little lost. Could you please point me to where the code concatenates the t rounds of history questions and answers for the late fusion encoder?
Thank you!
Best,
Rui
Line 265 in 3ffb761
### function Model:predict in model.lua, line 215
ranks[{{startId, nextStartId - 1}, {}}] = self:retrieveBatch(batch):view(nextStartId - startId, -1, self.params.numOptions);
The size of self:retrieveBatch(batch) is 300x1, while it is going to be reshaped to 10x300.
So there is an error that the number of covered elements is not a multiple of all elements.
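The view failure described above is the general reshape invariant: the requested shape must cover exactly the same number of elements as the tensor holds, and 300 elements cannot fill a 10x300 view. A NumPy analogue (the 3x100 target is a hypothetical consistent shape, not the repo's actual numOptions):

```python
import numpy as np

ranks = np.zeros((300, 1))   # what retrieveBatch returns in this report

try:
    ranks.reshape(10, 300)   # 10*300 = 3000 != 300 elements -> error
except ValueError as e:
    print(e)

# A reshape only succeeds when the element counts agree, e.g.:
ok = ranks.reshape(3, 100)   # hypothetical: 3 images x 100 options
print(ok.shape)              # (3, 100)
```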
https://github.com/batra-mlp-lab/visdial/blob/master/dataloader.lua#L104
Should self.maxAnsLen = self[dtype..'_ans']:size(2) be self.maxAnsLen = self[dtype..'_ans']:size(3)?
Hi guys,
About the beam search here: if I understand it correctly, this is normal beam search without length-normalized log-likelihood, so it should tend to find shorter sequences, right? Did you also try beam search with length-normalized log-likelihood?
Thank you!
Best,
Rui
th evaluate.lua -loadPath checkpoints/lf-qih-d.t7 -gpuid 0
Using the pre-trained models and the preprocessed data available for download results in the following error:
Setting up model..==== 104200/104242 =========>.] ETA: 0ms | Step: 0ms
Encoder: lf-ques-im-hist
Decoder: gen
Evaluating..
numThreads 40504
/home/user/torch/install/bin/lua: ...me/user/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 3 module of nn.Sequential:
...ome/user/torch/install/share/lua/5.1/rnn/SeqLSTM.lua:99: nn.SeqLSTM expecting previous call to setZeroMask(zeroMask) with maskzero=true
Hi, I'm getting a problem extracting image features.
I ran this command, "th prepro_img_vgg16.lua -imageRoot ~/Desktop/2014/ -gpuid 0",
and it returns:
/home/ai8503/torch/install/bin/lua: ...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: ...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: ...home/ai8503/torch/install/share/lua/5.1/hdf5/ffi.lua:56: expected align(#) on line 579
stack traceback:
[C]: in function 'error'
...me/ai8503/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
prepro_img_vgg16.lua:3: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?
How can I fix this?
Hi,
Do you guys plan to release starter code in PyTorch for the challenge? visdial-rl does provide some insights, but it is tailored more for Visual Dialog agents as described in the paper "Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning".