yukezhu / visual7w-qa-models

Visual7W visual question answering models

Home Page: http://ai.stanford.edu/~yukez/visual7w/

License: MIT License

Python 19.51% Shell 0.70% Lua 79.79%

deep-learning recurrent-neural-networks

visual7w-qa-models's Issues

Segmentation fault in executing train_telling.lua on Jetson TX1

Hello @yukezhu,
I am having an issue running the train_telling.lua script: I get a segmentation fault immediately after the model in cnn_models is loaded successfully.

I am running on a Jetson TX1, on the GPU, with 4 GB of RAM shared between the CPU and GPU.

$ th train_telling.lua -gpuid 0 -mc_evaluation -verbose -finetune_cnn_after -1

QADatasetLoader loading dataset file: visual7w-toolkit/datasets/visual7w-telling/dataset.json
image size is 28653
QADatasetLoader loading json file: data/qa_data.json
vocab size is 3007
QADatasetLoader loading h5 file: data/qa_data.h5
max question sequence length in data is 15
max answer sequence length in data is 5
assigned 5678 images to split val
assigned 8609 images to split test
assigned 14366 images to split train
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Successfully loaded cnn_models/VGG_ILSVRC_16_layers.caffemodel
Segmentation fault.

Do you have any suggestions on how to solve this?

Thanks
Enid

Training with visual genome dataset

Hello @yukezhu,

I am running train_telling.lua after generating qa_data.h5 and qa_data.json with prepare_dataset.py from the Visual Genome dataset (reading all the images from the VG_100K and VG_100K_2 folders). I am training in GPU mode with batch_size=1, and I get the following error after the 25th iteration. The same error occurs in CPU mode.

.
.
.
question 976648: where is this picture taken ? ten king doe while staying rc mason culture kitties familiar doe staying church lockers .	
evaluating validation performance... 250 (9.231342)	
validation loss: 	9.1918324186961	
wrote json checkpoint to checkpoints/model_id.json	
iter 1: 9.186291 (9.191735)	
iter 2: 9.096266 (9.191258)	
iter 3: 8.853861 (9.189571)	
iter 4: 8.986907 (9.188557)	
iter 5: 9.046180 (9.187845)	
iter 6: 8.980703 (9.186810)	
iter 7: 8.890981 (9.185331)	
iter 8: 8.588496 (9.182346)	
iter 9: 8.541151 (9.179140)	
iter 10: 8.375111 (9.175120)	
iter 11: 8.544146 (9.171965)	
iter 12: 8.388834 (9.168050)	
iter 13: 7.858945 (9.161504)	
iter 14: 7.269617 (9.152045)	
iter 15: 7.214907 (9.142359)	
iter 16: 7.827134 (9.135783)	
iter 17: 6.620107 (9.123205)	
iter 18: 7.074771 (9.112962)	
iter 19: 6.372057 (9.099258)	
iter 20: 7.658658 (9.092055)	
iter 21: 7.211345 (9.082651)	
iter 22: 5.337039 (9.063923)	
iter 23: 6.063521 (9.048921)	
iter 24: 6.835446 (9.037854)	
iter 25: 6.785455 (9.026592)	
/home/f/torch/install/bin/luajit: /home/f/torch/install/share/lua/5.1/hdf5/dataset.lua:114: attempt to perform arithmetic on a nil value
stack traceback:
	/home/f/torch/install/share/lua/5.1/hdf5/dataset.lua:114: in function 'rangesToOffsetAndCount'
	/home/f/torch/install/share/lua/5.1/hdf5/dataset.lua:136: in function 'partial'
	./misc/QADatasetLoader.lua:232: in function 'getBatch'
	train_telling.lua:177: in function 'lossFun'
	train_telling.lua:336: in main chunk
	[C]: in function 'dofile'
	...r226/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

I would really appreciate it if someone could help me with this.

Thanks in advance.

Feeding answer sequences during the evaluation

Inside the eval_split() function (line 249 of train_telling.lua), you feed the answer sequence as well as the question sequence. Why are the answers fed to the model in evaluation mode?
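One possible reading, offered as an assumption rather than a description of the repository's code: the training log in another issue prints a "validation loss", and a teacher-forced loss of that kind can only be computed if the reference answer tokens are available at evaluation time. The sketch below shows that dependency in plain Python. The function names `sequence_nll` and `pick_candidate` are hypothetical, and the idea that `-mc_evaluation` ranks candidate answers by likelihood is also an assumption.

```python
import math


def sequence_nll(step_probs, target_ids):
    """Average negative log-likelihood of the reference answer tokens.

    step_probs: one probability distribution (list of floats) per time step.
    target_ids: the reference token index at each time step.
    """
    if len(step_probs) != len(target_ids):
        raise ValueError("need one target token per time step")
    total = 0.0
    for probs, target in zip(step_probs, target_ids):
        # The loss is defined in terms of the reference token, so the
        # reference answer has to be supplied even when no weights are updated.
        total -= math.log(probs[target])
    return total / len(target_ids)


def pick_candidate(candidate_nlls):
    """Return the index of the candidate answer with the lowest loss."""
    return min(range(len(candidate_nlls)), key=lambda i: candidate_nlls[i])
```

Under this reading, feeding answers during evaluation would not leak them into the prediction; they would only be used to score how likely each answer is under the model.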

Attention Visualization?

Hello,
Please correct me if I am wrong:
To visualize attention, take the attention probabilities and multiply them with the corresponding image regions.
If anyone has attention visualization code, please share it.
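The procedure described above can be sketched in plain Python. This is a minimal illustration, not code from the repository: it assumes the attention vector has 196 entries laid out row by row on a 14x14 grid (the 196 matches the `512*196` layer size quoted in the build_attention_nn issue), that the image is a 224x224 grayscale array stored as nested lists, and that nearest-neighbour upsampling is acceptable. The names `upsample_attention` and `apply_attention` are hypothetical.

```python
def upsample_attention(attn, grid=14, out_size=224):
    """Nearest-neighbour upsample of a flat attention vector to out_size x out_size."""
    if len(attn) != grid * grid:
        raise ValueError("expected %d weights, got %d" % (grid * grid, len(attn)))
    if out_size % grid != 0:
        raise ValueError("out_size must be a multiple of grid")
    scale = out_size // grid  # pixels covered by one attention cell along each axis
    return [
        [attn[(y // scale) * grid + (x // scale)] for x in range(out_size)]
        for y in range(out_size)
    ]


def apply_attention(image, heat):
    """Scale each pixel by its attention weight, normalised so the peak weight is 1."""
    peak = max(max(row) for row in heat)
    if peak <= 0:
        raise ValueError("attention map has no positive weight")
    return [
        [pixel * weight / peak for pixel, weight in zip(img_row, heat_row)]
        for img_row, heat_row in zip(image, heat)
    ]
```

Regions with high attention keep their brightness and the rest fade toward black. For a smoother result, the upsampled map could be blurred or blended with the original image instead of multiplied.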

cannot run build_attention_nn function

When I run only the build_attention_nn function, as shown below, I get the following error:

torch/install/share/lua/5.1/nn/View.lua:38: attempt to index local 'input' (a nil value)
stack traceback:

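require 'nn'
-- NOTE: nngraph is not loaded here. The nn.Identity()() node syntax is provided
-- by nngraph; with plain nn, the second call runs forward(nil) and returns nil,
-- which is likely the nil 'input' that nn.View fails to index below.
function build_attention_nn()
  local conv_feat_maps = nn.Identity()()
  local prev_h = nn.Identity()()
  -- compute attention coefficients
  local flatten_conv = nn.View(-1):setNumInputDims(2)(conv_feat_maps)
  local f_conv = nn.Linear(512*196, 196)(flatten_conv)
end

build_attention_nn()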

Out of Memory error

Hi @yukezhu,

I ran your program to retrain the model, but I get an out-of-memory error. I have a GTX 1080 (8 GB) installed in my machine. Which GPU did you use to train the model, and how much memory did it need?

```
$ th train_telling.lua -gpuid 0 -mc_evaluation -verbose -finetune_cnn_after -1

QADatasetLoader loading dataset file: visual7w-toolkit/datasets/visual7w-telling/dataset.json
image size is 28653
QADatasetLoader loading json file: data/qa_data.json
vocab size is 3007
QADatasetLoader loading h5 file: data/qa_data.h5
max question sequence length in data is 15
max answer sequence length in data is 5
assigned 5678 images to split val
assigned 8609 images to split test
assigned 14366 images to split train
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Successfully loaded cnn_models/VGG_ILSVRC_16_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
converting first layer conv filters from BGR to RGB...
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-1922/cutorch/lib/THC/generic/THCStorage.cu line=65 error=2 : out of memory
/home/jjhu/torch/install/bin/lua: /home/jjhu/torch/install/share/lua/5.2/nn/utils.lua:11: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-1922/cutorch/lib/THC/generic/THCStorage.cu:65
stack traceback:
[C]: in function 'resize'
/home/jjhu/torch/install/share/lua/5.2/nn/utils.lua:11: in function 'torch_Storage_type'
/home/jjhu/torch/install/share/lua/5.2/nn/utils.lua:57: in function 'recursiveType'
/home/jjhu/torch/install/share/lua/5.2/nn/Module.lua:152: in function 'type'
/home/jjhu/torch/install/share/lua/5.2/nn/utils.lua:45: in function 'recursiveType'
/home/jjhu/torch/install/share/lua/5.2/nn/utils.lua:41: in function 'recursiveType'
/home/jjhu/torch/install/share/lua/5.2/nn/Module.lua:152: in function </home/jjhu/torch/install/share/lua/5.2/nn/Module.lua:143>
(...tail calls...)
train_telling.lua:131: in main chunk
[C]: in function 'dofile'
...jjhu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: in ?
```
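As a rough sanity check on the memory question, the layer shapes printed in the log can be turned into a weight count. The sketch below is back-of-the-envelope arithmetic only: it assumes 32-bit floats, counts only the weight tensors listed in the log (biases are not printed, so they are left out), and ignores activations, gradients, the language model and any library workspace. The names `VGG16_SHAPES`, `count_weights` and `weight_bytes` are introduced here for illustration.

```python
# Layer shapes copied from the log above.
VGG16_SHAPES = {
    "conv1_1": (64, 3, 3, 3),
    "conv1_2": (64, 64, 3, 3),
    "conv2_1": (128, 64, 3, 3),
    "conv2_2": (128, 128, 3, 3),
    "conv3_1": (256, 128, 3, 3),
    "conv3_2": (256, 256, 3, 3),
    "conv3_3": (256, 256, 3, 3),
    "conv4_1": (512, 256, 3, 3),
    "conv4_2": (512, 512, 3, 3),
    "conv4_3": (512, 512, 3, 3),
    "conv5_1": (512, 512, 3, 3),
    "conv5_2": (512, 512, 3, 3),
    "conv5_3": (512, 512, 3, 3),
    "fc6": (1, 1, 25088, 4096),
    "fc7": (1, 1, 4096, 4096),
    "fc8": (1, 1, 4096, 1000),
}

BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats


def count_weights(shapes):
    """Total number of weight values: the product of each shape, summed over layers."""
    total = 0
    for dims in shapes.values():
        layer = 1
        for d in dims:
            layer *= d
        total += layer
    return total


def weight_bytes(shapes, bytes_per_value=BYTES_PER_FLOAT32):
    """Storage needed for the weights alone, in bytes."""
    return count_weights(shapes) * bytes_per_value


if __name__ == "__main__":
    n = count_weights(VGG16_SHAPES)
    mib = weight_bytes(VGG16_SHAPES) / 2 ** 20
    print("%d weights, %.1f MiB as float32" % (n, mib))
```

The weights alone come to roughly 0.55 GB, in line with the 553432081 bytes that libprotobuf reports reading, so they should not fill an 8 GB card by themselves. Since the trace fails while the model is still being converted to a CUDA type, it may be worth checking with nvidia-smi whether another process is already holding GPU memory.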

How to exit the program gracefully when -max_iters is set to -1?
