Comments (6)
Hi, that's weird. With the default batch size (500) it takes about 9 GB of GPU RAM to run; I've also tested on a Titan X. Could you try again with a smaller batch size (such as 128) and make sure no other program is running on that GPU?
from vqa_lstm_cnn.
@kumarabhinavgupta In my case, it works fine. To rule out a version problem on your side, update cunn and cutorch using luarocks install cunn and luarocks install cutorch.
Thanks for replying.
We updated cunn and cutorch as suggested by @jnhwkim and tried a batch size of 128 (and even 1) as suggested by @jiasenlu. We still get the "Out of Memory" error, but this time only after 900 iterations.
The same Titan X trains other networks that require 8-9 GB of RAM without a problem.
What are the likely causes we should look for?
I think maybe it's safer to add collectgarbage() inside the training function. Could you try adding the following in the training function?
if i%50 == 0 then
    collectgarbage()
end
Or you can re-download train.lua and try again.
Thanks a lot. The solution is training now.
We had to make two more changes:
- Use "iter" instead of "i" in the condition
- Remove the "end" at line 310
I've run into the same situation on a different machine. In my case, I had to collect garbage every 5 iterations. FYI.
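For reference, the fix discussed in this thread can be sketched as below. This is a minimal, plain-Lua illustration and not the actual train.lua: train_step, run_training, max_iters and gc_every are hypothetical names, and the loop body is a stand-in for a real forward/backward pass. The likely reason it helps is that Torch releases tensor memory only when Lua's garbage collector reclaims the corresponding handles, so forcing a collection on a fixed interval keeps unused buffers from piling up.

```lua
-- Minimal sketch: force a full garbage collection every `gc_every` iterations.
-- All names here are hypothetical; they are not taken from train.lua.

-- Stand-in for one training step; it only allocates a temporary table.
local function train_step(iter)
  local tmp = {}
  for k = 1, 1000 do
    tmp[k] = k * iter
  end
  return #tmp
end

-- Runs `max_iters` steps and returns how many collections were forced.
local function run_training(max_iters, gc_every)
  local collections = 0
  for iter = 1, max_iters do
    train_step(iter)
    -- Use the loop variable (here `iter`) in the condition.
    if iter % gc_every == 0 then
      collectgarbage()  -- full collection cycle
      collections = collections + 1
    end
  end
  return collections
end

print(run_training(100, 50))  --> 2
```

An interval of 50 matches the suggestion above, while a smaller value such as 5 trades some speed for a lower memory peak.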