mesnico / relationnetworks-clevr
A PyTorch implementation for "A simple neural network module for relational reasoning", working on the CLEVR dataset
License: MIT License
Hmmm. It's me again. I am wondering whether the model is able to converge after a certain number of epochs. When I run it, the loss stops decreasing once it reaches 3.33, so it doesn't converge. I hope you can share some insights with me. Thanks!
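One possible reading of the 3.33 plateau, offered as an assumption rather than a diagnosis: if the model is trained with a cross-entropy (negative log-likelihood) loss over CLEVR's 28 answer classes, a network that predicts a uniform distribution scores ln(28), which is about 3.33. A loss stuck at that value would then suggest the model is still guessing. The check below is plain arithmetic; the choice of loss and the match between the 28-answer vocabulary and the model's output layer are both assumed.

```python
import math

# CLEVR answer vocabulary size; assumed to match the model's output layer.
NUM_ANSWER_CLASSES = 28

def uniform_guess_loss(num_classes):
    """Cross-entropy (in nats) of a uniform prediction over num_classes."""
    return math.log(num_classes)

print(f"{uniform_guess_loss(NUM_ANSWER_CLASSES):.3f}")  # 3.332
```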
Hi, thanks for sharing your code!
I followed the instructions and built the virtualenv with pytorch-0.3.1.
However, when running the following line
python train.py --clevr-dir /home/Data/CLEVR/CLEVR_v1.0 --model 'original-sd' | tee logfile.log
The terminal gives out the following error:
Loaded hyperparameters from configuration config.json, model: original-sd: {'state_description': True, 'g_layers': [512, 512, 512, 512], 'question_injection_position': 0, 'f_fc1': 512, 'f_fc2': 1024, 'dropout': 0.05, 'lstm_hidden': 256, 'lstm_word_emb': 32, 'rl_in_size': 14}
Building word dictionaries from all the words in the dataset...
==> using cached dictionaries: /home/Data/CLEVR/CLEVR_v1.0/questions/CLEVR_built_dictionaries.pkl
Word dictionary completed!
Initializing CLEVR dataset...
==> using cached questions: /home/Data/CLEVR/CLEVR_v1.0/questions/CLEVR_train_questions.pkl
==> using cached scenes: /home/Data/CLEVR/CLEVR_v1.0/scenes/CLEVR_train_scenes.pkl
Traceback (most recent call last):
File "train.py", line 418, in <module>
main(args)
File "train.py", line 247, in main
clevr_dataset_train, clevr_dataset_test = initialize_dataset(args.clevr_dir, dictionaries, hyp['state_description'])
File "train.py", line 194, in initialize_dataset
clevr_dataset_train = ClevrDatasetStateDescription(clevr_dir, True, dictionaries)
File "/home/GoodPaperCode/RelationNetworks-CLEVR-master/clevr_dataset_connector.py", line 98, in __init__
self.objects = pickle.load(f)
AttributeError: Can't get attribute '_rebuild_tensor_v2' on <module 'torch._utils' from '/home/GoodPaperCode/RelationNetworks-CLEVR-master/env/lib/python3.6/site-packages/torch/_utils.py'>
I searched online and found a related answer.
Do you have any idea how I can fix this?
Any help would be appreciated and thanks for your time!
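This error typically appears when a pickle containing tensors was written by a PyTorch release newer than the one reading it (here, the 0.3.1 in the virtualenv). If the cached `.pkl` files were produced locally under a different PyTorch version, and if `train.py` rebuilds them when they are missing (the "using cached" messages suggest so, but this is an assumption), then setting the caches aside and rerunning inside the 0.3.1 environment may help. A minimal sketch follows; the helper name and the `.bak` suffix are hypothetical, and nothing is deleted.

```python
from pathlib import Path

def set_aside_cached_pickles(clevr_dir, subdirs=("questions", "scenes"), suffix=".bak"):
    """Rename cached *.pkl files under clevr_dir so the next run cannot reuse them.

    Returns the list of renamed paths. Files are renamed, not deleted.
    """
    moved = []
    for sub in subdirs:
        folder = Path(clevr_dir) / sub
        if not folder.is_dir():
            continue  # skip subdirectories that do not exist
        for pkl in sorted(folder.glob("*.pkl")):
            target = pkl.with_name(pkl.name + suffix)
            pkl.rename(target)
            moved.append(target)
    return moved

# Example call (path taken from the log above):
# set_aside_cached_pickles("/home/Data/CLEVR/CLEVR_v1.0")
```

If the rebuilt caches load correctly, the `.bak` files can be removed by hand.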
train.py fails both when training from scratch and when using a pre-trained model, with the following error:
File "/RelationNetworks-CLEVR/utils.py", line 120, in collate_samples
question=torch.stack(padded_questions)
TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Tensor
Hi @mesnico, could you share which GPU you used and how long it takes to train this network?
Your implementation is really great. I am interested in the model and am running experiments with it. However, I have tried on two different machines and both runs terminated at epoch 9. Have you seen the same problem?
Hi, could you tell me about how long training takes? I have access to 4 - 8 GPUs, but I am in a hurry.
When I run the code, I get the following output:
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ python
Python 3.6.6 (default, Jun 28 2018, 00:00:00)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> exit()
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ pyton -m train --clevr-dir /data/DATASETS/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log
No command 'pyton' found, did you mean:
Command 'python' from package 'python-minimal' (main)
Command 'pytone' from package 'pytone' (universe)
pyton: command not found
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ python -m train --clevr-dir /data/DATASETS/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log
TRAIN: 0%| | 0/350 [00:00<?, ?it/s]
Loaded hyperparameters from configuration config.json, model: original-fp: {'state_description': False, 'g_layers': [256, 256, 256, 256], 'question_injection_position': 0, 'f_fc1': 256, 'f_fc2': 256, 'dropout': 0.5, 'lstm_hidden': 128, 'lstm_word_emb': 32, 'rl_in_size': 52}
Building word dictionaries from all the words in the dataset...
==> using cached dictionaries: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_built_dictionaries.pkl
Word dictionary completed!
Initializing CLEVR dataset...
==> using cached questions: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_train_questions.pkl
==> using cached questions: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_val_questions.pkl
CLEVR dataset initialized!
Supposing original DeepMind model
Training (350 epochs) is starting...
Dataset reinitialized with batch size 640
Current learning rate: 1e-05
████████████████████████████████████████▊| 1093/1094 [11:21:28<00:37, 37.41s/it, loss=1.92]
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 418, in <module>
main(args)
File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 356, in main
train(clevr_train_loader, model, optimizer, epoch, args)
File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 40, in train
output = model(img, qst)
File "/data/Rudra/virtualenvs/rn_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/data/Rudra/RelationNetworks-CLEVR/model.py", line 200, in forward
x = torch.cat([x, self.coord_tensor], 1) # (B x 24+2 x 8*8)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 469 and 640 in dimension 0 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897
Train Epoch: 1 [0/700160 (0%)] Train loss: 39.945804595947266
Train Epoch: 1 [6400/700160 (1%)] Train loss: 36.57775611877442
Train Epoch: 1 [12800/700160 (2%)] Train loss: 29.848896408081053
Train Epoch: 1 [19200/700160 (3%)] Train loss: 24.984291648864748
Train Epoch: 1 [25600/700160 (4%)] Train loss: 20.945134353637695
.
.
.
Train Epoch: 1 [684800/700160 (98%)] Train loss: 1.8508247494697572
Train Epoch: 1 [691200/700160 (99%)] Train loss: 1.8768051743507386
Train Epoch: 1 [697600/700160 (100%)] Train loss: 1.8581566572189332
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$
I have also attached my logfile. When I run the plot function, I get empty plots for everything apart from the training loss. Please let me know where the issue might be. Thanks.
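The `Got 469 and 640 in dimension 0` message is consistent with a final, shorter batch: the progress bar shows 1094 iterations at batch size 640, and the crash occurs on the last one. `self.coord_tensor` appears to be built for a fixed batch of 640, so the concatenation fails when only 469 samples remain. The arithmetic can be checked as below; the dataset size of 699,989 is inferred from the log (1093 × 640 + 469) and the helper name is hypothetical.

```python
import math

def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the final batch) for a sequential loader."""
    if num_samples <= 0:
        return 0, 0
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

# 699_989 is inferred from the log: 1093 full batches of 640 plus one of 469.
print(batch_layout(699_989, 640))  # (1094, 469)
```

Two common remedies, neither confirmed against this repository: pass `drop_last=True` to `torch.utils.data.DataLoader` so the short batch is skipped, or slice the coordinate tensor to the current batch size before `torch.cat`. Because the crash happens before the first epoch completes, no test metrics would have been logged, which may be why only the training-loss plot contains data.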
To the best of my knowledge, most repos on GitHub only report the overall accuracy, but I would like the accuracy broken down in the same format as the paper reports.
Could you help me with this? It would be very helpful!
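The paper reports CLEVR accuracy overall and per question category (count, exist, compare numbers, query attribute, compare attribute). A minimal sketch of that breakdown follows, assuming each evaluated sample already carries a category label. How that label is derived from the CLEVR question files is not shown here, and `accuracy_by_category` is a hypothetical helper, not part of this repository.

```python
from collections import defaultdict

def accuracy_by_category(records):
    """records: iterable of (category, predicted_answer, true_answer) tuples.

    Returns a dict mapping each category, plus 'overall', to accuracy in [0, 1].
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, predicted, target in records:
        for key in (category, "overall"):
            total[key] += 1
            correct[key] += int(predicted == target)
    return {key: correct[key] / total[key] for key in total}

# Toy records, invented for illustration only.
sample = [
    ("count", "3", "3"),
    ("count", "2", "4"),
    ("exist", "yes", "yes"),
    ("query attribute", "red", "red"),
]
report = accuracy_by_category(sample)
for name in sorted(report):
    print(f"{name}: {report[name]:.2%}")
```

Collecting the tuples inside the evaluation loop and calling the helper once per epoch would give a table in the paper's layout.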
Hi @mesnico ,
I am wondering what the best train/val accuracies are that you could get with your code. I have been trying to implement this paper as well recently, and my numbers are far off from the reported results.
Thanks.