mesnico / relationnetworks-clevr

A pytorch implementation for "A simple neural network module for relational reasoning", working on the CLEVR dataset

License: MIT License

Python 95.27% Shell 1.75% Dockerfile 2.98%

relation-network relationships clevr deep-learning machine-learning visual-question-answering pytorch

relationnetworks-clevr's Issues

Does not converge

Hmm, it's me again. I am wondering whether the model converges after a certain number of epochs. When I run it, the loss stops decreasing once it reaches 3.33, so it does not converge. I hope you can share some insight. Thanks!
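One possible reading of the 3.33 plateau, offered as an assumption rather than a diagnosis: if the reported loss is a mean cross-entropy in nats over the 28 answer classes of CLEVR, then a model that predicts a uniform distribution scores ln(28), which is about 3.33. A loss stuck at that value would suggest the network is still guessing at chance level. The function name below is hypothetical.

```python
import math


def chance_level_cross_entropy(num_classes):
    """Cross-entropy (in nats) of a uniform prediction over num_classes."""
    return math.log(num_classes)


# Assumption: 28 possible answers, as in CLEVR v1.0.
print(round(chance_level_cross_entropy(28), 2))  # prints 3.33
```

If the loss in a given run is summed over the batch or uses a different log base, this comparison does not apply.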

AttributeError: Can't get attribute '_rebuild_tensor_v2' on <module 'torch._utils' from '/home/GoodPaperCode/RelationNetworks-CLEVR-master/env/lib/python3.6/site-packages/torch/_utils.py'>

Hi, thanks for sharing your code!
I followed the instructions and built the virtualenv with pytorch-0.3.1.
However, running the following command fails:

python train.py --clevr-dir /home/Data/CLEVR/CLEVR_v1.0 --model 'original-sd' | tee logfile.log

The terminal prints the following error:

Loaded hyperparameters from configuration config.json, model: original-sd: {'state_description': True, 'g_layers': [512, 512, 512, 512], 'question_injection_position': 0, 'f_fc1': 512, 'f_fc2': 1024, 'dropout': 0.05, 'lstm_hidden': 256, 'lstm_word_emb': 32, 'rl_in_size': 14}
Building word dictionaries from all the words in the dataset...
==> using cached dictionaries: /home/Data/CLEVR/CLEVR_v1.0/questions/CLEVR_built_dictionaries.pkl
Word dictionary completed!
Initializing CLEVR dataset...
==> using cached questions: /home/Data/CLEVR/CLEVR_v1.0/questions/CLEVR_train_questions.pkl
==> using cached scenes: /home/Data/CLEVR/CLEVR_v1.0/scenes/CLEVR_train_scenes.pkl
Traceback (most recent call last):
  File "train.py", line 418, in <module>
    main(args)
  File "train.py", line 247, in main
    clevr_dataset_train, clevr_dataset_test  = initialize_dataset(args.clevr_dir, dictionaries, hyp['state_description'])
  File "train.py", line 194, in initialize_dataset
    clevr_dataset_train = ClevrDatasetStateDescription(clevr_dir, True, dictionaries)
  File "/home/GoodPaperCode/RelationNetworks-CLEVR-master/clevr_dataset_connector.py", line 98, in __init__
    self.objects = pickle.load(f)
AttributeError: Can't get attribute '_rebuild_tensor_v2' on <module 'torch._utils' from '/home/GoodPaperCode/RelationNetworks-CLEVR-master/env/lib/python3.6/site-packages/torch/_utils.py'>

I searched online and found a related answer.
Do you have any idea how I can fix this?

Any help would be appreciated and thanks for your time!
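A likely explanation, stated here as an assumption: the cached `.pkl` file was written by a newer PyTorch release, which pickles tensors through `torch._utils._rebuild_tensor_v2`, while the 0.3.1 install in the virtualenv only provides `_rebuild_tensor`. A commonly circulated workaround is to register a fallback before unpickling. The sketch below wraps that idea in a hypothetical helper, `install_rebuild_tensor_v2_shim`, which takes the module to patch as a parameter. It has not been verified against this repository.

```python
def install_rebuild_tensor_v2_shim(utils_module):
    """Add a `_rebuild_tensor_v2` fallback to `utils_module` if it lacks one.

    Assumes `utils_module._rebuild_tensor(storage, storage_offset, size, stride)`
    exists, as it does in `torch._utils` on PyTorch 0.3.1.
    Returns True if the shim was installed, False if nothing was changed.
    """
    if hasattr(utils_module, "_rebuild_tensor_v2"):
        return False

    def _rebuild_tensor_v2(storage, storage_offset, size, stride,
                           requires_grad, backward_hooks):
        # Rebuild with the old entry point, then attach the extra metadata.
        tensor = utils_module._rebuild_tensor(storage, storage_offset, size, stride)
        tensor.requires_grad = requires_grad
        tensor._backward_hooks = backward_hooks
        return tensor

    utils_module._rebuild_tensor_v2 = _rebuild_tensor_v2
    return True


# Intended use, before the first pickle.load / torch.load call:
#     import torch._utils
#     install_rebuild_tensor_v2_shim(torch._utils)
```

Another option, also an assumption, is to regenerate the cached files with the PyTorch version that is actually installed, so that no shim is needed.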

stack() fails

train.py fails both when training from scratch and when using a pre-trained model, with the following error:

File "/RelationNetworks-CLEVR/utils.py", line 120, in collate_samples
question=torch.stack(padded_questions)
TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Tensor

About the running time and gpu memory usage

Hi @mesnico, could you share which GPU you used and how long it takes to train this network?

Terminating at Epoch 9

Your implementation is really great. I am interested in the model and am running experiments on it. However, I have tried two different machines, and on both the run terminates at epoch 9. Have you seen the same problem?

Training time?

Hi, could you tell me about how long training takes? I have access to 4 - 8 GPUs, but I am in a hurry.

logfile is not showing any runs for the test set. The plots also don't show anything for test set and accuracy.

When I run the code, I get the following output:

(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ python                          
Python 3.6.6 (default, Jun 28 2018, 00:00:00)                                         
[GCC 4.8.4] on linux                                             
Type "help", "copyright", "credits" or "license" for more information.                 
>>> import torch                                                   
>>> exit()                                                                     
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ pyton -m train --clevr-dir /data/DATASETS/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log
No command 'pyton' found, did you mean:                           
 Command 'python' from package 'python-minimal' (main)                                                                                                                                                             
 Command 'pytone' from package 'pytone' (universe)                    
pyton: command not found                                           
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ python -m train --clevr-dir /data/DATASETS/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log                                                             
TRAIN:   0%|          | 0/350 [00:00<?, ?it/s
Loaded hyperparameters from configuration config.json, model: original-fp: {'state_description': False, 'g_layers': [256, 256, 256, 256], 'question_injection_position': 0, 'f_fc1': 256, 'f_fc2': 256, 'dropout': 0.5, 'lstm_hidden': 128, 'lstm_word_emb': 32, 'rl_in_size': 52}
Building word dictionaries from all the words in the dataset...                                   
==> using cached dictionaries: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_built_dictionaries.pkl
Word dictionary completed!                                                                                                                                                                                         
Initializing CLEVR dataset...
==> using cached questions: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_train_questions.pkl
==> using cached questions: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_val_questions.pkl
CLEVR dataset initialized!
Supposing original DeepMind model
Training (350 epochs) is starting...
Dataset reinitialized with batch size 640
Current learning rate: 1e-05
██████████▊| 1093/1094 [11:21:28<00:37, 37.41s/it, loss=1.92]
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 418, in <module>
    main(args)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 356, in main
    train(clevr_train_loader, model, optimizer, epoch, args)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 40, in train
    output = model(img, qst)
  File "/data/Rudra/virtualenvs/rn_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/Rudra/RelationNetworks-CLEVR/model.py", line 200, in forward
    x = torch.cat([x, self.coord_tensor], 1)    # (B x 24+2 x 8*8)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 469 and 640 in dimension 0 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897
Train Epoch: 1 [0/700160 (0%)] Train loss: 39.945804595947266
Train Epoch: 1 [6400/700160 (1%)] Train loss: 36.57775611877442
Train Epoch: 1 [12800/700160 (2%)] Train loss: 29.848896408081053
Train Epoch: 1 [19200/700160 (3%)] Train loss: 24.984291648864748
Train Epoch: 1 [25600/700160 (4%)] Train loss: 20.945134353637695
.
.
.
Train Epoch: 1 [684800/700160 (98%)] Train loss: 1.8508247494697572
Train Epoch: 1 [691200/700160 (99%)] Train loss: 1.8768051743507386
Train Epoch: 1 [697600/700160 (100%)] Train loss: 1.8581566572189332

(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$

I have also attached my logfile. When I run the plot function, I get empty plots for everything except the training loss. Please let me know where the issue might be. Thanks.

logfile.log
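The traceback points at a likely cause: the run crashed on the last batch of epoch 1 (1093/1094), which holds 469 samples instead of 640, while `self.coord_tensor` appears to be built for the full batch size. Because the crash happens before any test pass, the log would contain only training-loss lines, which would explain the empty test and accuracy plots. The sketch below only reproduces the batch arithmetic from the log. `batch_sizes` is a hypothetical helper, not part of the repository.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Return the size of each batch when iterating once over the dataset."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # short final batch
    return sizes


# Numbers taken from the log: 1093 full batches of 640, then one batch of 469.
num_samples = 1093 * 640 + 469
sizes = batch_sizes(num_samples, 640)
print(len(sizes), sizes[-1])                                # prints 1094 469
print(len(batch_sizes(num_samples, 640, drop_last=True)))   # prints 1093
```

If this is the cause, two possible remedies are passing `drop_last=True` to the training `torch.utils.data.DataLoader`, or slicing the coordinate tensor to the current batch size (`x.size(0)`) before the `torch.cat` call. Both are untested suggestions.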

Do you know how to calculate the accuracy of the Count, Exists, Compare Numbers, etc

To the best of my knowledge, most repos on GitHub report only the overall accuracy, but I want the same per-category breakdown that the paper reports.

Can you help me? It would be very helpful!
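One way to get a per-category breakdown, sketched under assumptions: each CLEVR question in the train and val splits carries a functional program, and the last function in that program (assumed here to be `question["program"][-1]["function"]`) determines the question type. The mapping to the paper's five categories and all names below are illustrative, not taken from this repository.

```python
from collections import defaultdict

# Assumed names of the integer-comparison functions in CLEVR programs.
COMPARE_NUMBERS = {"equal_integer", "less_than", "greater_than"}


def category(last_function):
    """Map the last program function to a question category."""
    if last_function == "count":
        return "Count"
    if last_function == "exist":
        return "Exist"
    if last_function in COMPARE_NUMBERS:
        return "Compare Numbers"
    if last_function.startswith("query_"):
        return "Query Attribute"
    if last_function.startswith("equal_"):
        return "Compare Attribute"
    return "Other"


def accuracy_by_category(records):
    """records: iterable of (last_function, predicted_answer, true_answer)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for last_function, predicted, truth in records:
        name = category(last_function)
        total[name] += 1
        correct[name] += int(predicted == truth)
    return {name: correct[name] / total[name] for name in total}


# Toy records, made up purely to show the call.
toy = [("count", "2", "2"), ("count", "3", "1"), ("query_color", "red", "red")]
print(accuracy_by_category(toy))
```

The evaluation loop would need to carry the last program function alongside each prediction for this to work.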

Train/Val Accuracy

Hi @mesnico,

What are the best train/val accuracies you have obtained with your code? I have recently been trying to implement this paper as well, and my numbers are far off from the reported results.

Thanks.
