
mesnico / relationnetworks-clevr


A PyTorch implementation of "A simple neural network module for relational reasoning", working on the CLEVR dataset

License: MIT License

Python 95.27% Shell 1.75% Dockerfile 2.98%
relation-network relationships clevr deep-learning machine-learning visual-question-answering pytorch

relationnetworks-clevr's People

Contributors

fabiocarrara, mesnico


relationnetworks-clevr's Issues

Does not converge

Hmm, it's me again. I am wondering whether the model is able to converge after a certain number of epochs. When I run it, the loss stops decreasing once it reaches 3.33, so it does not converge. I hope you can share some insights with me. Thanks!

AttributeError: Can't get attribute '_rebuild_tensor_v2' on <module 'torch._utils' from '/home/GoodPaperCode/RelationNetworks-CLEVR-master/env/lib/python3.6/site-packages/torch/_utils.py'>

Hi, thanks for sharing your code!
I followed the instructions and built the virtualenv with pytorch-0.3.1.
However, when I run the following command:

python train.py --clevr-dir /home/Data/CLEVR/CLEVR_v1.0 --model 'original-sd' | tee logfile.log

The terminal gives out the following error:

Loaded hyperparameters from configuration config.json, model: original-sd: {'state_description': True, 'g_layers': [512, 512, 512, 512], 'question_injection_position': 0, 'f_fc1': 512, 'f_fc2': 1024, 'dropout': 0.05, 'lstm_hidden': 256, 'lstm_word_emb': 32, 'rl_in_size': 14}
Building word dictionaries from all the words in the dataset...
==> using cached dictionaries: /home/Data/CLEVR/CLEVR_v1.0/questions/CLEVR_built_dictionaries.pkl
Word dictionary completed!
Initializing CLEVR dataset...
==> using cached questions: /home/Data/CLEVR/CLEVR_v1.0/questions/CLEVR_train_questions.pkl
==> using cached scenes: /home/Data/CLEVR/CLEVR_v1.0/scenes/CLEVR_train_scenes.pkl
Traceback (most recent call last):
  File "train.py", line 418, in <module>
    main(args)
  File "train.py", line 247, in main
    clevr_dataset_train, clevr_dataset_test  = initialize_dataset(args.clevr_dir, dictionaries, hyp['state_description'])
  File "train.py", line 194, in initialize_dataset
    clevr_dataset_train = ClevrDatasetStateDescription(clevr_dir, True, dictionaries)
  File "/home/GoodPaperCode/RelationNetworks-CLEVR-master/clevr_dataset_connector.py", line 98, in __init__
    self.objects = pickle.load(f)
AttributeError: Can't get attribute '_rebuild_tensor_v2' on <module 'torch._utils' from '/home/GoodPaperCode/RelationNetworks-CLEVR-master/env/lib/python3.6/site-packages/torch/_utils.py'>

I searched online and found a related answer, but I am not sure how to apply it here.
Do you have any idea how I can fix this?

Any help would be appreciated and thanks for your time!
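
A note on the likely mechanism: pickle stores helper functions by module and name, so a file written where `torch._utils._rebuild_tensor_v2` exists cannot be loaded where it does not. This typically happens when a cached .pkl was written by a newer PyTorch than the one installed (0.3.1 here); whether that applies to this cached scenes file is an assumption. The sketch below reproduces the failure with the standard library only. `fake_utils`, `_rebuild_v2` and `Payload` are hypothetical stand-ins, not PyTorch names.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library module such as torch._utils.
fake_utils = types.ModuleType("fake_utils")
sys.modules["fake_utils"] = fake_utils


def _rebuild_v2(values):
    """Rebuild helper that only the 'newer' library version provides."""
    return list(values)


# Make pickle record the helper as fake_utils._rebuild_v2.
_rebuild_v2.__module__ = "fake_utils"
_rebuild_v2.__qualname__ = "_rebuild_v2"
fake_utils._rebuild_v2 = _rebuild_v2


class Payload:
    """Object whose pickle refers to the rebuild helper by name."""

    def __init__(self, values):
        self.values = values

    def __reduce__(self):
        return (fake_utils._rebuild_v2, (self.values,))


# "Newer" environment: writing the cache succeeds.
blob = pickle.dumps(Payload([1, 2, 3]))

# "Older" environment: the helper is missing, so loading fails.
del fake_utils._rebuild_v2
try:
    pickle.loads(blob)
    load_error = None
except AttributeError as exc:
    load_error = str(exc)

# Workaround: install a compatible helper under the expected name, then load.
fake_utils._rebuild_v2 = lambda values: list(values)
restored = pickle.loads(blob)
```

If the cause is the same here, two options seem reasonable: delete the cached .pkl files so they are rebuilt under the installed PyTorch version, or define a compatible `_rebuild_tensor_v2` on `torch._utils` before calling `pickle.load`.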

stack() fails

train.py fails both when training from scratch and when using a pre-trained model, with the following error:

File "/RelationNetworks-CLEVR/utils.py", line 120, in collate_samples
question=torch.stack(padded_questions)
TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Tensor

Terminating at Epoch 9

Your implementation is really great. I am interested in the model and am running experiments with it. However, I have tried two different machines, and on both of them training terminates at epoch 9. Have you seen the same problem?

Training time?

Hi, could you tell me roughly how long training takes? I have access to 4-8 GPUs, but I am in a hurry.

logfile is not showing any runs for the test set. The plots also don't show anything for test set and accuracy.

When I run the code, I get the following output:

(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ python                          
Python 3.6.6 (default, Jun 28 2018, 00:00:00)                                         
[GCC 4.8.4] on linux                                             
Type "help", "copyright", "credits" or "license" for more information.                 
>>> import torch                                                   
>>> exit()                                                                     
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ pyton -m train --clevr-dir /data/DATASETS/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log
No command 'pyton' found, did you mean:                           
 Command 'python' from package 'python-minimal' (main)                                                                                                                                                             
 Command 'pytone' from package 'pytone' (universe)                    
pyton: command not found                                           
(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ python -m train --clevr-dir /data/DATASETS/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log                                                             
TRAIN:   0%|          | 0/350 [00:00<?, ?it/s]
Loaded hyperparameters from configuration config.json, model: original-fp: {'state_description': False, 'g_layers': [256, 256, 256, 256], 'question_injection_position': 0, 'f_fc1': 256, 'f_fc2': 256, 'dropout': 0.5, 'lstm_hidden': 128, 'lstm_word_emb': 32, 'rl_in_size': 52}
Building word dictionaries from all the words in the dataset...                                   
==> using cached dictionaries: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_built_dictionaries.pkl
Word dictionary completed!                                                                                                                                                                                         
Initializing CLEVR dataset...
==> using cached questions: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_train_questions.pkl
==> using cached questions: /data/DATASETS/CLEVR_v1.0/questions/CLEVR_val_questions.pkl
CLEVR dataset initialized!
Supposing original DeepMind model
Training (350 epochs) is starting...
Dataset reinitialized with batch size 640
Current learning rate: 1e-05
███████████████████████████████▊| 1093/1094 [11:21:28<00:37, 37.41s/it, loss=1.92]
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 418, in <module>
    main(args)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 356, in main
    train(clevr_train_loader, model, optimizer, epoch, args)
  File "/data/Rudra/RelationNetworks-CLEVR/train.py", line 40, in train
    output = model(img, qst)
  File "/data/Rudra/virtualenvs/rn_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/Rudra/RelationNetworks-CLEVR/model.py", line 200, in forward
    x = torch.cat([x, self.coord_tensor], 1)    # (B x 24+2 x 8*8)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 469 and 640 in dimension 0 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897
Train Epoch: 1 [0/700160 (0%)] Train loss: 39.945804595947266
Train Epoch: 1 [6400/700160 (1%)] Train loss: 36.57775611877442
Train Epoch: 1 [12800/700160 (2%)] Train loss: 29.848896408081053
Train Epoch: 1 [19200/700160 (3%)] Train loss: 24.984291648864748
Train Epoch: 1 [25600/700160 (4%)] Train loss: 20.945134353637695
.
.
.
Train Epoch: 1 [684800/700160 (98%)] Train loss: 1.8508247494697572
Train Epoch: 1 [691200/700160 (99%)] Train loss: 1.8768051743507386
Train Epoch: 1 [697600/700160 (100%)] Train loss: 1.8581566572189332

(rn_env) exx@ubuntu:/data/Rudra/RelationNetworks-CLEVR$ 

I have also attached my logfile with this. When I run the plot function, I get empty plots for everything apart from training loss. Please let me know where the issue might be. Thanks.

logfile.log
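
A possible reading of the traceback above: the failure happens at iteration 1093/1094, the last batch of the epoch, and the two mismatched sizes are 469 and 640. That is consistent with a final batch smaller than the batch size, while `self.coord_tensor` was built for a full batch of 640. The arithmetic can be checked with a small sketch. `batch_sizes` is a hypothetical helper, and the sample count is derived from the numbers in the log rather than read from the dataset.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Return the size of every batch a sequential loader would yield."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # the short final batch
    return sizes


# Numbers taken from the log: 1094 iterations, batch size 640,
# and a size of 469 reported in the RuntimeError.
batch_size = 640
num_batches = 1094
last_batch = 469
num_samples = (num_batches - 1) * batch_size + last_batch

sizes = batch_sizes(num_samples, batch_size)
sizes_dropped = batch_sizes(num_samples, batch_size, drop_last=True)

print(num_samples)               # 699989
print(len(sizes), sizes[-1])     # 1094 469
print(len(sizes_dropped))        # 1093
```

If this reading is right, passing `drop_last=True` to the training `DataLoader`, or slicing the coordinate tensor to the actual batch size before `torch.cat`, would avoid the mismatch. It would also explain the empty test plots, since the run stopped before the first epoch finished and no evaluation was logged.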

Train/Val Accuracy

Hi @mesnico ,

I am wondering what the best train/val accuracies are that you could get with your code. I have recently been trying to implement this paper as well, and my numbers are far off from the reported results.

Thanks.
