vacancy / nscl-pytorch-release

PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).

Home Page: http://nscl.csail.mit.edu

License: MIT License

Python 100.00%
concept-learning neuro-symbolic-learning vqa

nscl-pytorch-release's Issues

semantic parser training codes

This current release contains only training codes for the visual modules. That is, currently we still assume that a semantic parser is pre-trained using program annotations. In the full NS-CL, this pre-training is not required. We also plan to release the full training code soon.

Just a reminder, will there be any updates about the training codes of the semantic parser? Thanks!

CUDA Error

Hi,

I have been trying to run your code to train a model on the CLEVR dataset, but I'm running into an issue. The log output is below:

...
09 15:59:43 Building the model.
09 16:01:08 Writing meter logs to file: "dumps/clevr/desc_nscl_derender/derender-curriculum_all-qtrans_off/meta/run-2019-07-09-15-58-56.meter.json".
09 16:01:08 Building the data loader.
09 16:05:22 Building the data loader. Curriculum = 3/4, length = 1930.
  0%|                                                                                                                                  | 0/60 [00:00<?, ?it/s]Using /tmp/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/_prroi_pooling/build.ninja...
Building extension module _prroi_pooling...
ninja: no work to do.
Loading extension module _prroi_pooling...
cudaCheckError() failed : CUDA driver version is insufficient for CUDA runtime version

This was the command that I ran:
jac-crun 1 scripts/trainval.py --desc experiments/clevr/desc_nscl_derender.py --training-target derender --curriculum all --dataset clevr --data-dir clevr/train --batch-size 32 --epoch 100 --validation-interval 5 --save-interval 5 --data-split 0.95

I checked that my driver version (410.78) is in fact compatible with the CUDA version (10.0). Moreover, I am able to run other code that relies on PyTorch and uses the GPU. Am I missing something here?
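The compatibility claim above can be checked mechanically. The following is a minimal sketch, assuming the minimum Linux driver versions listed in NVIDIA's CUDA release notes for a few toolkit releases; the table and the helper names are illustrative and not part of this repository. Note that the log shows `_prroi_pooling` being built as a PyTorch extension at run time, so the CUDA toolkit it is compiled against may differ from the one bundled with PyTorch. That is a possibility worth checking, not a confirmed diagnosis.

```python
# Minimum Linux driver versions for a few CUDA toolkit releases,
# taken from NVIDIA's CUDA release notes. Not an exhaustive table.
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "10.0": (410, 48),
    "10.1": (418, 39),
}


def parse_version(text):
    """Turn '410.78' into (410, 78) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(driver_version, cuda_runtime):
    """Return True if the driver meets the minimum for the CUDA runtime."""
    return parse_version(driver_version) >= MIN_LINUX_DRIVER[cuda_runtime]


print(driver_supports("410.78", "10.0"))  # True
print(driver_supports("410.78", "10.1"))  # False
```

Under these assumptions, driver 410.78 is sufficient for a CUDA 10.0 runtime but not for a 10.1 runtime, so it may help to confirm which toolkit version the extension build actually picks up.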

Would appreciate any help, thanks!

Training stuck at Epoch 15

Hello,
I can't train the model because the process gets killed after this message:

Building the data loader. Curriculum = 3/8, length = 32218.
Epoch 15 acc/qa=1.000000 loss=0.046158 loss/qa=0.046158 time/data=0.008719 time/step=1.016501: 100%|##############################| 1006/1006 [18:08<00:00, 1.08s/it]
Epoch 15 (validation) validation/acc/qa=1.000000: 2%|#4 | 20/1094 [00:41<11:04, 1.62it/s]/home/colors/Desktop/nscl/Jacinle/bin/jac-crun: line 6: 3305 Killed $JACROOT/bin/jac-run "$@"

Where can I find the images in questions.json for validation?

For validation, a questions.json file is provided. It contains 15,000 unique images with filenames ranging from CLEVR_val_000000.png to CLEVR_val_014999.png. However, the original CLEVR dataset has two validation splits: split A has filenames ranging from CLEVR_ValA_000000.png to CLEVR_ValA_014999.png, whereas split B has filenames ranging from CLEVR_ValB_000000.png to CLEVR_ValB_014999.png. The images in each split are different, so which files is questions.json referring to?

confusion on dataset names

Hello,
Can anyone explain the difference between the 'scenes-raw.json' and 'scenes.json' files? The scenes.json files available via the direct link have exactly the same data format as the scenes files in the CLEVR dataset.

Module for training semantic parser

Hi,
You mention that the code for training the full semantic parser will be released later. It would be helpful if instructions for training the semantic parser could be provided, or if the full code could be released. I am trying to replicate the experiments on the CLEVR and VQS datasets.

The code may be incomplete?

I ran your code, but the following message does not appear:
[07 16:30:54 [email protected]:/data/vision/billf/scratch/jiayuanm/projects/NSCL-PyTorch/nscl/datasets/factory.py] Filtering out questions containing "how big" and "made of", #before = 699989, #after = 633615.

Error coming from Jacinle

I've followed all the instructions in the README.md. However, when I run the command jac-crun <gpu_id> scripts/trainval.py --desc experiments/clevr/desc_nscl_derender.py --training-target derender --curriculum all --dataset clevr --data-dir <data_dir>/clevr/train --batch-size 32 --epoch 100 --validation-interval 5 --save-interval 5 --data-split 0.95, I get this error:

(nscl) crytting@hatch:~$ jac-crun GPU-0ff7a712-8a42-af6d-7f21-17147fda6a7c --desc experiments/clevr/desc_nscl_derender.py --training-target derender --curriculum all --dataset clevr --data-dir ./clevr/train --batch-size 32 --epoch 100 --validation-interval 5 --save-interval 5 --data-split 0.95
11 09:56:09 Loading jacinle config: /home/crytting/Jacinle/jacinle.yml.
11 09:56:09 Loading vendor: ReservoirSample-PyTorch.
11 09:56:09 Loading vendor: AdvancedIndexing-PyTorch.
11 09:56:09 Loading vendor: SynchronizedBatchNorm-PyTorch.
11 09:56:09 Loading vendor: PreciseRoIPooling-PyTorch.
11 09:56:09 Loading vendor: SceneGraphParser.
/home/crytting/Jacinle/bin/jac-run: line 10: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [arguments ...]] [redirection ...]

The line 10 that it references is

exec "$@"

from the mentioned file /home/crytting/Jacinle/bin/jac-run. Any ideas on how to fix?
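Comparing the documented command with the one shown in the log reveals one difference: the logged command has no `scripts/trainval.py`, so the first token after the GPU id is `--desc`. The following is a minimal sketch of why that would matter, assuming jac-crun consumes its first argument as the GPU id and forwards the rest to `exec "$@"`. That split is an assumption about the wrapper, and `split_crun_args` and `first_forwarded_is_option` are hypothetical helpers. The GPU id `0` in the second command is a placeholder.

```python
import shlex


def split_crun_args(command_line):
    """Split a jac-crun command line into (gpu_id, forwarded_args).

    Assumption: the wrapper takes the first argument as the GPU id and
    hands everything after it to `exec "$@"`.
    """
    tokens = shlex.split(command_line)
    if len(tokens) < 2 or tokens[0] != "jac-crun":
        raise ValueError("expected: jac-crun <gpu_id> <program> [args...]")
    return tokens[1], tokens[2:]


def first_forwarded_is_option(command_line):
    """True when the slot where exec expects a program holds an option."""
    _, forwarded = split_crun_args(command_line)
    return bool(forwarded) and forwarded[0].startswith("-")


as_run = (
    "jac-crun GPU-0ff7a712-8a42-af6d-7f21-17147fda6a7c "
    "--desc experiments/clevr/desc_nscl_derender.py"
)
as_documented = (
    "jac-crun 0 scripts/trainval.py "
    "--desc experiments/clevr/desc_nscl_derender.py"
)

print(first_forwarded_is_option(as_run))         # True
print(first_forwarded_is_option(as_documented))  # False
```

If the assumption holds, `exec` would receive `--desc` where it expects a program name, which is consistent with the "invalid option" message in the log.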

_use_shared_memory - PyTorch 1.1

The following error appears with PyTorch 1.1:

File "Jacinle/jactorch/data/collate.py", line 149, in _stack
if torchdl._use_shared_memory:
AttributeError: module 'torch.utils.data.dataloader' has no attribute '_use_shared_memory'

The dataloader module in PyTorch 1.1 does not have the attribute '_use_shared_memory'.
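One common way to tolerate an attribute that exists in some library versions but not others is to read it with a default. The following is a minimal sketch of that pattern, using stand-in objects instead of `torch.utils.data.dataloader` so it runs without PyTorch. `uses_shared_memory` is a hypothetical helper, and falling back to `False` is an assumption that may not match what Jacinle needs.

```python
from types import SimpleNamespace


def uses_shared_memory(dataloader_module):
    """Read `_use_shared_memory` if present; fall back to False otherwise."""
    return getattr(dataloader_module, "_use_shared_memory", False)


# Stand-ins for the dataloader module in two library versions.
with_attribute = SimpleNamespace(_use_shared_memory=True)
without_attribute = SimpleNamespace()  # mimics the PyTorch 1.1 situation

print(uses_shared_memory(with_attribute))     # True
print(uses_shared_memory(without_attribute))  # False
```

With this fallback the shared-memory code path is simply skipped when the attribute is missing, which avoids the AttributeError at the cost of that optimization.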

Semantic parser training code

Hi!

We are currently doing research on your VQA papers, and would really like to have the source code for zero-annotation parser training, as it was one of the benefits highlighted in the NS-CL paper.

Note: This current release contains only training codes for the visual modules. That is, currently we still assume that a semantic parser is pre-trained using program annotations. In the full NS-CL, this pre-training is not required. We also plan to release the full training code soon.

Is there any possibility that the code will be released soon?

Help using the code

Hello, is there any chance you could release one or two self-contained files with the trained Mask R-CNN model? I'd like to use it to perform some experiments.

Thanks in advance

Replicating experiments on VQS

Hi!

Just wanted to make a request. Would it be possible for you to add instructions in the README.md for how one can replicate results on the VQS dataset?

Thanks!

about concept quantization

Hi, in your paper, Pr[object i is Red] is given by a shifted and scaled sigmoid function, but there seems to be no sigmoid function in your code, as shown below:

logits = ((query_mapped * reference).sum(dim=-1) - 1 + margin) / margin / self._tau

If so, the range of the 'logits' variable can be a problem when it is added to the 'belong' vector. Could you explain more about this? Thanks!
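To make the range question concrete, the quoted expression can be evaluated for scalar inputs. The following is a minimal sketch, assuming `query_mapped` and `reference` are unit-normalized so that their dot product is a cosine similarity in [-1, 1]. That normalization and the values of `margin` and `tau` are assumptions chosen for round numbers, not values taken from the repository. The quoted line does not show whether or where a sigmoid is applied, so the sigmoid here only illustrates what the probability would be if one were applied to this logit.

```python
import math


def concept_logit(cosine_similarity, margin, tau):
    """Same arithmetic as the quoted `logits` line, for a single scalar."""
    return (cosine_similarity - 1 + margin) / margin / tau


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


margin, tau = 0.25, 0.25  # hypothetical values, for illustration only

for cos in (1.0, 0.75, -1.0):
    logit = concept_logit(cos, margin, tau)
    print(cos, logit, sigmoid(logit))
```

Under these assumptions the logit is 0 when the similarity equals 1 - margin, reaches its maximum 1 / tau at similarity 1, and falls to (margin - 2) / (margin * tau) at similarity -1, so it is unbounded only in the sense that a small tau or margin stretches the range.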

How is vocab.json being used?

Suppose the model predicts a synonym for metal, e.g. "shiny", how does the program executor map shiny to what it is really supposed to mean, i.e. metal?

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I encountered the following error while running the training code using the command provided in the README:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Further investigation showed that I could debug this error using PyTorch's anomaly detection mode, which showed that the error occurred during a forward call.
Full traceback:
/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:57: UserWarning: Traceback of forward call that caused the error:
  File "scripts/trainval.py", line 401, in <module>
    main()
  File "scripts/trainval.py", line 173, in main
    main_train(train_dataset, validation_dataset, extra_dataset)
  File "scripts/trainval.py", line 291, in main_train
    train_epoch(epoch, trainer, train_dataloader, meters)
  File "scripts/trainval.py", line 346, in train_epoch
    loss, monitors, output_dict, extra_info = trainer.step(feed_dict, cast_tensor=False)
  File "/home/user/Jacinle/jactorch/train/env.py", line 135, in step
    loss, monitors, output_dict = self._model(feed_dict)
  File "/home/user/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "experiments/clevr/desc_nscl_derender.py", line 40, in forward
    f_sng = self.scene_graph(f_scene, feed_dict.objects, feed_dict.objects_length)
  File "/home/weichen/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/data2/PycharmProjects/NSCL-PyTorch-Release/nscl/nn/scene_graph/scene_graph.py", line 125, in forward
    this_object_features[sub_id], this_object_features[obj_id],

Traceback (most recent call last):
  File "scripts/trainval.py", line 401, in <module>
    main()
  File "scripts/trainval.py", line 173, in main
    main_train(train_dataset, validation_dataset, extra_dataset)
  File "scripts/trainval.py", line 291, in main_train
    train_epoch(epoch, trainer, train_dataloader, meters)
  File "scripts/trainval.py", line 346, in train_epoch
    loss, monitors, output_dict, extra_info = trainer.step(feed_dict, cast_tensor=False)
  File "/home/user/Jacinle/jactorch/train/env.py", line 155, in step
    loss.backward()
  File "/home/user/.local/lib/python3.7/site-packages/torch/tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/user/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3, 256, 7, 7]], which is output 0 of CudnnConvolutionBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

The error occurred in epoch 6. The loss value seemed to be decreasing normally.
Epoch 6 acc/qa=0.687500 loss=0.708516 loss/qa=0.708516 time/data=0.542045 time/step=1.222588: 0%| | 1/469 [00:01<13:45, 1.76s/i
Epoch 6 acc/qa=0.562500 loss=0.713307 loss/qa=0.713307 time/data=0.009401 time/step=1.012083: 0%| | 1/469 [00:02<13:45, 1.76s/i
Epoch 6 acc/qa=0.562500 loss=0.713307 loss/qa=0.713307 time/data=0.009401 time/step=1.012083: 0%| | 2/469 [00:02<11:59, 1.54s/i
Epoch 6 acc/qa=0.812500 loss=0.566920 loss/qa=0.566920 time/data=0.011552 time/step=1.050238: 0%| | 2/469 [00:03<11:59, 1.54s/i
Epoch 6 acc/qa=0.812500 loss=0.566920 loss/qa=0.566920 time/data=0.011552 time/step=1.050238: 1%| | 3/469 [00:03<10:51, 1.40s/it]/

Any ideas?
Thanks
