Comments (10)
Can you add this line right before you loop through the data?
print(programs[12])
You should get this output:
tensor([ 10, 39, 43, 19, 24, 41, 2, 2, 2, 24, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0], device='cuda:0')
The issue here is that the program generator has produced a bad program. The question this corresponds to is: "How many metallic objects are big blue cubes or blue objects?" and the program that is produced is:
'exist',
'same_shape',
'unique',
'filter_material[metal]',
'filter_size[large]',
'scene',
'<END>',
'<END>',
'<END>',
'filter_size[large]',
It's not clear to me why the program generator does so poorly here, but the way we're forwarding input through the model does not handle this case properly. It's fixable by restructuring the way we're handling input, but we'd prefer not to change our logic to handle that. Another possibility would be inspecting the programs to ensure they are, in fact, valid before sending them through the model.
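As a rough illustration of the "inspect the programs first" option, here is a minimal sketch of one possible check. It only tests that nothing but `<END>` or `<NULL>` follows the first `<END>`, which is the defect visible in the program printed above. It is not a full validity check, and it is not code from the repository. The function name is made up, `2` as `<END>` is taken from the listing above, and `0` as `<NULL>` is an assumption.

```python
END = 2   # index of '<END>', as in the program listing above
NULL = 0  # assumed index of '<NULL>' (the padding value)

def is_well_terminated(program):
    """Return True if only <END>/<NULL> tokens follow the first <END>."""
    program = list(program)
    if END not in program:
        return False  # never terminated at all
    tail = program[program.index(END):]
    return all(tok in (END, NULL) for tok in tail)

# The program from this thread: a stray 24 appears after the <END> tokens.
bad = [10, 39, 43, 19, 24, 41, 2, 2, 2, 24] + [0] * 20
# The same program with a clean tail.
good = [10, 39, 43, 19, 24, 41, 2] + [0] * 23

print(is_well_terminated(bad), is_well_terminated(good))
```

A check like this could be used to skip or report bad programs before they reach the model; whether skipping is acceptable depends on how the evaluation is meant to be scored.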
This is a good catch; the program generator is nearly perfect, but this is just a case in which it clearly goes severely wrong. We never ran into this because we didn't use our program generator on the validation data and it doesn't make this sort of mistake on the test data.
Edit: It appears not to be an issue isolated to that question. Further, that question should be easy for the program generator to parse correctly. I'll take a closer look at this later and get back to you.
from tbd-nets.
Yes, I got this output:
dengwei@ubuntu:~/tbd-nets$ python eval.py
Generating programs...
Saved programs as /home/dengwei/tbd-nets/data/val/programs.npy
Saved image indices as /home/dengwei/tbd-nets/data/val/image_idxs.npy
Saved questions as /home/dengwei/tbd-nets/data/val/questions.npy
tensor([ 10, 39, 43, 19, 24, 41, 2, 2, 2, 24, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0], device='cuda:0')
from tbd-nets.
@bidongqinxian thanks for verifying, and for catching this! My generate_programs function was translated incorrectly from Justin Johnson's model. The updated code from b8d6376 should work properly now. If you could pull master and verify, it would be much appreciated.
from tbd-nets.
I don't know whether the bug still exists, because generating programs is too slow. I ran eval.py for a few hours and it produced no output; the terminal still showed "Generating programs...", and I'm sure the process was running.
from tbd-nets.
It's certainly slower, but it shouldn't be on the order of hours. The modification would be to rewrite generate_programs to better match Justin's code, rather than make calls to reinforce_sample for each individual question. Batching is substantially faster. You could add a printout every 100 iterations or so just to see progress.
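The progress printout suggested above could look roughly like this. It is a generic sketch, not the repository's generate_programs: the function `generate_all` and its `generate_one` callback are hypothetical stand-ins for whatever produces one program per question.

```python
def generate_all(questions, generate_one, report_every=100):
    """Generate one program per question, reporting progress periodically."""
    programs = []
    total = len(questions)
    for n, question in enumerate(questions, start=1):
        programs.append(generate_one(question))
        # Report every `report_every` items and once more at the very end.
        if n % report_every == 0 or n == total:
            print(f'generated {n}/{total} programs', flush=True)
    return programs

# Toy usage: the "generator" here just doubles its input.
programs = generate_all(list(range(250)), lambda q: q * 2)
```

Passing `flush=True` makes the message appear immediately even when output is redirected to a file, which helps when checking whether a long run is still making progress.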
What hardware are you working with? I was able to finish generating programs on a GTX 1080Ti within 20-30 minutes or so, which seemed reasonable as it's a one-time cost and should be cached for any future runs. This isn't a priority for us, since we only need to do this once for test data and the natural-language component isn't the focus of our work.
from tbd-nets.
Hi, the initial error still exists. Before it, another error appears:
/home/dengwei/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
i =[10 39 43 19 24 41 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
Traceback (most recent call last):
  File "eval.py", line 72, in <module>
    outputs = tbd_net(feats, programs)
  File "/home/dengwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dengwei/tbd-nets/tbd/module_net.py", line 180, in forward
    module_type = self.vocab['program_idx_to_token'][i]
TypeError: unhashable type: 'numpy.ndarray'
I changed the code as follows (lines 177 to 182 of module_net.py):
for i in reversed(programs.data[n].cpu().numpy()):
    # print('i = %s' % i)
    for a, b in enumerate(i):
        module_type = self.vocab['program_idx_to_token'][b]
        if module_type in {'<NULL>', '<START>', '<END>', '<UNK>', 'unique'}:
            continue  # the above are no-ops in our model
With this change the error disappeared. Is this correct?
@davidmascharka
from tbd-nets.
That error would happen if the generate_programs function wasn't squeezing the outputs properly. My commit did something weird, so I pushed again. Line 256 of generate_programs should squeeze the data before appending it to the list.
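To make the squeezing point concrete, here is a small sketch of why an unsqueezed program leads to `TypeError: unhashable type: 'numpy.ndarray'`. It assumes each saved program has shape (1, length) instead of (length,). The tiny vocabulary and the helper name are made up for illustration; the pairs 10, 41 and 2 follow the program shown earlier in the thread, and 0 as `<NULL>` is an assumption.

```python
import numpy as np

# Hypothetical stand-in for vocab['program_idx_to_token'].
idx_to_token = {0: '<NULL>', 2: '<END>', 10: 'exist', 41: 'scene'}

def tokens_in_reverse(program):
    """Look up every index of a program, last to first."""
    return [idx_to_token[i] for i in reversed(program)]

unsqueezed = np.array([[10, 41, 2, 0]])  # shape (1, 4): one extra leading axis
squeezed = unsqueezed.squeeze()          # shape (4,)

# Iterating the 2-D array yields a whole row (an ndarray), which cannot be
# used as a dictionary key.
try:
    tokens_in_reverse(unsqueezed)
    lookup_failed = False
except TypeError:
    lookup_failed = True

# Iterating the 1-D array yields integer scalars, which hash like plain ints.
tokens = tokens_in_reverse(squeezed)
print(lookup_failed, tokens)
```

Adding an inner loop over each row also avoids the exception, but it works around the extra axis rather than removing it, so squeezing before the programs are saved is the more direct fix.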
from tbd-nets.
Thanks for the explanation, I learned a lot. I changed the code to progs.append(program.cpu().numpy().squeeze()) and removed the for a,b in enumerate(i) loop. After eval.py ran for around 15 minutes, an error occurred again:
Traceback (most recent call last):
  File "eval.py", line 72, in <module>
    outputs = tbd_net(feats, programs)
  File "/home/dengwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dengwei/tbd-nets/tbd/module_net.py", line 200, in forward
    output = module(feat_input, output)
  File "/home/dengwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dengwei/tbd-nets/tbd/modules.py", line 127, in forward
    attended_feats = torch.mul(feats, attn.repeat(1, self.dim, 1, 1))
RuntimeError: The size of tensor a (128) must match the size of tensor b (16384) at non-singleton dimension 1
I printed final_module_outputs at line 202 of module_net.py and got this:
tensor([[[[ 0.1177, 0.0639, 0.2034, ..., 0.1145, 0.0676,
0.0706],
[ 0.1654, 0.3045, 0.4034, ..., 0.2383, 0.0803,
0.2134],
[ 0.1039, 0.1023, 0.3015, ..., 0.1826, 0.0778,
0.1604],
...,
[ 0.1722, 0.0000, 0.1147, ..., 0.0000, 0.0000,
0.1538],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000,
0.0882],
[ 0.0659, 0.0765, 0.2080, ..., 0.0000, 0.0000,
0.4016]],
[[ 0.0000, 0.0621, 0.0000, ..., 0.0992, 0.0702,
0.1408],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0564,
0.1429],
[ 0.0917, 0.1845, 0.0000, ..., 0.1301, 0.1111,
0.1944],
...,
[ 0.0250, 0.2981, 0.0449, ..., 0.2866, 0.0776,
0.0739],
[ 0.0000, 0.0000, 0.0000, ..., 0.2916, 0.0967,
0.2125],
[ 0.0000, 0.0000, 0.0000, ..., 0.0334, 0.1566,
0.0104]],
and so on. This shows that the code runs up to that point. Maybe something is broken?
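One possible reading of the sizes in the RuntimeError above, offered as arithmetic only and not as a diagnosis: 16384 equals 128 × 128, which is what tiling along the channel axis produces when the input already has 128 channels instead of 1. The sketch below reproduces that with NumPy, using `np.tile` as a stand-in for the tensor `repeat` call. The value 128 for `self.dim` and the 4×4 spatial size are assumptions.

```python
import numpy as np

dim = 128                            # assumed value of self.dim
feats = np.zeros((1, dim, 4, 4))     # feature map; the 4x4 spatial size is made up

attn_ok = np.zeros((1, 1, 4, 4))     # a one-channel attention mask
attn_bad = np.zeros((1, dim, 4, 4))  # a 128-channel tensor in the mask's place

tiled_ok = np.tile(attn_ok, (1, dim, 1, 1))    # channels: 1 * 128 = 128
tiled_bad = np.tile(attn_bad, (1, dim, 1, 1))  # channels: 128 * 128 = 16384

product = feats * tiled_ok           # shapes agree, so this multiplies fine

try:
    feats * tiled_bad                # 128 vs 16384 channels cannot broadcast
    mismatch = False
except ValueError:
    mismatch = True

print(tiled_ok.shape, tiled_bad.shape, mismatch)
```

If this reading holds, a module that expects an attention mask received a full feature map, which can happen when the modules of a program are chained in an order the model does not expect.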
from tbd-nets.
I'm not sure what happened, but it looks like you're not up-to-date with the current master. The error message you posted says that the code fails on module_net.py line 200 and mentions the call is output = module(feat_input, output). However, line 200 doesn't contain that function call.
I'm not quite sure what enumerate call you removed; the only call to enumerate we make in this repository is in full-vqa-example.ipynb, which should be completely unrelated to what you're trying to run here.
I would recommend you clone the repository fresh to verify everything on your end matches what I have. I've successfully generated programs for the validation set as you're attempting to do and been able to run them all through the tbd_net instance.
from tbd-nets.
OK, I will download the latest master and run the code again. Thanks for your advice!
from tbd-nets.