tensor matches error about tbd-nets HOT 10 CLOSED

davidmascharka commented on July 19, 2024

tensor matches error

from tbd-nets.

Comments (10)

davidmascharka commented on July 19, 2024

Can you add this line right before you loop through the data?

print(programs[12])

You should get out

tensor([ 10,  39,  43,  19,  24,  41,   2,   2,   2,  24,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0], device='cuda:0')

The issue here is that the program generator has produced a bad program. The question this corresponds to is: "How many metallic objects are big blue cubes or blue objects?" and the program that is produced is:

'exist',
 'same_shape',
 'unique',
 'filter_material[metal]',
 'filter_size[large]',
 'scene',
 '<END>',
 '<END>',
 '<END>',
 'filter_size[large]',

It's not clear to me why the program generator does so poorly here, but the way we're forwarding input through the model does not handle this case properly. It's fixable by restructuring the way we're handling input, but we'd prefer not to change our logic to handle that. Another possibility would be inspecting the programs to ensure they are, in fact, valid before sending them through the model.
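A minimal sketch of such a validity check, not code from this repository. It assumes index 2 is '<END>' (as in the decoded program above) and that index 0 is the '<NULL>' padding token; `is_well_formed` is a hypothetical helper name, and it expects plain Python ints (for a tensor, call `.tolist()` first). It only tests one property: nothing other than '<END>' or padding may follow the first '<END>'.

```python
END_IDX = 2   # '<END>' according to the decoded program above
NULL_IDX = 0  # assumed to be the '<NULL>' padding token


def is_well_formed(program, end_idx=END_IDX, null_idx=NULL_IDX):
    """Return True if only <END>/<NULL> tokens follow the first <END>."""
    program = list(program)
    if end_idx not in program:
        return False  # the program never terminates
    tail = program[program.index(end_idx) + 1:]
    return all(tok in (end_idx, null_idx) for tok in tail)


# The bad program from this thread (truncated padding) and a terminated one.
bad = [10, 39, 43, 19, 24, 41, 2, 2, 2, 24, 0, 0]
good = [10, 39, 43, 19, 24, 41, 2, 0, 0, 0, 0, 0]
print(is_well_formed(bad), is_well_formed(good))
```

A check like this would flag the program printed above, because filter_size[large] (24) appears after '<END>'. It does not verify that the module sequence itself is semantically valid.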

This is a good catch; the program generator is nearly perfect, but this is just a case in which it clearly goes severely wrong. We never ran into this because we didn't use our program generator on the validation data and it doesn't make this sort of mistake on the test data.

Edit: It appears not to be an issue isolated to that question. Further, that question should be easy for the program generator to parse correctly. I'll take a closer look at this later and get back to you.

bidongqinxian commented on July 19, 2024

Yes, I got this output:

dengwei@ubuntu:~/tbd-nets$ python eval.py
Generating programs...
Saved programs as /home/dengwei/tbd-nets/data/val/programs.npy
Saved image indices as /home/dengwei/tbd-nets/data/val/image_idxs.npy
Saved questions as /home/dengwei/tbd-nets/data/val/questions.npy
tensor([ 10,  39,  43,  19,  24,  41,   2,   2,   2,  24,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0], device='cuda:0')

@davidmascharka

davidmascharka commented on July 19, 2024

@bidongqinxian thanks for verifying, and for catching this! My generate_programs function was translated incorrectly from Justin Johnson's model. The updated code from b8d6376 should work properly now. If you could pull master and verify, it would be much appreciated.

bidongqinxian commented on July 19, 2024

I don't know whether the bug still exists, because generating programs is too slow. I ran eval.py for a few hours and it produced no output; the console still showed "Generating programs...", though I'm sure it was running.

davidmascharka commented on July 19, 2024

It's certainly slower, but it shouldn't be on the order of hours. The modification would be to rewrite generate_programs to better match Justin's code, rather than make calls to reinforce_sample for each individual question. Batching is substantially faster. You could add a printout every 100 iterations or so just to see progress.
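A small sketch of the progress printout, under the assumption that generation happens in a plain Python loop over questions. `with_progress` is a hypothetical helper, and `range(250)` stands in for the real question list.

```python
def with_progress(items, every=100, total=None):
    """Yield items unchanged, printing a status line every `every` items."""
    for count, item in enumerate(items, start=1):
        yield item
        if count % every == 0:
            suffix = f" of {total}" if total is not None else ""
            print(f"Generated {count}{suffix} programs", flush=True)


questions = range(250)  # placeholder for the real questions
processed = [q for q in with_progress(questions, every=100, total=250)]
```

Wrapping the existing loop's iterable this way leaves the loop body untouched, and `flush=True` makes the message appear immediately even when output is buffered.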

What hardware are you working with? I was able to finish generating programs on a GTX 1080Ti within 20-30 minutes or so, which seemed reasonable as it's a one-time cost and should be cached for any future runs. This isn't a priority for us, since we only need to do this once for test data and the natural-language component isn't the focus of our work.
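For illustration, the one-time-cost caching pattern might look like the sketch below. This is not the repository's implementation: `load_or_generate` and `fake_generate` are hypothetical names, and the temporary directory stands in for the real data/val/programs.npy location.

```python
import os
import tempfile

import numpy as np


def load_or_generate(path, generate):
    """Return the array cached at `path`, generating and saving it on a miss."""
    if os.path.exists(path):
        return np.load(path)
    programs = np.asarray(generate())
    np.save(path, programs)
    return programs


calls = []


def fake_generate():
    """Stand-in for the slow program generator; records each invocation."""
    calls.append(1)
    return [[10, 39, 43, 2, 0], [10, 41, 2, 0, 0]]


cache_path = os.path.join(tempfile.mkdtemp(), "programs.npy")
first = load_or_generate(cache_path, fake_generate)   # generates and saves
second = load_or_generate(cache_path, fake_generate)  # loads from disk
```

One consequence of caching: after a fix to the generator, a stale programs.npy has to be deleted, otherwise the old (bad) programs are loaded again.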

bidongqinxian commented on July 19, 2024

Hi, the initial error still occurs. Before it, another error appeared:

/home/dengwei/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
i =[10 39 43 19 24 41  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0]
Traceback (most recent call last):
  File "eval.py", line 72, in <module>
    outputs = tbd_net(feats, programs)
  File "/home/dengwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dengwei/tbd-nets/tbd/module_net.py", line 180, in forward
    module_type = self.vocab['program_idx_to_token'][i]
TypeError: unhashable type: 'numpy.ndarray'

I changed the code as follows (lines 177 to 182 in module_net.py):

for i in reversed(programs.data[n].cpu().numpy()):
    #print('i =' % i)
    for a,b in enumerate(i):
        module_type = self.vocab['program_idx_to_token'][b]
        if module_type in {'<NULL>', '<START>', '<END>', '<UNK>', 'unique'}:
            continue  # the above are no-ops in our model

With that change the error disappeared. Is this the right fix?
@davidmascharka

davidmascharka commented on July 19, 2024

That error would happen if the generate_programs function wasn't squeezing the outputs properly. My commit did something weird, so I pushed again. Line 256 of generate_programs should squeeze the data before appending it to the list.
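To illustrate why a missing squeeze produces that TypeError, here is a small sketch. The three-entry vocabulary is a hypothetical stand-in built from indices shown earlier in this thread, and `decode` is an illustrative helper, not the code in module_net.py.

```python
import numpy as np

# Hypothetical mini-vocabulary; indices taken from the program printed above.
idx_to_token = {10: "exist", 39: "same_shape", 43: "unique"}


def decode(program):
    """Look up each program index in reverse order, as the forward pass does."""
    return [idx_to_token[i] for i in reversed(program)]


unsqueezed = np.array([[10, 39, 43]])  # shape (1, 3): extra leading axis
try:
    decode(unsqueezed)  # iterating yields a whole row, not a scalar index
except TypeError as err:
    message = str(err)

squeezed = unsqueezed.squeeze()  # shape (3,)
tokens = decode(squeezed)        # each element is now a scalar index
print(message)
print(tokens)
```

With the extra axis, each `i` is an entire ndarray row, which cannot be used as a dictionary key. After squeezing, iteration yields integer scalars and the lookup works without a nested loop.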

bidongqinxian commented on July 19, 2024

Thanks for the explanation, I learned a lot. I changed the code to progs.append(program.cpu().numpy().squeeze()) and removed for a,b in enumerate(i). After eval.py had run for around 15 minutes, an error occurred again:

Traceback (most recent call last):
  File "eval.py", line 72, in <module>
    outputs = tbd_net(feats, programs)
  File "/home/dengwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dengwei/tbd-nets/tbd/module_net.py", line 200, in forward
    output = module(feat_input, output)
  File "/home/dengwei/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dengwei/tbd-nets/tbd/modules.py", line 127, in forward
    attended_feats = torch.mul(feats, attn.repeat(1, self.dim, 1, 1))
RuntimeError: The size of tensor a (128) must match the size of tensor b (16384) at non-singleton dimension 1

I printed final_module_outputs at line 202 of module_net.py and got the following:

tensor([[[[  0.1177,   0.0639,   0.2034,  ...,   0.1145,   0.0676,
             0.0706],
          [  0.1654,   0.3045,   0.4034,  ...,   0.2383,   0.0803,
             0.2134],
          [  0.1039,   0.1023,   0.3015,  ...,   0.1826,   0.0778,
             0.1604],
          ...,
          [  0.1722,   0.0000,   0.1147,  ...,   0.0000,   0.0000,
             0.1538],
          [  0.0000,   0.0000,   0.0000,  ...,   0.0000,   0.0000,
             0.0882],
          [  0.0659,   0.0765,   0.2080,  ...,   0.0000,   0.0000,
             0.4016]],

         [[  0.0000,   0.0621,   0.0000,  ...,   0.0992,   0.0702,
             0.1408],
          [  0.0000,   0.0000,   0.0000,  ...,   0.0000,   0.0564,
             0.1429],
          [  0.0917,   0.1845,   0.0000,  ...,   0.1301,   0.1111,
             0.1944],
          ...,
          [  0.0250,   0.2981,   0.0449,  ...,   0.2866,   0.0776,
             0.0739],
          [  0.0000,   0.0000,   0.0000,  ...,   0.2916,   0.0967,
             0.2125],
          [  0.0000,   0.0000,   0.0000,  ...,   0.0334,   0.1566,
             0.0104]],

and so on.
That shows the code can run. Maybe something is broken?

davidmascharka commented on July 19, 2024

I'm not sure what happened, but it looks like you're not up-to-date with the current master. The error message you posted says that the code fails on module_net.py line 200 and mentions the call is output = module(feat_input, output). However, line 200 doesn't contain that function call.

I'm not quite sure what enumerate call you removed; the only call to enumerate we make in this repository is in full-vqa-example.ipynb, which should be completely unrelated to what you're trying to run here.

I would recommend you clone the repository fresh to verify everything on your end matches what I have. I've successfully generated programs for the validation set as you're attempting to do and been able to run them all through the tbd_net instance.

bidongqinxian commented on July 19, 2024

OK, maybe I will download the latest master and run the code again. Thanks for your advice!
