rosinality / mac-network-pytorch
Memory, Attention and Composition (MAC) Network for CLEVR implemented in PyTorch
License: MIT License
Hi, I'm looking at the code and couldn't understand this part...
Two MACNetworks are created: "net" and "net_running". During training, only "net" is trained, and 0.01% of "net"'s parameters are injected into the "net_running" model. At the testing stage, "net_running" is evaluated.
I'm wondering why it's necessary to keep two copies here, instead of using a single "net" model directly?
Thanks!
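For context, keeping a second copy like this is the standard exponential-moving-average (Polyak averaging) trick: the running copy is a smoothed version of the trained weights and often evaluates more stably than the raw model. A minimal sketch of the update, assuming a decay factor of 0.999 (the function and parameter names here are illustrative, not necessarily the repo's exact ones):

```python
import copy

import torch
from torch import nn


def accumulate(model_ema, model, decay=0.999):
    """Blend a small fraction of the trained model's parameters into the
    running copy: ema = decay * ema + (1 - decay) * net."""
    ema_params = dict(model_ema.named_parameters())
    params = dict(model.named_parameters())
    with torch.no_grad():
        for name in ema_params:
            ema_params[name].mul_(decay).add_(params[name], alpha=1 - decay)


# toy demonstration with a stand-in module
net = nn.Linear(4, 2)
net_running = copy.deepcopy(net)
# ... after each optimizer step on `net` ...
accumulate(net_running, net)
```

With decay 0.999, only 0.1% of the trained weights flows into the running copy per step, so `net_running` changes slowly and averages out the noise of individual gradient updates.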
Thanks for sharing the implementation, it's really nice.
I noticed that when you call the MAC unit with the LSTM output and image representation, question_len
is not passed in, so the attention calculation in the control unit seems unaware of the padding tokens. Am I missing something here?
Specifically I'm referring to this line:
Line 38 in 564ca5b
With sentences of varying lengths, the attention should be restricted to the actual sentence length rather than spread over the padding tokens. Is that right?
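The usual fix for this is to mask the attention logits at padded positions with -inf before the softmax, so padding receives exactly zero weight. A hypothetical sketch (function and variable names are mine, not the repo's):

```python
import torch
import torch.nn.functional as F


def masked_attention(logits, lengths):
    """Restrict attention to the first `lengths[b]` tokens of each sequence.

    logits:  (batch, seq_len) unnormalized attention scores
    lengths: (batch,) actual sentence lengths
    """
    seq_len = logits.size(1)
    # mask[b, t] is True for padding positions, i.e. t >= lengths[b]
    mask = torch.arange(seq_len, device=logits.device).unsqueeze(0) >= lengths.unsqueeze(1)
    logits = logits.masked_fill(mask, float('-inf'))
    return F.softmax(logits, dim=1)


attn = masked_attention(torch.randn(2, 5), torch.tensor([3, 5]))
# padded positions (the last two of the first row) get zero attention weight
```

Whether the missing mask matters much in practice depends on how the questions are batched; if batches are sorted by length, the amount of padding seen by the control unit can be small.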
Hi @rosinality, Great job!
Is the accuracy you reported for the training set?
I got 96.xx% accuracy on the training data, and the average accuracy on the validation set is 85.9%.
@rosinality Hi, since the original MACnet code supports multiple configurations, I wanted to make sure: does your code offer full support for the original MACnet's functionality or not?
Hi! This is probably a gap in my understanding of PyTorch or h5py, but I wanted to bring it to your attention just in case it's not.
The output of image_features.py is a batch_size*(num images in split) x 1024 x 14 x 14 numpy array. You assign the features associated with each image to batch_size contiguous indices in a slice of the first axis. I don't understand why it's necessary to store batch_size copies of each image's features.
Later, when you load the data from the h5 file in the CLEVR dataloader's __getitem__ method in dataset.py, you index the array as if img[i] gives the features of the i-th image. But based on how you initialized the h5 file, these would actually be stored at [batch_size*i : batch_size*(i+1)], not [i].
What am I missing here?
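For reference, the usual pattern for this kind of feature extraction writes each *batch of distinct images* into one contiguous slice, so every image occupies exactly one row and `f['features'][j]` really is the j-th image. A minimal sketch of that pattern (small stand-in feature shape; the real pipeline uses 1024 x 14 x 14):

```python
import os
import tempfile

import h5py
import numpy as np

num_images, batch_size = 10, 4
feat_shape = (8, 2, 2)  # stand-in for the real 1024 x 14 x 14 features
path = os.path.join(tempfile.mkdtemp(), 'features.h5')

rng = np.random.default_rng(0)
features = rng.standard_normal((num_images, *feat_shape)).astype('f4')

with h5py.File(path, 'w') as f:
    dset = f.create_dataset('features', (num_images, *feat_shape), dtype='f4')
    for i in range(0, num_images, batch_size):
        # the batch of batch_size *different* images fills rows i..i+batch_size,
        # one row per image -- no image is duplicated
        dset[i:i + batch_size] = features[i:i + batch_size]
```

Under this scheme the dataset's first axis has length num_images (not batch_size * num_images), and plain integer indexing in `__getitem__` is correct; whether the repo's image_features.py matches this exactly is for the author to confirm.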
Hi again,
I want to know how we can use the saved checkpoints to resume training.
I used the following code for this purpose, but it gave me a few warnings and I'm not sure it was loading the weights correctly:
```python
if opt.resume >= 0:
    model_param_file = glob.glob('%s/checkpoint_%s*.model' % (opt.path_to_chkpt_folder, opt.resume))
    net = torch.load(model_param_file[0])
```

`opt.resume` is the epoch number I want to resume training from.
Thanks!
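One likely source of those warnings is loading a whole pickled module with `torch.load` instead of restoring a state dict. A more robust pattern is to save the model's (and optimizer's) `state_dict()` together with the epoch, then load them back into freshly constructed objects. A sketch under that assumption (the checkpoint filename scheme and dict keys here are illustrative, not necessarily what this repo uses):

```python
import glob
import os
import tempfile

import torch
from torch import nn, optim

ckpt_dir = tempfile.mkdtemp()
net = nn.Linear(4, 2)                # stand-in for the MAC network
optimizer = optim.Adam(net.parameters())

# save: store state dicts plus the epoch, not the pickled module itself
torch.save({'epoch': 5,
            'model': net.state_dict(),
            'optimizer': optimizer.state_dict()},
           os.path.join(ckpt_dir, 'checkpoint_5.model'))

# resume: find the checkpoint for the requested epoch and restore state in place
resume_epoch = 5
files = glob.glob(os.path.join(ckpt_dir, 'checkpoint_%s*.model' % resume_epoch))
ckpt = torch.load(files[0], map_location='cpu')
net.load_state_dict(ckpt['model'])
optimizer.load_state_dict(ckpt['optimizer'])
start_epoch = ckpt['epoch'] + 1
```

Restoring into an already-constructed model with `load_state_dict` avoids pickle's dependence on the exact class/module layout at save time, and saving the optimizer state keeps Adam's moment estimates across the restart.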