
mac-network-pytorch's Introduction

mac-network-pytorch

Memory, Attention and Composition (MAC) Network for CLEVR, from Compositional Attention Networks for Machine Reasoning (https://arxiv.org/abs/1803.03067), implemented in PyTorch.

Requirements:

  • Python 3.6
  • PyTorch 0.4
  • torchvision
  • Pillow
  • nltk
  • tqdm

To train:

  1. Download and extract CLEVR v1.0 dataset from http://cs.stanford.edu/people/jcjohns/clevr/
  2. Preprocess question data and extract image features using ResNet-101:
python preprocess.py [CLEVR directory]
python image_feature.py [CLEVR directory]

!CAUTION! The file created by image_feature.py is very large (~70 GiB). You may use HDF5 compression (see the sketch after these steps), but it will slow down feature extraction.

  3. Run train.py:
python train.py [CLEVR directory]
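
As a reference for the compression mentioned in the caution above, here is a minimal sketch of enabling gzip compression when writing the features with h5py. The dataset name, shape, and chunk layout below are assumptions for illustration, not necessarily what image_feature.py uses:

```
import h5py

# Hypothetical sketch: create a compressed HDF5 dataset for ResNet-101 features.
# 'data', the shape, and the chunking are assumed values, not the repo's exact ones.
n_images = 70000  # CLEVR train split size
with h5py.File('train_features.hdf5', 'w') as f:
    dset = f.create_dataset(
        'data',
        shape=(n_images, 1024, 14, 14),
        dtype='float32',
        chunks=(1, 1024, 14, 14),   # one chunk per image keeps random reads fast
        compression='gzip',
        compression_opts=4,         # moderate compression level (1-9)
    )
    # write features batch by batch, e.g.:
    # dset[i * batch_size:(i + 1) * batch_size] = features_batch
```

Gzip is applied per chunk, which is why writing (and later reading) becomes slower than with an uncompressed file.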

This implementation reaches 95.75% accuracy at epoch 10 and 96.5% accuracy at epoch 20.

mac-network-pytorch's People

Contributors

  • rosinality


mac-network-pytorch's Issues

image_feature.py output shape?

Hi! This is probably a gap in my understanding of PyTorch or h5py, but I wanted to bring it to your attention just in case it's not.

The output of image_feature.py is a batch_size * (num images in split) x 1024 x 14 x 14 numpy array. You assign the features associated with each image to batch_size contiguous indices in a slice along the first dimension. I don't understand why it's necessary to store batch_size copies of each image's features.

Later, when you load the data from the h5 file in the CLEVR dataset's __getitem__ method in dataset.py, you index the array as if img[i] gives the features of the i-th image. But based on how you initialized the h5 file, these would actually be stored at [batch_size*i : batch_size*(i+1)], not at i.

What am I missing here?
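
For what it's worth, here is a small self-contained sketch of the indexing the writer and the reader have to agree on: one (1024, 14, 14) feature map per image, written batch by batch and read back by plain index. The dataset name and toy shapes are assumptions for illustration, not the repo's exact code:

```
import h5py
import numpy as np

n_images, batch_size = 16, 4  # toy sizes for illustration

with h5py.File('features_toy.hdf5', 'w') as f:
    dset = f.create_dataset('data', shape=(n_images, 1024, 14, 14), dtype='float32')
    for b in range(n_images // batch_size):
        feats = np.random.rand(batch_size, 1024, 14, 14).astype('float32')
        # each image in the batch occupies exactly one row along the first axis
        dset[b * batch_size:(b + 1) * batch_size] = feats

with h5py.File('features_toy.hdf5', 'r') as f:
    img = f['data']
    # consistent with the writer above, img[i] is the i-th image's features
    assert img[7].shape == (1024, 14, 14)
```

If the writer and __getitem__ both follow this convention, img[i] does give the i-th image without duplication; the question above is whether the repo's writer actually does.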

Padded token is not masked when calculating attention in control unit

Thanks for sharing the implementation, it's really nice.

I noticed that when you call the MAC unit with the LSTM output and the image representation, question_len is not passed in, and the attention calculation in the control unit seems unaware of the padding tokens. Am I missing something here?

Specifically I'm referring to this line:

attn = F.softmax(attn_weight, 1)

With sentences of varying lengths, the attention should be restricted to the actual sentence length rather than covering the padding tokens. Is that right?
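
For illustration, a minimal sketch of one way to mask padded positions before the softmax, assuming attn_weight has shape (batch, seq_len, 1) and question_len holds the true lengths; the names and shapes are assumptions, not the repo's exact code:

```
import torch
import torch.nn.functional as F

def masked_softmax(attn_weight, question_len):
    # attn_weight: (batch, seq_len, 1), question_len: (batch,) true lengths
    batch, seq_len, _ = attn_weight.shape
    positions = torch.arange(seq_len, device=attn_weight.device).unsqueeze(0)  # (1, seq_len)
    valid = positions < question_len.unsqueeze(1)                              # (batch, seq_len)
    # set padded positions to -inf so softmax assigns them zero weight
    attn_weight = attn_weight.masked_fill(valid.unsqueeze(2) == 0, float('-inf'))
    return F.softmax(attn_weight, 1)

# toy usage
attn_weight = torch.randn(2, 5, 1)
question_len = torch.tensor([3, 5])
attn = masked_softmax(attn_weight, question_len)
print(attn[0, 3:, 0])  # the two padded positions of the first example are 0
```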

MACNet configuration

@rosinality Hi, since the original MACnet code supports multiple configurations, I wanted to make sure: does your code offer full support for the original MACnet's functionality or not?

Why keep two copies of the network (net and net_running)?

Hi, I'm looking at the code and couldn't understand this part...
Two MACNetworks are created: "net" and "net_running". During the training stage, only "net" is trained, and 0.01% of "net"'s parameters are injected into the "net_running" model, while at the testing stage "net_running" is the one evaluated.

I'm wondering why it's necessary to keep two copies here, instead of using a single "net" model directly?
Thanks!
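
This looks like the usual exponential moving average (Polyak averaging) of the weights: net_running is a smoothed copy of net that is often more stable to evaluate than the raw, just-updated net. A hedged sketch of the idea; the function name and decay value below are illustrative, not necessarily the repo's exact ones:

```
import torch

def accumulate(model_ema, model, decay=0.999):
    # exponential moving average of parameters:
    #   ema_param <- decay * ema_param + (1 - decay) * param
    ema_params = dict(model_ema.named_parameters())
    with torch.no_grad():
        for name, param in model.named_parameters():
            ema_params[name].mul_(decay).add_(param, alpha=1 - decay)

# per training step (sketch):
#   loss.backward(); optimizer.step()   # updates net
#   accumulate(net_running, net)        # smooths the update into net_running
# at validation time, evaluate net_running in eval mode
```

Each step only a small fraction (1 - decay) of net's parameters is mixed into net_running, which is the "injection" described above; the exact decay used in the repo may differ.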

Resume Training using saved checkpoints.

Hi again,

I want to know how we can use the saved checkpoints to resume training.

I used the following code for this purpose, but it gave me a few warnings and I am not sure if it was loading the weights correctly:

```
import glob
import torch

if opt.resume >= 0:
    # find the checkpoint saved at the requested epoch
    model_param_file = glob.glob('%s/checkpoint_%s*.model' % (opt.path_to_chkpt_folder, opt.resume))
    net = torch.load(model_param_file[0])
```

opt.resume is the epoch number I want to resume training from.
Thanks!
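
In case it helps, a hedged sketch of a resume helper: if the checkpoint was saved with torch.save(net.state_dict(), ...), loading it into an already-constructed model usually avoids the warnings that come from unpickling a whole module. The checkpoint naming scheme is copied from the snippet above; everything else is an assumption, not the repo's exact API:

```
import glob
import torch

def load_checkpoint(net, checkpoint_dir, epoch):
    # hypothetical helper; reuses the 'checkpoint_<epoch>*.model' naming from the snippet above
    path = sorted(glob.glob('%s/checkpoint_%s*.model' % (checkpoint_dir, epoch)))[0]
    checkpoint = torch.load(path, map_location='cpu')
    if isinstance(checkpoint, dict):
        # checkpoint is a state_dict: copy the weights into the existing model
        net.load_state_dict(checkpoint)
        return net
    # otherwise the whole module was pickled; use the loaded object directly
    return checkpoint
```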

Accuracy on validation set

Hi @rosinality, great job!
Is the accuracy you reported on the training set?

I got 96.xx% accuracy on the train data, and the avg. accuracy on the validation set is 85.9%.
