
raeidsaqur / mgn


Multimodal Graph Network (MGN): Code repo, examples from the paper

License: MIT License

Python 99.93% Shell 0.07%
vqa compositionality program-synthesis gnn

mgn's Introduction


Multimodal Graph Networks (MGN)

Supporting code for the Multimodal Graph Networks paper (https://arxiv.org/).


Introduction

This repo holds the supplementary code and dataset for the MGN paper. For CLEVR dataset generation, please refer to the original CLEVR repo. For CLOSURE templates, please refer to the CLOSURE repo and paper.

Setup

  1. Clone this repo and its submodules.
  2. Create and activate a conda environment (or virtualenv) with Python 3.7+ for this project:
$ conda create --name mgn python=3.7
$ conda activate mgn

  3. Install the required packages from requirements.txt:

$ pip install -r requirements.txt

Prerequisites

The CLEVR Parser library uses the spaCy framework as its NLP backend.

-- Spacy --

The default backend uses spaCy for language parsing and pretrained language models (LMs) for embeddings.

Please see spaCy's documentation for installation instructions.

spaCy language models can be downloaded by following the instructions in the spaCy documentation. N.b. the spacy-transformers package can be used to download SotA transformer-based LMs (BERT, XLNet, RoBERTa), including the popular HuggingFace implementations.

The very basic installation entails:

$ pip install spacy
$ python -m spacy download en_core_web_sm 

Once installed, validate the available LMs using: python -m spacy info and python -m spacy validate.
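
As a complement to the CLI checks above, here is a minimal Python sketch that confirms a model is importable before the parser is run. It assumes only that `spacy download` installs each model as a regular Python package; the helper name `missing_models` is hypothetical and not part of this repo.

```python
import importlib.util


def missing_models(model_names):
    """Return the names in model_names that are not importable as packages."""
    return [name for name in model_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # en_core_web_sm is the model installed by the commands above.
    missing = missing_models(["en_core_web_sm"])
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All requested spaCy models are installed.")
```

This only checks that the package can be found; `python -m spacy validate` remains the way to check version compatibility.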


Dataset Generation

Please follow the instructions in the CLEVR Dataset Generation repo. You can clone a local copy under ./vendors within the project using:

git submodule update --init --recursive

To replicate the experiments with captions, use the same scripts from the aforementioned repo with the 'caption generation' templates. These templates are included in the data/templates directory.

A demo data directory for illustration can be obtained by running:

. data/download-demo-data.sh

The resulting structure of the data folder should look like this: [figure: data-mgn-demo]

The CLOSURE templates (after downloading) are under data/CLOSURE_v1.0. Additional templates are under data/templates.
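
A quick way to confirm the download worked is to check for the two directories named above. The sketch below is illustrative only: `missing_data_dirs` and `EXPECTED_DIRS` are hypothetical names, and the check covers just the directories this README mentions, not the full demo layout.

```python
from pathlib import Path

# Directories named in this README; the rest of the demo layout is not checked.
EXPECTED_DIRS = ("CLOSURE_v1.0", "templates")


def missing_data_dirs(data_root):
    """Return the expected sub-directories that are absent under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs("data")
    print("Missing:", missing if missing else "none")
```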

Running Experiments

  • Preprocess the questions/captions to generate the .h5 file (e.g. clevr_train_questions_25k.h5)

  • Train: Pretrain on 25K questions, then use the pre-trained model for fine-tuning (using REINFORCE)

    • Pretrain:
    $ python ${ROOT}/mgn/tools/run_train.py \
                    --checkpoint_every 50   \
                    --num_iters 100 \
                    --run_dir ../data/outputs/model_pretrain_clevr_25kpg \
                    --clevr_train_question_path ../data/${PATH_TO_PREPROCESSED_QUESTIONS}/clevr_train_questions_25000/clevr_train_questions_25k.h5 \
                    --gembd_vec_dim 96
    
    • Fine-Tune:
    $ python ${ROOT}/mgn/tools/run_train.py \
                    --reinforce 1 \
                    --learning_rate 1e-5 \
                    --checkpoint_every 50   \
                    --num_iters 100 \
                    --run_dir ../data/outputs/model_reinforce_clevr_25kpg \
                    --load_checkpoint_path ../data/outputs/model_pretrain_clevr_25kpg/checkpoint_best.pt \
                    --clevr_train_question_path ../data/${PATH_TO_PREPROCESSED_QUESTIONS}/clevr_train_questions_25000/clevr_train_questions_25k.h5 \
                    --gembd_vec_dim 96 
    
    
  • Test:

    $ python ${ROOT}/mgn/tools/run_test.py \
                    --run_dir ../data/results \
                    --clevr_val_scene_path ../data/${PATH_TO_SCENES}/clevr_val_scenes_parsed.json \
                    --clevr_val_question_path ../data/${PATH_TO_PREPROCESSED_QUESTIONS}/clevr_val_questions.h5 \
                    --clevr_vocab_path ../data/${PATH_TO_VOCAB}/clevr_vocab.json \
                    --load_checkpoint_path ../data/outputs/model_reinforce_clevr_25kpg/checkpoint_best.pt \
                    --max_val_samples 1024 \
                    --is_baseline_model 0
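
The pretrain and fine-tune commands differ only in a few flags, so they can be generated from one helper. The following is a sketch, not part of the repo: `train_command` is a hypothetical name, the flags are the ones shown in the commands above, and the questions path is a placeholder that you must point at your own preprocessed .h5 file.

```python
import shlex


def train_command(root, run_dir, questions_h5, reinforce=False,
                  checkpoint=None, learning_rate=None,
                  num_iters=100, checkpoint_every=50, gembd_vec_dim=96):
    """Assemble the argv list for mgn/tools/run_train.py from the README flags."""
    cmd = ["python", f"{root}/mgn/tools/run_train.py"]
    if reinforce:
        cmd += ["--reinforce", "1"]
    if learning_rate is not None:
        cmd += ["--learning_rate", str(learning_rate)]
    cmd += ["--checkpoint_every", str(checkpoint_every),
            "--num_iters", str(num_iters),
            "--run_dir", run_dir]
    if checkpoint is not None:
        cmd += ["--load_checkpoint_path", checkpoint]
    cmd += ["--clevr_train_question_path", questions_h5,
            "--gembd_vec_dim", str(gembd_vec_dim)]
    return cmd


if __name__ == "__main__":
    # Placeholder path (assumption): replace with your preprocessed file.
    questions = "path/to/clevr_train_questions_25k.h5"
    pretrain_dir = "../data/outputs/model_pretrain_clevr_25kpg"
    pretrain = train_command(".", pretrain_dir, questions)
    finetune = train_command(
        ".", "../data/outputs/model_reinforce_clevr_25kpg", questions,
        reinforce=True, learning_rate="1e-5",
        checkpoint=f"{pretrain_dir}/checkpoint_best.pt")
    for cmd in (pretrain, finetune):
        # Print a shell-safe command line; pass cmd to subprocess.run to execute.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list keeps the fine-tune step tied to the checkpoint written by the pretrain step and avoids shell line-continuation mistakes.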
    

mgn's People

Contributors

raeidsaqur


mgn's Issues

__init__() got an unexpected keyword argument 'has_spatial'

When I run python run_train_e2e.py, the following error is raised:

Traceback (most recent call last):
  File "run_train_e2e.py", line 24, in <module>
    from datasets import get_dataloader
  File "/home/chaofan.yang/mgn/mgn/datasets/__init__.py", line 2, in <module>
    from .clevr_questions import ClevrQuestionDataset, ClevrQuestionDataLoader
  File "/home/chaofan.yang/mgn/mgn/datasets/clevr_questions.py", line 36, in <module>
    graph_parser = clevr_parser.Parser(backend='spacy', model='en_core_web_sm', has_spatial=True, has_matching=True).get_backend(identifier='spacy')
  File "/home/chaofan.yang/anaconda3/envs/pytorch/lib/python3.7/site-packages/clevr_parser/parser.py", line 32, in __init__
    self._inst = type(self)._backend_registry[self.backend](**kwargs)
TypeError: __init__() got an unexpected keyword argument 'has_spatial'
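
The traceback shows the backend class from the clevr_parser package in site-packages rejecting has_spatial. One plausible but unconfirmed explanation is that the installed clevr_parser differs from the version this repo expects. A minimal sketch for checking which keywords a class constructor accepts is given below; `unsupported_kwargs` and `OldBackend` are hypothetical stand-ins, not clevr_parser APIs.

```python
import inspect


def unsupported_kwargs(cls, kwargs):
    """Return the keyword names that cls.__init__ would reject."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [name for name in kwargs if name not in params]


class OldBackend:
    """Hypothetical stand-in for a backend without has_spatial support."""

    def __init__(self, model="en_core_web_sm"):
        self.model = model


print(unsupported_kwargs(OldBackend, {"model": "en_core_web_sm",
                                      "has_spatial": True,
                                      "has_matching": True}))
# prints ['has_spatial', 'has_matching']
```

Running the same check against the real backend class in your environment would show whether has_spatial and has_matching are supported by the installed version.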
