
vqg-gcn's Introduction

Introduction

This repository contains the source code and additional visualization examples for Radial-GCN, our Radial Graph Convolutional Network for Visual Question Generation (VQG).

  1. Unlike existing approaches, which typically treat VQG as a reversed VQA task, we propose a novel answer-centric approach that effectively models the associations between the answer and its relevant image regions.
  2. To the best of our knowledge, we are the first to apply a GCN model to the VQG task. We devise a new radial graph structure with graphic attention for superior question generation performance and interpretable model behavior.
  3. We conduct comprehensive experiments on three benchmark datasets to verify the advantage of our method in generating meaningful questions for the VQG task and in boosting existing VQA methods on the challenging zero-shot VQA task.
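The radial structure mentioned in item 2 can be pictured as a star graph. The following is only a minimal sketch of that idea, not the implementation in `layer_vqg.py`. It assumes that the answer sits at the centre node and each image region is a spoke, and it swaps in plain dot-product attention with no learned weights. The names `build_radial_adjacency` and `attention_step`, and the toy 2-d features, are hypothetical and used for illustration only.

```python
import math


def build_radial_adjacency(num_regions):
    """Star-graph adjacency: node 0 is the answer, nodes 1..N are image regions.

    Assumption: regions connect only to the answer node (plus self-loops).
    """
    n = num_regions + 1
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1                 # self-loop keeps each node's own feature
    for r in range(1, n):
        adj[0][r] = adj[r][0] = 1     # spoke between the answer and region r
    return adj


def softmax(scores):
    m = max(scores)                   # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_step(features, adj):
    """One attention-weighted aggregation over each node's neighbours.

    Assumption: attention scores are raw dot products; a real model would
    use learned projections and a non-linearity.
    """
    updated = []
    for i, row in enumerate(adj):
        neighbours = [j for j, connected in enumerate(row) if connected]
        weights = softmax([dot(features[i], features[j]) for j in neighbours])
        dim = len(features[i])
        updated.append([
            sum(w * features[j][d] for w, j in zip(weights, neighbours))
            for d in range(dim)
        ])
    return updated


# Toy example: one answer vector and three region vectors (made-up values).
answer = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
features = [answer] + regions

adj = build_radial_adjacency(len(regions))
updated = attention_step(features, adj)
```

In this sketch, the attention weights computed for node 0 indicate how strongly each region is associated with the answer, which is one way to read the "interpretable model behavior" claim.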

framework


Code Structure

├── Radial-GCN/
│   ├── run_vqg.py          /* Main run script
│   ├── layer_vqg.py        /* Model layers and structure (GCN, VQG)
│   ├── dataset_vqg.py      /* Constructs the VQG dataset
│   ├── utils.py            /* Utility functions
│   ├── main.py             /* Caption evaluation
│   ├── supp_questions      /* Generates questions for the supplementary dataset for zero-shot VQA
│   ├── draw_*.py           /* Drawing and visualisation
│   ├── readme.md
│   ├── ZS_VQA/
│   │   ├── data/                 /* Data files for ZS-VQA
│   ├── data/                     /* Data files for training VQG
│   │   ├── tools/                /* Modified files from bottom-up attention
│   │   ├── process_image_vqg.py  /* Image preprocessing
│   │   ├── preprocess_text.py    /* Text preprocessing

Results

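VQA2

| Method | BLEU-1 | BLEU-4 | METEOR | CIDEr | ROUGE-L |
|---|---|---|---|---|---|
| LSTM (Baseline) | 0.381 | 0.152 | 0.198 | 1.32 | 0.471 |
| LSTM-AN (Baseline) | 0.492 | 0.228 | 0.243 | 1.62 | 0.526 |
| SAT (ICML'15) | 0.494 | 0.231 | 0.244 | 1.65 | 0.534 |
| IVQA (CVPR'18) | 0.502 | 0.239 | 0.257 | 1.84 | 0.553 |
| iQAN (CVPR'18) | 0.526 | 0.271 | 0.268 | 2.09 | 0.568 |
| Ours (w/o attention) | 0.529 | 0.273 | 0.269 | 2.09 | 0.570 |
| Ours | 0.534 | 0.279 | 0.271 | 2.10 | 0.572 |

Visual7W

| Method | BLEU-1 | BLEU-4 | METEOR | CIDEr | ROUGE-L |
|---|---|---|---|---|---|
| LSTM (Baseline) | 0.447 | 0.202 | 0.192 | 1.13 | 0.468 |
| LSTM-AN (Baseline) | 0.463 | 0.219 | 0.229 | 1.34 | 0.501 |
| SAT (ICML'15) | 0.467 | 0.223 | 0.234 | 1.34 | 0.503 |
| IVQA (CVPR'18) | 0.472 | 0.227 | 0.237 | 1.36 | 0.508 |
| iQAN (CVPR'18) | 0.488 | 0.231 | 0.251 | 1.44 | 0.520 |
| Ours (w/o attention) | 0.494 | 0.233 | 0.257 | 1.47 | 0.524 |
| Ours | 0.501 | 0.236 | 0.259 | 1.52 | 0.527 |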

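| Model | VQA Model: Bottom-up | VQA Model: BAN | VQG Model: IVQA | VQG Model: Ours | VQA val Acc@1 | VQA val Acc@Hum | Norm test Acc@1 | Norm test Acc@Hum | ZS-VQA test Acc@1 | ZS-VQA test Acc@Hum |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | | 59.6 | 66.6 | 48.8 | 56.9 | 0 | 0 |
| 2 | | | | | 59.0 | 66.1 | 48.3 | 56.0 | 29.2 | 39.4 |
| 3 | | | | | 59.1 | 66.3 | 48.3 | 56.2 | 30.1 | 40.4 |
| 4 | | | | | 60.6 | 67.8 | 49.8 | 58.9 | 0 | 0 |
| 5 | | | | | 60.1 | 67.5 | 49.2 | 58.7 | 30.7 | 41.3 |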

Visual Examples

More details can be found in the main text and supplementary material of our paper.

View VQG Process

VQG Process



View Question Distribution

Distribution



View Supp. for ZS-VQA

Supp



View More Examples

“Q”, “A” and “Q*” denote the ground-truth question, the given answer, and the generated question, respectively.

More Examples

vqg-gcn's People

Contributors

wangt-cn
