Multi-Modal Answer Validation for Knowledge-Based VQA

By Jialin Wu, Jiasen Lu, Ashish Sabharwal and Roozbeh Mottaghi

In this project, we present Multi-modal Answer Validation using External knowledge (MAVEx). The idea is to validate a set of promising answer candidates based on answer-specific knowledge retrieval. In particular, MAVEx aims to learn how to extract relevant knowledge from noisy sources, which knowledge source to trust for each answer candidate, and how to validate the candidate using that source.
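
As a rough illustration of the validation idea (not the MAVEx model itself), the toy sketch below scores each answer candidate against several knowledge sources, trusts the most supportive source for that candidate, and returns the best-validated candidate. The source names, the scores, and the `validate_candidates` helper are all hypothetical; in MAVEx the relevance and trust decisions are learned rather than hand-coded.

```python
from typing import Dict, Tuple


def validate_candidates(support: Dict[str, Dict[str, float]]) -> Tuple[str, str, float]:
    """Return (answer, trusted_source, score) for the best-validated candidate.

    `support[answer][source]` is a hypothetical score in [0, 1] describing how
    strongly the knowledge retrieved from `source` for `answer` supports it.
    """
    best = None
    for answer, per_source in support.items():
        # Stand-in for "which source to trust": take the highest-scoring source.
        source, score = max(per_source.items(), key=lambda kv: kv[1])
        if best is None or score > best[2]:
            best = (answer, source, score)
    if best is None:
        raise ValueError("no answer candidates given")
    return best


# Made-up scores, for illustration only.
toy_support = {
    "umbrella": {"wikipedia": 0.2, "conceptnet": 0.7, "images": 0.9},
    "parasol": {"wikipedia": 0.4, "conceptnet": 0.3, "images": 0.5},
}
print(validate_candidates(toy_support))
```

Taking the maximum over sources is only the simplest possible trust rule; it is used here to keep the example short.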

Installation

  1. Requirements

    We implement this codebase on Ubuntu 18.04.5 LTS with TITAN V GPUs.

  2. Clone this repository

    git clone git@github.com:jialinwu17/MAVEX.git
    
  3. Using conda, create an environment. As the implementation is based on the ViLBERT-multi-task system, it requires a similar virtual environment. Please refer to the Repository Setup step in the ViLBERT repository.

Data Preparation

  1. Object detection features and base ViLBERT pretrained model.

    The OK-VQA test set contains images that were used to train both the object detection module that provides bottom-up attention features and the officially released ViLBERT pretrained model. We therefore removed the OK-VQA test images from the Visual Genome and COCO datasets, then retrained the ResNeXt-152-based Faster R-CNN object detector and the ViLBERT model from scratch using the default hyperparameters.

    The object features can be downloaded from here. After downloading, please unzip the archive as 'image_features'.

    The ViLBERT pretrained model can be downloaded from here

  2. Google Image features.

    We query the Google Image search engine for external visual knowledge and process the retrieved images using the object detection module from the previous step. Please download the processed image features and idx files by following the instructions below.
    (1) mkdir h5py_accumulate.
    (2) download train_idx to h5py_accumulate.
    (3) download train_features to h5py_accumulate.
    (4) download val_idx to h5py_accumulate.
    (5) download val_features to h5py_accumulate.

  3. Retrieved Knowledge

    Please download retrieved knowledge from here
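
Before training, it can help to confirm that the downloads landed where the steps above put them. The following is a minimal sketch, assuming the directory and file names listed above; the exact file extensions are not stated in this README, so entries are matched by name prefix, and `missing_inputs` is a hypothetical helper rather than part of this repository. The location of the retrieved knowledge files is not specified, so it is not checked.

```python
from pathlib import Path
from typing import List

# Names taken from the data preparation steps; extensions are unknown,
# so each entry is matched by prefix (an assumption).
EXPECTED = [
    "image_features",
    "h5py_accumulate/train_idx",
    "h5py_accumulate/train_features",
    "h5py_accumulate/val_idx",
    "h5py_accumulate/val_features",
]


def missing_inputs(root: Path, expected: List[str] = EXPECTED) -> List[str]:
    """Return the expected entries that have no matching file or directory."""
    missing = []
    for rel in expected:
        parent = (root / rel).parent
        prefix = Path(rel).name
        # An entry counts as present if anything in its parent starts with its name.
        if not parent.is_dir() or not any(parent.glob(prefix + "*")):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_inputs(Path(".")):
        print("missing:", rel)
```

Run from the repository root, the script prints one line per missing entry and prints nothing when everything is in place.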

Training

Train by running

python ft_mavex.py --save_name demo --seed 7777 --from_pretrained pytorch_model_4.bin --num_epochs 75
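
To repeat the run with several seeds, the command above can be assembled programmatically. This is a sketch only: it uses just the four flags shown in the command above, while the extra seeds, the `demo_seed…` save names, and the `build_train_command` helper are hypothetical.

```python
import shlex
from typing import List


def build_train_command(save_name: str, seed: int,
                        pretrained: str = "pytorch_model_4.bin",
                        num_epochs: int = 75) -> List[str]:
    """Build the ft_mavex.py invocation using only the flags shown above."""
    return [
        "python", "ft_mavex.py",
        "--save_name", save_name,
        "--seed", str(seed),
        "--from_pretrained", pretrained,
        "--num_epochs", str(num_epochs),
    ]


# Hypothetical seeds and save names; only 7777 and "demo" appear in this README.
for seed in (7777, 7778, 7779):
    cmd = build_train_command(f"demo_seed{seed}", seed)
    print(shlex.join(cmd))
    # To actually launch a run: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues if a save name or checkpoint path contains spaces.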

Models and Output files

We publish the MAVEx finetuned model here, and the output results can be downloaded here.

Citation

If you find this project useful in your research, please consider citing our paper:

@inproceedings{khz2021interact,
  author = {Wu, Jialin and Lu, Jiasen and Sabharwal, Ashish and Mottaghi, Roozbeh},
  title = {{M}ulti-{M}odal {A}nswer {V}alidation for {K}nowledge-Based {VQA}},
  booktitle = {AAAI},
  year = {2022}
}
