Multi-Modal Answer Validation for Knowledge-Based VQA

By Jialin Wu, Jiasen Lu, Ashish Sabharwal and Roozbeh Mottaghi

In this project, we present Multi-modal Answer Validation using External knowledge (MAVEx). The idea is to validate a set of promising answer candidates based on answer-specific knowledge retrieval. In particular, MAVEx aims to learn how to extract relevant knowledge from noisy sources, which knowledge source to trust for each answer candidate, and how to validate the candidate using that source.

Installation

Requirements

This codebase was developed on Ubuntu 18.04.5 LTS with TITAN V GPUs.

Clone this repository

git clone git@github.com:jialinwu17/MAVEX.git

Using conda, create an environment

As the implementation is based on the ViLBERT multi-task system, it requires a similar virtual environment. Please refer to the Repository Setup step in the ViLBERT repository.

Data Preparation

Object detection features and base ViLBERT pretrained model.

The OK-VQA test set contains images that were also used in training both the object detection module that provides bottom-up attention and the officially released ViLBERT pretrained model. To avoid this overlap, we removed the OK-VQA test images from the Visual Genome and COCO datasets, retrained the ResNeXt-152-based Faster R-CNN object detector, and then retrained the ViLBERT model from scratch using the default hyperparameters.
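
The filtering step described above can be illustrated with a minimal sketch. This is not the authors' preprocessing script: the record layout (a list of dicts with an `image_id` key) and the function name `drop_held_out_images` are assumptions made only for this example.

```python
from typing import Dict, Iterable, List


def drop_held_out_images(records: List[Dict], held_out_ids: Iterable[int]) -> List[Dict]:
    """Return only the records whose image is not in the held-out (test) set."""
    held_out = set(held_out_ids)  # a set gives constant-time membership checks
    return [r for r in records if r["image_id"] not in held_out]


# Toy data: image 2 plays the role of an OK-VQA test image (hypothetical IDs).
train_records = [
    {"image_id": 1, "label": "dog"},
    {"image_id": 2, "label": "cat"},
    {"image_id": 3, "label": "bus"},
]
filtered = drop_held_out_images(train_records, held_out_ids=[2])
print([r["image_id"] for r in filtered])  # prints [1, 3]
```

The same kind of filter would have to be applied to every dataset used for pretraining (here Visual Genome and COCO) before retraining the detector and ViLBERT.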

The object features can be downloaded from here. After downloading the archive, please unzip it as 'image_features'.
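
If you prefer to unpack the archive from Python, the sketch below shows one way to do so with the standard library. The helper name `extract_features` is hypothetical, and the archive file name is not given in this README, so it is passed in as an argument.

```python
import zipfile
from pathlib import Path


def extract_features(archive_path: str, target_dir: str = "image_features") -> Path:
    """Extract a zip archive into target_dir and return that directory."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target
```

Call it as `extract_features(path_to_downloaded_zip)`. If the archive already contains a top-level `image_features` folder, extract into the current directory instead (`target_dir="."`) to avoid a nested folder.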

The ViLBERT pretrained model can be downloaded from here.

Google Image features.

We query the Google Image search engine for external visual knowledge and process the retrieved images using the object detection module from the previous step. Please download the processed image features and idx files by following the instructions below.
(1) mkdir h5py_accumulate.
(2) download train_idx to h5py_accumulate.
(3) download train_features to h5py_accumulate.
(4) download val_idx to h5py_accumulate.
(5) download val_features to h5py_accumulate.
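
After completing the steps above, a quick check like the following can confirm that all four downloads landed in `h5py_accumulate`. It is a sketch under assumptions: the function name `missing_downloads` is hypothetical, and because the file extensions are not stated here, files are matched by the part of their name before the first dot.

```python
from pathlib import Path
from typing import List

# Names taken from the download steps; extensions are unknown and ignored.
EXPECTED_STEMS = ["train_idx", "train_features", "val_idx", "val_features"]


def missing_downloads(directory: str = "h5py_accumulate") -> List[str]:
    """Return the expected names that have no matching entry in directory."""
    root = Path(directory)
    if not root.is_dir():
        return list(EXPECTED_STEMS)
    present = {p.name.split(".")[0] for p in root.iterdir()}
    return [stem for stem in EXPECTED_STEMS if stem not in present]
```

An empty list from `missing_downloads()` means every expected file is present.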

Retrieved Knowledge

Please download the retrieved knowledge from here.

Training

Train by running

python ft_mavex.py --save_name demo --seed 7777 --from_pretrained pytorch_model_4.bin --num_epochs 75
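
To repeat the run with other settings, for instance a different seed, the command above can be assembled programmatically. The sketch below uses only the flags shown in that command; the helper `build_train_command`, the `DRY_RUN` switch, and the second seed value are assumptions introduced for illustration.

```python
import subprocess
import sys
from typing import List

DRY_RUN = True  # set to False to actually launch training


def build_train_command(save_name: str, seed: int,
                        pretrained: str = "pytorch_model_4.bin",
                        num_epochs: int = 75) -> List[str]:
    """Build the ft_mavex.py command line as an argument list."""
    return [
        sys.executable, "ft_mavex.py",
        "--save_name", save_name,
        "--seed", str(seed),
        "--from_pretrained", pretrained,
        "--num_epochs", str(num_epochs),
    ]


if __name__ == "__main__":
    for seed in (7777, 8888):  # 8888 is a hypothetical extra seed
        cmd = build_train_command(f"demo_seed{seed}", seed)
        print(" ".join(cmd))
        if not DRY_RUN:
            subprocess.run(cmd, check=True)  # raises if training exits with an error
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` stops the loop as soon as one run fails.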

Models and Output files

We publish the MAVEx finetuned model here, and the output results can be downloaded here.

Citation

If you find this project useful in your research, please consider citing our paper:

@inproceedings{khz2021interact,
  author = {Wu, Jialin and Lu, Jiasen and Sabharwal, Ashish and Mottaghi, Roozbeh},
  title = {{M}ulti-{M}odal {A}nswer {V}alidation for {K}nowledge-Based {VQA}},
  booktitle = {AAAI},	    
  year = {2022}
}
