
Explainable VQA with SOBERT

This repository provides source code and data for building a docker container to demo several capabilities of the Spatial-Object Attention BERT (SOBERT) Visual Question Answering (VQA) model with BERT and ErrorCam attention maps.

How to use

Prerequisites

We have prepared the script ./script_dep_data.sh for downloading the model checkpoints and fetching the dependency libraries.

Building docker

Build a docker image using the following command.

docker image build -t sobert-vqa .

You may also skip this step by importing a prebuilt docker image.

Launching docker and flask server

Launch an interactive session in the Docker container using the following command.

nvidia-docker run -p 5001:5001 -it sobert-vqa /bin/bash

Inside the interactive session, use the following command to launch the backend of the demo.

cd /vqa-server/
python flask_server.py --port 5001

The web interface will be hosted at http://{your ip}:5001/. You may change the port by replacing 5001 with your port number in the commands above.
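Once the server is up, you can sanity-check it from the host. Below is a minimal sketch using Python's requests library; it assumes port 5001 is forwarded as in the commands above and that the root path serves the web interface.

# Minimal smoke test for the demo server (a sketch; host and port are
# assumptions based on the commands above).
import requests

BASE_URL = "http://localhost:5001"  # replace localhost with your IP if remote

resp = requests.get(BASE_URL + "/")
print(resp.status_code)  # expect 200 once flask_server.py is running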

Web interface

Below are step-by-step instructions for using our VQA system.

[Instructions image]

For developers

Component diagram

Below is a component diagram of the flask server.

[Component diagram image]

JSON APIs

result=vqa(imurl,question)
  • Performs VQA: answers a question about an image. Computes VQA answers from multiple models for the input image and question.
  • Inputs
    • imurl (string): the url of the input image.
    • question (string): the question.
  • Output is an object with the following variables
    • answers (list of strings): one answer from each model.
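For example, a client can call the vqa endpoint over HTTP. The sketch below uses Python's requests library; the route name, HTTP method, and JSON payload format are assumptions based on the signature above, and the image url is a placeholder.

# Hypothetical call to the vqa endpoint (route, method and payload format
# are assumptions; only the field names come from the docs above).
import requests

resp = requests.post(
    "http://localhost:5001/vqa",
    json={
        "imurl": "https://example.com/image.jpg",  # placeholder image url
        "question": "What color is the bus?",
    },
)
result = resp.json()
print(result["answers"])  # one answer string per model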
result=explain(imurl,question)
  • Provides explanations for VQA. Computes average attention map explanations and other types of explanations.
  • Inputs
    • imurl (string): the url of the input image.
    • question (string): the question.
  • Output is an object with the following variables
    • spatial_attn (list of objects): spatial attention map data for each model. Each object includes: average (string), the url of the spatial attention map, and tokens (list of strings), the tokenized question.
    • object_attn (list of objects): object attention map data for each model. Each object includes: average (string), the url of the object attention map, and tokens (list of strings), the tokenized question.
    • topk_answers (list of lists of objects): top-k answers and their confidences for each model. Each object includes: answer (string), the k-th ranked answer, and confidence (float), the probability of the k-th ranked answer.
    • related_qas (list of lists of objects): top-k related QA pairs for each model. Each QA pair is an object that includes: question (string), the k-th most related question, answer (string), the model's answer to the k-th related question, and r (float), the rated relevance of the question.
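The explain endpoint can be exercised the same way. Again a sketch: the route and payload format are assumptions, and the fields read from the response follow the descriptions above.

# Hypothetical call to the explain endpoint (route and payload format
# are assumptions; response fields follow the documentation above).
import requests

resp = requests.post(
    "http://localhost:5001/explain",
    json={"imurl": "https://example.com/image.jpg",
          "question": "What color is the bus?"},
)
result = resp.json()

# Top-k answers and confidences for the first model.
for item in result["topk_answers"][0]:
    print(item["answer"], item["confidence"])

# Url of the first model's average spatial attention map.
print(result["spatial_attn"][0]["average"])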
imurl=remove_box(imurl,misc)
  • Foreground inpainting.
  • Inputs
    • imurl (string): the url of the input image.
    • misc (4-tuple of floats): the bounding box (x,y,w,h) of the foreground, where (x,y) are the coordinates of the top-left corner of the box and (w,h) are the width and height of the box. Coordinates are normalized to [0,1].
  • Output is an object with the following variables
    • imurl (string): the url of the edited image.
imurl=remove_background(imurl,misc)
  • Background inpainting.
  • Inputs
    • imurl (string): the url of the input image.
    • misc (4-tuple of floats): the bounding box (x,y,w,h) of the foreground, where (x,y) are the coordinates of the top-left corner of the box and (w,h) are the width and height of the box. Coordinates are normalized to [0,1].
  • Output is an object with the following variables
    • imurl (string): the url of the edited image.
imurl=zoom_in(imurl,misc)
  • Zooming in on the foreground.
  • Inputs
    • imurl (string): the url of the input image.
    • misc (4-tuple of floats): the bounding box (x,y,w,h) of the foreground, where (x,y) are the coordinates of the top-left corner of the box and (w,h) are the width and height of the box. Coordinates are normalized to [0,1].
  • Output is an object with the following variables
    • imurl (string): the url of the edited image.
imurl=black_and_white(imurl)
  • Converting an image to black-and-white.
  • Inputs
    • imurl (string): the url of the input image.
  • Output is an object with the following variables
    • imurl (string): the url of the edited image.
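The image-editing endpoints share one calling pattern, sketched below for remove_box and black_and_white; routes and payload format are again assumptions, and remove_background and zoom_in take the same arguments as remove_box.

# Hypothetical calls to the image-editing endpoints (routes and payload
# format are assumptions). Boxes are (x,y,w,h), normalized to [0,1].
import requests

BASE = "http://localhost:5001"
IMG = "https://example.com/image.jpg"  # placeholder image url
box = [0.25, 0.25, 0.5, 0.5]           # a centered box covering half the image

# Inpaint the foreground inside the box.
resp = requests.post(BASE + "/remove_box", json={"imurl": IMG, "misc": box})
print(resp.json()["imurl"])  # url of the edited image

# remove_background and zoom_in take the same arguments as remove_box;
# black_and_white needs only the image url.
resp = requests.post(BASE + "/black_and_white", json={"imurl": IMG})
print(resp.json()["imurl"])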

Misc

Remove the built Docker image and free up space using the following commands.

docker image rm --force sobert-vqa
docker system prune

References

If you use the SOBERT-VQA attention maps as part of published research, please cite the following paper:

@INPROCEEDINGS{Alipour_2020,
  author={Alipour, Kamran and Ray, Arijit and Lin, Xiao and Schulze, Jurgen P. and Yao, Yi and Burachas, Giedrius T.},
  booktitle={2020 IEEE International Conference on Humanized Computing and Communication with Artificial Intelligence (HCCAI)},
  title={The Impact of Explanations on AI Competency Prediction in VQA},
  year={2020},
  pages={25-32},
  doi={10.1109/HCCAI49649.2020.00010}}

If you use the SOBERT-VQA model as part of published research, please acknowledge the following repository:

@misc{SOBERT-XVQA,
  author = {Xiao Lin and Sangwoo Cho and Kamran Alipour and Arijit Ray and Jurgen P. Schulze and Yi Yao and Giedrius Buracas},
  title = {SOBERT-XVQA: Spatial-Object Attention BERT Visual Question Answering model},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/frkl/SOBERT-XVQA-demo}},
}

The following codebases are used in this repository.
