C-VQA: Counterfactual Reasoning VQA Dataset

This repository contains the code and data for the paper "What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models".

Dataset

The dataset directory is C-VQA. You can find the questions in .csv files.
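
To get a quick look at the question files, a minimal sketch like the following lists every .csv file under the dataset directory with its row count and column names. It assumes only that the files are ordinary comma-separated files with a header row; the column names are not documented here, so the sketch prints them instead of assuming any. The helper name `summarize_csv` is hypothetical and not part of the repository.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for one CSV file with a header row."""
    # "utf-8-sig" reads plain UTF-8 and also tolerates a leading byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return list(reader.fieldnames or []), len(rows)


if __name__ == "__main__":
    # "C-VQA" is the dataset directory named above; adjust if you cloned elsewhere.
    for csv_path in sorted(Path("C-VQA").rglob("*.csv")):
        columns, n_rows = summarize_csv(csv_path)
        print(f"{csv_path}: {n_rows} rows, columns = {columns}")
```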

Download Images

After cloning the repository, run:

pip install gdown
bash download_images.sh

Scripts

The scripts directory contains all required scripts for running models in the paper.

run_eval_cogvlm.py: CogVLM.
run_eval_lavis.py: InstructBLIP and BLIP (in LAVIS).
run_eval_minigpt4.py: MiniGPT-v2.
run_eval_llava.py: LLaVA.
run_eval_qwen.py: Qwen-VL.
run_eval_codellama.py: ViperGPT with CodeLlama.
run_eval_visprog.py: VisProg.
run_eval_wizard.py: ViperGPT with WizardCoder.

Before you run a script, install the corresponding model and download its weights. Then copy the script into the root directory of that model's repository.

Please change PATH_TO_IMAGES in the scripts to the actual directory of images.

For the ViperGPT scripts (run_eval_codellama.py and run_eval_wizard.py), also change PATH_TO_MODEL to the actual directory of the code generator model.

For example, to run BLIP on C-VQA, run this command in the root directory of LAVIS:

python run_eval_lavis.py --model-name blip2_t5 --model-type pretrain_flant5xxl --query PATH_TO_CSV_FILE

You can find more commands in scripts/README.

After you get the results, run format_response.py to convert the raw responses to formatted responses (a single number or a single yes or no). Then run calc_acc.py to compute quantitative results from the formatted responses. Remember to fill in the file names in these two scripts.
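
The idea behind these two steps can be illustrated with a short sketch. This is not the code in format_response.py or calc_acc.py. It only shows one plausible way to reduce a free-form answer to a single yes/no or number and then score exact matches. The function names `normalize_response` and `accuracy` are hypothetical, and the rule used here (take the first standalone yes/no or the first number in the text) is an assumption.

```python
import re


def normalize_response(raw):
    """Reduce a free-form answer to 'yes', 'no', a number string, or None."""
    text = raw.strip().lower()
    # Take whichever appears first: a standalone yes/no or a non-negative number.
    match = re.search(r"\b(?:yes|no)\b|\d+(?:\.\d+)?", text)
    if match is None:
        return None
    token = match.group(0)
    if token in ("yes", "no"):
        return token
    # Canonicalize numbers so that "3", "3.0" and "03" compare equal.
    value = float(token)
    return str(int(value)) if value.is_integer() else str(value)


def accuracy(predictions, answers):
    """Fraction of predictions whose normalized form matches the gold answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = 0
    for pred, gold in zip(predictions, answers):
        formatted = normalize_response(str(pred))
        # An unparseable prediction (None) is always counted as wrong.
        if formatted is not None and formatted == normalize_response(str(gold)):
            correct += 1
    return correct / len(answers)
```

Responses that contain neither a yes/no nor a number are counted as wrong in this sketch. Whether the repository's scripts treat them the same way is not stated here.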

Download Code Generator Models

Change YOUR_HUGGINGFACE_TOKEN in download_model.py to your Hugging Face access token. Then run:

pip install huggingface_hub
python download_model.py

You can add more code generators in download_model.py by adding models in repo_ids and local_dirs.
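
As a rough illustration of how two parallel lists like repo_ids and local_dirs can drive the downloads, here is a minimal sketch built on `snapshot_download` from huggingface_hub. It is not the contents of download_model.py. The helper names `pair_downloads` and `download_all` are hypothetical, and the repository id in the usage comment is only an example, not necessarily a model used in the paper.

```python
def pair_downloads(repo_ids, local_dirs):
    """Pair each Hugging Face repo id with the directory it should be saved to."""
    if len(repo_ids) != len(local_dirs):
        raise ValueError("repo_ids and local_dirs must have the same length")
    return list(zip(repo_ids, local_dirs))


def download_all(repo_ids, local_dirs, token):
    """Download a snapshot of every repo into its matching local directory."""
    # Imported lazily so the pairing logic works without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in pair_downloads(repo_ids, local_dirs):
        snapshot_download(repo_id=repo_id, local_dir=local_dir, token=token)


# Example usage (illustrative repo id and path; replace with your own):
# download_all(
#     repo_ids=["codellama/CodeLlama-7b-Instruct-hf"],
#     local_dirs=["./models/codellama"],
#     token="YOUR_HUGGINGFACE_TOKEN",
# )
```

Adding a code generator then amounts to appending one entry to each list, keeping the two lists the same length and in the same order.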

Citation

If this code is useful for your research, please consider citing our work.

@InProceedings{zhang2023cvqa,
    author    = {Zhang, Letian and Zhai, Xiaotong and Zhao, Zhongkai and Wen, Xin and Zhao, Bingchen},
    title     = {What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    year      = {2023}
}

c-vqa's Issues

script to get quantitative results

Thanks for your solid work!

I notice that you only provide inference scripts in the scripts directory, which save the responses to a .csv file. Do you have any scripts for processing the inference results to get quantitative results?

Thank you for your reply.
