
Python API and Evaluation Code for v2.0 and v1.0 releases of the VQA dataset.

VQA v2.0 release

This release consists of

  • Real
    • 82,783 MS COCO training images, 40,504 MS COCO validation images and 81,434 MS COCO testing images (images are obtained from the [MS COCO website](http://mscoco.org/dataset/#download))
    • 443,757 questions for training, 214,354 questions for validation and 447,793 questions for testing
    • 4,437,570 answers for training and 2,143,540 answers for validation (10 per question)

There is only one type of task

  • Open-ended task
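
As a quick orientation, the question files are plain JSON. Below is a minimal sketch of loading the v2.0 open-ended training questions; the file name and field names are assumptions based on the files distributed on the VQA download page.

import json

# Load the v2.0 open-ended training questions (file name assumed from the
# VQA download page) and inspect the structure.
with open('Questions/v2_OpenEnded_mscoco_train2014_questions.json') as f:
    data = json.load(f)

print(len(data['questions']))   # expected: 443757 training questions
print(data['questions'][0])     # e.g. {'image_id': ..., 'question': ..., 'question_id': ...}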

VQA v1.0 release

This release consists of

  • Real
    • 82,783 MS COCO training images, 40,504 MS COCO validation images and 81,434 MS COCO testing images (images are obtained from the [MS COCO website](http://mscoco.org/dataset/#download))
    • 248,349 questions for training, 121,512 questions for validation and 244,302 questions for testing (3 per image)
    • 2,483,490 answers for training and 1,215,120 answers for validation (10 per question)
  • Abstract
    • 20,000 training images, 10,000 validation images and 20,000 testing images
    • 60,000 questions for training, 30,000 questions for validation and 60,000 questions for testing (3 per image)
    • 600,000 answers for training and 300,000 answers for validation (10 per question)

There are two types of tasks

  • Open-ended task
  • Multiple-choice task (18 choices per question)
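
For the multiple-choice task, each v1.0 question record also carries its candidate answers. Below is a minimal sketch of inspecting one record; the file name and the multiple_choices field are assumptions based on the v1.0 release format.

import json

# Load the v1.0 multiple-choice training questions (file name and field names
# assumed from the v1.0 release format).
with open('Questions/MultipleChoice_mscoco_train2014_questions.json') as f:
    questions = json.load(f)['questions']

q = questions[0]
print(q['question'])
print(len(q['multiple_choices']))   # expected: 18 choices per question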

Requirements

  • python 2.7
  • scikit-image (visit this page for installation)
  • matplotlib (visit this page for installation)
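
A quick way to confirm the dependencies are importable before running the demos (a minimal check; versions will vary):

# Sanity check that the required packages are importable.
import matplotlib
import skimage

print('scikit-image:', skimage.__version__)
print('matplotlib:', matplotlib.__version__)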

Files

./Questions

  • For v2.0, download the question files from the VQA download page, extract them and place in this folder.
  • For v1.0, both real and abstract, question files can be found on the VQA v1 download page.
  • Question files from Beta v0.9 release (123,287 MSCOCO train and val images, 369,861 questions, 3,698,610 answers) can be found below
  • Question files from Beta v0.1 release (10k MSCOCO images, 30k questions, 300k answers) can be found here.

./Annotations

  • For v2.0, download the annotations files from the VQA download page, extract them and place in this folder.
  • For v1.0, both real and abstract, annotation files can be found on the VQA v1 download page.
  • Annotation files from Beta v0.9 release (123,287 MSCOCO train and val images, 369,861 questions, 3,698,610 answers) can be found below
  • Annotation files from Beta v0.1 release (10k MSCOCO images, 30k questions, 300k answers) can be found here.
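
Each annotation record holds the ten human answers for one question. Below is a minimal sketch of inspecting one record; the key names are assumptions based on the released annotation format.

import json

# Inspect one annotation record from the real training annotations.
with open('Annotations/mscoco_train2014_annotations.json') as f:
    anns = json.load(f)['annotations']

a = anns[0]
print(a['question_id'], a['image_id'], a['multiple_choice_answer'])
print([ans['answer'] for ans in a['answers']])   # 10 human answers per question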

./Images

  • For real, create a directory named mscoco inside this directory. Inside mscoco, create directories named train2014, val2014 and test2015 for the train, val and test splits respectively, download the corresponding images from the MS COCO website, and place them in the matching folders.
  • For abstract, create a directory named abstract_v002 inside this directory. Inside abstract_v002, create directories named train2015, val2015 and test2015 for the train, val and test splits respectively, download the corresponding images from the VQA download page, and place them in the matching folders.
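
A minimal sketch of creating that layout with the standard library, run from the repository root:

import os

# Create the expected image directory layout (run from the repository root).
for d in ['Images/mscoco/train2014',
          'Images/mscoco/val2014',
          'Images/mscoco/test2015',
          'Images/abstract_v002/train2015',
          'Images/abstract_v002/val2015',
          'Images/abstract_v002/test2015']:
    if not os.path.isdir(d):
        os.makedirs(d)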

./PythonHelperTools

  • This directory contains the Python API to read and visualize the VQA dataset
  • vqaDemo.py (demo script)
  • vqaTools (API to read and visualize data)
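
A condensed sketch along the lines of vqaDemo.py; the paths and the question-type filter are illustrative, so adapt them to the files you placed in ./Annotations and ./Questions.

from vqaTools.vqa import VQA

# Illustrative paths; point them at the files in ./Annotations and ./Questions.
annFile  = 'Annotations/mscoco_train2014_annotations.json'
quesFile = 'Questions/OpenEnded_mscoco_train2014_questions.json'

vqa = VQA(annFile, quesFile)                    # build question/annotation indexes
annIds = vqa.getQuesIds(quesTypes='how many')   # filter question ids by question type
anns = vqa.loadQA(annIds)                       # load the matching annotations
vqa.showQA(anns[:3])                            # print each question and its 10 answers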

./PythonEvaluationTools

  • This directory contains the Python evaluation code
  • vqaEvalDemo.py (evaluation demo script)
  • vqaEvaluation (evaluation code)
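
A condensed sketch along the lines of vqaEvalDemo.py, using the fake results file shipped in ./Results; the annotation and question paths are illustrative.

from vqaTools.vqa import VQA
from vqaEvaluation.vqaEval import VQAEval

annFile  = 'Annotations/mscoco_train2014_annotations.json'
quesFile = 'Questions/OpenEnded_mscoco_train2014_questions.json'
resFile  = 'Results/OpenEnded_mscoco_train2014_fake_results.json'

vqa     = VQA(annFile, quesFile)
vqaRes  = vqa.loadRes(resFile, quesFile)   # wrap the results file as a VQA object
vqaEval = VQAEval(vqa, vqaRes, n=2)        # n = number of decimal places to report
vqaEval.evaluate()
print(vqaEval.accuracy['overall'])
print(vqaEval.accuracy['perAnswerType'])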

./Results

  • OpenEnded_mscoco_train2014_fake_results.json (an example of a fake results file for v1.0 to run the demo)
  • Visit the [VQA evaluation page](http://visualqa.org/evaluation) for more details.
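
The results file is a JSON list with one {"question_id", "answer"} record per question (see the evaluation page for the exact file naming scheme). A minimal sketch of writing one, with hypothetical predictions and a hypothetical file name:

import json

# Hypothetical model predictions: question_id -> answer string.
predictions = {458752000: 'yes', 458752001: '2'}

results = [{'question_id': qid, 'answer': ans} for qid, ans in predictions.items()]
with open('Results/OpenEnded_mscoco_val2014_mymodel_results.json', 'w') as f:
    json.dump(results, f)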

./QuestionTypes

  • This directory contains the following lists of question types for both real and abstract questions (question types are unchanged from v1.0 to v2.0). In a list, if there are question types of length n+k and length n with the same first n words, then the question type of length n does not include questions that belong to the question type of length n+k (see the sketch after this list).
  • mscoco_question_types.txt
  • abstract_v002_question_types.txt
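
In other words, a question is assigned to the longest question type that prefixes it. A minimal sketch of that rule; the two example type strings are illustrative.

# A question is assigned to the longest question type that prefixes it.
def assign_question_type(question, question_types):
    q = question.lower()
    matches = [t for t in question_types if q.startswith(t)]
    return max(matches, key=len) if matches else 'none of the above'

types = ['what color', 'what color is the']   # illustrative entries from such a list
print(assign_question_type('What color is the dog?', types))    # 'what color is the'
print(assign_question_type('What color do you like?', types))   # 'what color'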

Contributors

aishwaryaagrawal, ayshrv, jiasenlu, tejaskhot, yash-goyal


vqa's Issues

How to get the test-dev2015 accuracy?

I can get the accuracy on val (dataType val2014) through this file: https://github.com/VT-vision-lab/VQA/blob/master/PythonEvaluationTools/vqaEvalDemo.py.
But I have no idea how to get the test-dev accuracy reported in the papers, because there is no annotation file for test-dev on this page (http://www.visualqa.org/download.html).
The annotation files I can get are listed as follows:
mscoco_train2014_annotations.json
mscoco_val2014_annotations.json

No annotation file for test-dev2015.
Any clue?

vqa.getImgIds throwing error with single int question_id

from external.vqa.vqa import VQA
import os
data_dir = "../../../data/vqa/data"
annotation_json_file_path = os.path.join(data_dir, "mscoco_train2014_annotations.json")
question_json_file_path = os.path.join(data_dir, "OpenEnded_mscoco_train2014_questions.json")

vqa = VQA(annotation_json_file_path, question_json_file_path)

q_requested = vqa.loadQA(ids=[409380])
img_id_requested = vqa.getImgIds(quesIds=[q_requested[0]['question_id']])

output:
Traceback (most recent call last):
  File "q1.py", line 13, in <module>
    img_id_requested = vqa.getImgIds(quesIds=[q_requested[0]['question_id']])
  File "external\vqa\vqa.py", line 113, in getImgIds
    anns = sum([self.qa[quesId] for quesId in quesIds if quesId in self.qa], [])
TypeError: can only concatenate list (not "dict") to list

Is this the expected behaviour? I modified line 113 in vqa.py to return the image_id as below, but I am not sure whether it will break something else.

anns = [self.qa[quesId] for quesId in quesIds if quesId in self.qa]

Post-processing in VQA 2.0 Evaluation

Hello, I am building a VQA system and am seeking clarification for a condition in the vqaEval.py script.

if len(set(gtAnswers)) > 1:
    for ansDic in gts[quesId]['answers']:
        ansDic['answer'] = self.processPunctuation(ansDic['answer'])
        ansDic['answer'] = self.processDigitArticle(ansDic['answer'])
    resAns = self.processPunctuation(resAns)
    resAns = self.processDigitArticle(resAns)

The above condition is placed before the standard post-processing is run on predictions at line 98. My understanding is that this translates to: "if all the human annotators agree, don't do any post-processing". However, my system produces some variation in its outputs in these cases, such as 'yes!' rather than 'yes'. Of course I can do my own post-processing, but I was wondering if you could offer some insight into the rationale behind the above condition?

Many thanks!

evaluation script gives wrong accuracy

While testing my solution I noticed this odd behavior. (See the picture below)

[screenshot: the question, its ground-truth answers, and the generated answer]
As you can see, my generated answer is 'none'.
According to the evaluation metric, the accuracy should be 30% because one of the ground-truth answers is the same as mine.
I think this is happening because of the processing done before evaluation. In vqaEval.py, line 42, the answer 'none' is replaced with '0'. Because there is no '0' among the ground-truth answers, the accuracy is set to 0.00%. If I remove 'none': '0' from the manualMap dictionary, I get the right accuracy for this question (30%).

If it helps, the id of this question is 411188011, and the name of the picture is COCO_val2014_000000411188.jpg

Can you look more into it? I hope I didn't miss anything.

Strange evaluation

In vqaEval.py, lines 97-104, the code that computes the accuracy for a generated answer seems to produce strange values. For example, if a question has 8 "yes" answers and 2 "no" answers (provided by the workers), the accuracy of a generated answer would be 0.533 for "no" and 0.2 for "yes".

  • accuracy for "yes": 2/10 * min(1, 8/3)
  • accuracy for "no": 8/10 * min(1, 2/3)

Can you please explain the reasons for that specific evaluation scheme?
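
For context, my reading of lines 97-104 is that, for each of the 10 human answers, the evaluator counts how many of the other 9 answers match the generated answer and then averages min(1, matches/3). A minimal sketch of that computation, under this reading:

# Sketch of my reading of vqaEval.py lines 97-104: average min(1, matches/3)
# over the ten leave-one-out subsets of the human answers.
def vqa_accuracy(res_ans, gt_answers):
    accs = []
    for i in range(len(gt_answers)):
        others = gt_answers[:i] + gt_answers[i + 1:]
        matches = sum(1 for a in others if a == res_ans)
        accs.append(min(1.0, matches / 3.0))
    return sum(accs) / len(accs)

gt = ['yes'] * 8 + ['no'] * 2
print(vqa_accuracy('yes', gt))   # 1.0 under this reading
print(vqa_accuracy('no', gt))    # 0.6 under this reading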
