VISUAL QA

The VQA dataset

VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.

Dataset Statistics

265,016 images (COCO and abstract scenes)

1,105,904 questions

11,059,040 ground truth answers

At least 3 questions (5.4 questions on average) per image

10 ground truth answers per question

3 plausible (but likely incorrect) answers per question

Automatic evaluation metric
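The evaluation metric is not spelled out in this README. A commonly cited form of VQA accuracy scores a predicted answer as min(number of annotators who gave that answer / 3, 1), so an answer counts as fully correct once at least 3 of the 10 annotators agree with it. The sketch below assumes that form. The name `vqa_accuracy` is hypothetical, and the simple lowercase/strip normalization is a stand-in for whatever answer normalization the scripts in ./PythonEvaluationTools actually apply.

```python
from collections import Counter


def vqa_accuracy(predicted, human_answers):
    """Score one predicted answer against the human answers for one question.

    Assumed form of the metric: min(#matching human answers / 3, 1).
    """
    # Simplified normalization (assumption): case-insensitive, trimmed match.
    counts = Counter(a.strip().lower() for a in human_answers)
    matches = counts[predicted.strip().lower()]
    # Divide by 3.0 so the result is a float under Python 2.7 as well.
    return min(matches / 3.0, 1.0)


# Ten hypothetical ground truth answers for a single question.
answers = ["yes"] * 2 + ["no"] * 8
score_yes = vqa_accuracy("yes", answers)    # 2 of 10 annotators agree
score_no = vqa_accuracy("no", answers)      # 8 of 10 annotators agree
score_other = vqa_accuracy("maybe", answers)  # no annotator agrees
```

Under this assumed form, an answer given by only one or two annotators earns partial credit, which softens the effect of disagreement among the 10 ground truth answers.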

Samples

[Figure: VQA sample]

Dataset Usage

Download dataset

http://www.visualqa.org/download.html

Requirements

Python 2.7

scikit-image (see the scikit-image installation instructions)

matplotlib (see the matplotlib installation instructions)

Files

./Questions

./Annotations

./Images

./PythonHelperTools

./PythonEvaluationTools

./Results

./QuestionTypes
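As a rough illustration of how the contents of ./Questions and ./Annotations relate, the sketch below pairs each question with its ground truth answers by `question_id`. It assumes both are JSON files with a top-level `questions` or `annotations` list and the field names shown (`question_id`, `image_id`, `question`, `answers`, `answer`). Those names are assumptions to check against the downloaded files, and the helper functions are hypothetical; they are not part of ./PythonHelperTools.

```python
import json


def index_annotations(annotations):
    """Map question_id -> annotation record (assumed JSON layout)."""
    return {ann["question_id"]: ann for ann in annotations["annotations"]}


def pair_questions_with_answers(questions, annotations):
    """Return (image_id, question text, list of answers) tuples."""
    by_id = index_annotations(annotations)
    pairs = []
    for q in questions["questions"]:
        ann = by_id.get(q["question_id"])
        if ann is None:
            # Skip questions that have no annotation record.
            continue
        answers = [a["answer"] for a in ann["answers"]]
        pairs.append((q["image_id"], q["question"], answers))
    return pairs


# Tiny in-memory stand-ins for the real files (hypothetical data).
sample_questions = json.loads(
    '{"questions": [{"question_id": 1, "image_id": 42,'
    ' "question": "What color is the cat?"}]}'
)
sample_annotations = json.loads(
    '{"annotations": [{"question_id": 1, "image_id": 42,'
    ' "answers": [{"answer": "black"}, {"answer": "black"},'
    ' {"answer": "gray"}]}]}'
)
pairs = pair_questions_with_answers(sample_questions, sample_annotations)
```

With the real dataset, the two dictionaries would come from `json.load` on a file in ./Questions and the matching file in ./Annotations; the exact file names depend on the download and are not listed here.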