December 1, 2015

VQA: Visual Question Answering

Abstract

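We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about that image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and the answers are open-ended. Visual questions selectively target different regions of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image, and more complex reasoning, than a system that produces generic image captions. Moreover, VQA is amenable to automatic evaluation, because many open-ended answers contain only a few words or come from a closed set of answers that can be offered in a multiple-choice format. We provide a dataset containing about 0.25 million images, about 0.76 million questions, and about 10 million answers (www.visualqa.org), and discuss the information it provides. Numerous baselines for VQA are provided and compared with human performance.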
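The claim that short answers make VQA easy to score automatically can be illustrated with a minimal sketch. It assumes each question comes with several human answers and that a prediction earns full credit once a fixed number of humans gave the same normalized answer. The function names, the normalization steps, and the threshold of three are assumptions made for illustration; the abstract does not specify the scoring rule.

```python
import string


def normalize(answer: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    cleaned = answer.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def consensus_score(predicted: str, human_answers: list[str], needed: int = 3) -> float:
    """Score one prediction against several human answers.

    Hypothetical rule: credit grows with the number of humans who gave
    the same normalized answer, capped at 1.0 once `needed` agree.
    """
    pred = normalize(predicted)
    matches = sum(1 for a in human_answers if normalize(a) == pred)
    return min(matches / needed, 1.0)


def mean_score(predictions: list[str], references: list[list[str]]) -> float:
    """Average the per-question scores over a set of questions."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    scores = [consensus_score(p, refs) for p, refs in zip(predictions, references)]
    return sum(scores) / len(scores)
```

Exact string matching after normalization works only because the answers are short; longer free-form answers, as in image captioning, would need a softer similarity measure.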

Authors

Stanislaw Antol

Aishwarya Agrawal

Jiasen Lu

Institutions

Georgia Institute of Technology

Virginia Tech

Microsoft (United States)
