December 1, 2015

VQA: Visual Question Answering

Abstract

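We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about that image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and the answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and more complex reasoning than a system that produces generic image captions. Moreover, VQA is amenable to automatic evaluation, because many open-ended answers consist of only a few words or belong to a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing about 0.25 million images, about 0.76 million questions, and about 10 million answers (www.visualqa.org), and discuss the information it provides. We also provide numerous baseline methods for VQA and compare them with human performance.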
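
The claim that short open-ended answers lend themselves to automatic evaluation can be illustrated with a small sketch. It assumes each question comes with several human answers and gives a prediction credit in proportion to how many annotators gave the same answer. The function names, the normalization step, and the threshold of 3 are illustrative assumptions and are not taken from this page.

```python
from collections import Counter
from typing import Sequence


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def consensus_accuracy(
    predicted: str, human_answers: Sequence[str], threshold: int = 3
) -> float:
    """Score a prediction by how many human annotators gave the same answer.

    The score grows linearly with the number of matching annotators and
    saturates at 1.0 once `threshold` of them agree. The default threshold
    is an assumption made for this sketch.
    """
    counts = Counter(normalize(a) for a in human_answers)
    matches = counts[normalize(predicted)]
    return min(matches / threshold, 1.0)


if __name__ == "__main__":
    # Hypothetical annotations for a single question.
    humans = ["yes", "Yes", "no", "yes "]
    print(consensus_accuracy("yes", humans))
    print(consensus_accuracy("no", humans))
```

Because answers are usually only a few words long, simple string matching after light normalization is often enough for this kind of scoring. Longer free-form answers would need a more forgiving comparison.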

Authors

Stanislaw Antol

Aishwarya Agrawal

Jiasen Lu

Institutions

Georgia Institute of Technology

Virginia Tech

Microsoft (United States)
