August 2, 2019 · Open Access

Natural Questions: A Benchmark for Question Answering Research

Abstract

We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotations sequestered as test data. We present experiments validating the quality of the data. We also describe an analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.
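To make the annotation scheme concrete, here is a minimal Python sketch of how one example and its annotations could be represented, along with the split sizes stated in the abstract. The class names, field names, and the `SPLITS` table are illustrative assumptions and do not reflect the file format of the public release.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """One annotator's judgment for a (question, Wikipedia page) pair.

    Hypothetical structure for illustration only.
    """
    long_answer: Optional[str] = None  # typically a paragraph; None means null
    short_answers: List[str] = field(default_factory=list)  # entities; empty means null

    def has_long_answer(self) -> bool:
        return self.long_answer is not None

    def has_short_answer(self) -> bool:
        return len(self.short_answers) > 0


@dataclass
class Example:
    question: str                  # anonymized, aggregated search query
    wikipedia_page: str            # page drawn from the top 5 search results
    annotations: List[Annotation]  # 1 for training, 5 for development and test


# (number of examples, annotations per example), taken from the abstract.
SPLITS = {
    "train": (307_373, 1),
    "dev": (7_830, 5),
    "test": (7_842, 5),
}

total_examples = sum(n for n, _ in SPLITS.values())
total_annotations = sum(n * k for n, k in SPLITS.values())
```

Under these assumptions, an annotator who finds no answer on the page is represented by `Annotation()`, for which both `has_long_answer()` and `has_short_answer()` return `False`.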

Cite this study

Kwiatkowski et al. (2019) studied this question.

www.synapsesocial.com/papers/69d83cd48c03fbaff8bee661 — DOI: https://doi.org/10.1162/tacl_a_00276

Also consider

Synapse has enriched 5 closely related papers on similar questions. Consider them for comparative context:

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications · 2018 · 231 citations
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) · 2018 · 516 citations
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing · 2013 · 1,266 citations
The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations · 2016 · 307 citations

Authors

Tom Kwiatkowski

Jennimaria Palomaki

Olivia Redfield

Journal

Transactions of the Association for Computational Linguistics

Institutions

Google (United States)
