March 13, 2024 · Open Access

RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems

Abstract

Retrieval-augmented generation (RAG) greatly benefits language models (LMs) by providing additional context for tasks such as document-based question answering (DBQA). Despite its potential, the power of RAG is highly dependent on its configuration, raising the question: What is the optimal RAG configuration? To answer this, we introduce the RAGGED framework to analyze and optimize RAG systems. On a set of representative DBQA tasks, we study two classic sparse and dense retrievers, and four top-performing LMs in encoder-decoder and decoder-only architectures. Through RAGGED, we uncover that different models suit substantially varied RAG setups. While encoder-decoder models monotonically improve with more documents, we find decoder-only models can only effectively use < 5 documents, despite often having a longer context window. RAGGED offers further insights into LMs' context utilization habits, where we find that encoder-decoder models rely more on contexts and are thus more sensitive to retrieval quality, while decoder-only models tend to rely on knowledge memorized during training.
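The central experiment described above, varying how many retrieved documents a reader model sees and measuring answer quality, can be sketched roughly as follows. This is a minimal illustration, not the RAGGED implementation: the unigram F1 metric, the function names `token_f1` and `sweep_top_k`, the `toy_reader` stand-in, and the example data are all assumptions introduced here.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def token_f1(prediction: str, reference: str) -> float:
    """Unigram F1 between a predicted and a reference answer (assumed metric)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def sweep_top_k(
    reader: Callable[[str, Sequence[str]], str],
    examples: List[dict],
    ks: Sequence[int],
) -> Dict[int, float]:
    """Mean F1 of `reader` when it is given only the top-k ranked documents."""
    scores = {}
    for k in ks:
        f1s = [
            token_f1(reader(ex["question"], ex["ranked_docs"][:k]), ex["answer"])
            for ex in examples
        ]
        scores[k] = sum(f1s) / len(f1s)
    return scores


def toy_reader(question: str, docs: Sequence[str]) -> str:
    """Hypothetical reader: echoes the last document it sees, so extra
    documents distract it; with no context it falls back to a fixed guess."""
    return docs[-1] if docs else "unknown"


# Hypothetical single-example dataset with documents already ranked by a retriever.
examples = [
    {
        "question": "Capital of France?",
        "ranked_docs": ["Paris", "Lyon", "Nice"],
        "answer": "Paris",
    },
]

results = sweep_top_k(toy_reader, examples, ks=[0, 1, 2, 3])
best_k = max(results, key=results.get)
```

With a real retriever and language model in place of `toy_reader`, plotting `results` against `k` would show whether a given reader keeps improving with more documents or peaks after only a few.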

Cite this study

Hsia et al. (2024) studied this question.

www.synapsesocial.com/papers/68e74464b6db6435876be1ba — DOI: https://doi.org/10.48550/arxiv.2403.09040

Authors

Jennifer Hsia

Afreen Shaikh

Zhiruo Wang
