November 8, 2025 · Open Access

Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation

Key points

Document compression enhances robustness in retrieval-augmented generation applications, improving output quality.
Experiments show consistent performance improvements across various scenarios and experimental settings.
Dynamic clustering enables effective removal of noisy and redundant content from retrieved documents.
This method may enable enhanced capabilities in knowledge injection for large language models during inference.

Abstract

Retrieval-Augmented Generation (RAG) has emerged in recent years as a widely adopted approach for knowledge injection during large language model (LLM) inference. However, because current RAG implementations have limited ability to exploit fine-grained inter-document relationships, they struggle to handle noisy and redundant retrieved content, which may cause errors in the generated results. To address these limitations, we propose an Efficient Dynamic Clustering-based document Compression framework (EDC2-RAG) that exploits latent inter-document relationships while removing irrelevant information and redundant content. We validate our approach, built upon GPT-3.5-Turbo and GPT-4o-mini, on widely used knowledge-QA and hallucination-detection datasets. Experimental results show that our method achieves consistent performance improvements across various scenarios and experimental settings, demonstrating strong robustness and applicability. Our code and datasets are available at https://github.com/Tsinghua-dhy/EDC-2-RAG.
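The abstract does not spell out the clustering or compression procedure, so the following is only a minimal sketch of the general idea, not the authors' EDC2-RAG implementation. It assumes a toy bag-of-words similarity in place of a real embedding model, a greedy threshold-based clustering rule, and "keep the most query-relevant document per cluster" as a stand-in for LLM-based compression. The function names and the `join_threshold` and `min_relevance` parameters are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_and_compress(query, docs, join_threshold=0.5, min_relevance=0.1):
    """Greedily cluster retrieved docs, then keep one representative per cluster.

    Hypothetical illustration only: thresholds and the clustering rule are
    assumptions, not values or steps taken from the paper.
    """
    q = embed(query)
    vecs = [embed(d) for d in docs]
    # Visit documents from most to least similar to the query.
    order = sorted(range(len(docs)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    clusters = []  # each cluster is a list of doc indices; the first is its seed
    for i in order:
        if cosine(q, vecs[i]) < min_relevance:
            continue  # treated as noise: too unrelated to the query
        for cluster in clusters:
            if cosine(vecs[cluster[0]], vecs[i]) >= join_threshold:
                cluster.append(i)  # near-duplicate of an existing seed
                break
        else:
            clusters.append([i])  # starts a new cluster
    # Placeholder for compression: return only each cluster's seed document.
    return [docs[cluster[0]] for cluster in clusters]


if __name__ == "__main__":
    retrieved = [
        "Paris is the capital of France.",
        "The capital of France is Paris.",
        "Bananas are rich in potassium.",
        "France borders Spain and Italy.",
    ]
    for doc in cluster_and_compress("What is the capital of France?", retrieved):
        print(doc)
```

In a real pipeline, the seed-selection step would typically be replaced by a summarization call over each cluster, and the similarity function by dense embeddings; both substitutions are outside what the abstract specifies.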

Cite this study

Li et al. studied this question.

www.synapsesocial.com/papers/690e8b6ca5b062d7a4e73392 — DOI: https://doi.org/10.48550/arxiv.2504.03165

Authors

Weitao Li

Kaiming Liu

Xiangyu Zhang
