March 3, 2026 (Open Access)

Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems

Key Points

W-RAC achieves comparable or better retrieval performance than traditional chunking methods while substantially reducing operational costs.
By leveraging structured, ID-addressable units, the method lowers token consumption and minimizes text redundancy.
Experimental analysis indicates that W-RAC matches or outperforms standard approaches while improving system observability and reducing hallucination risk.
The framework is designed for large-scale web content ingestion, making it suitable for diverse applications.

Abstract

Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking strategies to balance retrieval quality, latency, and operational cost. Traditional chunking approaches, such as fixed-size, rule-based, or fully agentic chunking, often suffer from high token consumption, redundant text generation, limited scalability, and poor debuggability, especially for large-scale web content ingestion. In this paper, we propose Web Retrieval-Aware Chunking (W-RAC), a novel, cost-efficient chunking framework designed specifically for web-based documents. W-RAC decouples text extraction from semantic chunk planning by representing parsed web content as structured, ID-addressable units and using large language models (LLMs) only for retrieval-aware grouping decisions rather than for text generation. This significantly reduces token usage, eliminates hallucination risks, and improves system observability. Experimental analysis and architectural comparison demonstrate that W-RAC achieves comparable or better retrieval performance than traditional chunking approaches while reducing chunking-related LLM costs by an order of magnitude.
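The abstract does not specify W-RAC's data formats or prompts, so the following is only a minimal sketch of the decoupling it describes, under stated assumptions. It assumes the parser emits (kind, text) blocks and that the LLM replies with a JSON grouping plan of the hypothetical form {"chunks": [[id, ...], ...]}. The names Unit, build_units, render_prompt_view and assemble_chunks are illustrative, not the authors' API. The model sees only IDs and short previews and returns IDs, and each chunk is assembled by copying the original extracted text, so no chunk contains text the model wrote.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One parsed, ID-addressable piece of a web page (hypothetical schema)."""
    uid: str   # stable identifier the LLM refers to instead of the text
    kind: str  # e.g. "heading", "paragraph", "list_item"
    text: str  # original extracted text, never rewritten


def build_units(blocks: list[tuple[str, str]]) -> list[Unit]:
    """Assign sequential IDs (u0, u1, ...) to parsed (kind, text) blocks."""
    return [Unit(f"u{i}", kind, text) for i, (kind, text) in enumerate(blocks)]


def render_prompt_view(units: list[Unit], max_chars: int = 80) -> str:
    """Build the compact listing shown to the LLM: IDs, kinds, truncated previews."""
    lines = []
    for u in units:
        preview = u.text if len(u.text) <= max_chars else u.text[:max_chars] + "..."
        lines.append(f"[{u.uid}] ({u.kind}) {preview}")
    return "\n".join(lines)


def assemble_chunks(units: list[Unit], plan_json: str) -> list[str]:
    """Turn a grouping plan (lists of unit IDs) into chunks by copying source text.

    Assumption: every unit must land in exactly one chunk; a real system
    might instead allow boilerplate units to be dropped.
    """
    by_id = {u.uid: u for u in units}
    seen: set[str] = set()
    chunks: list[str] = []
    for group in json.loads(plan_json)["chunks"]:
        for uid in group:
            if uid not in by_id:
                raise ValueError(f"plan references unknown unit id {uid!r}")
            if uid in seen:
                raise ValueError(f"unit {uid!r} assigned to more than one chunk")
            seen.add(uid)
        # Chunk text is joined verbatim from the parsed units; the model wrote none of it.
        chunks.append("\n".join(by_id[uid].text for uid in group))
    missing = [u.uid for u in units if u.uid not in seen]
    if missing:
        raise ValueError(f"units not assigned to any chunk: {missing}")
    return chunks


if __name__ == "__main__":
    units = build_units([
        ("heading", "Pricing"),
        ("paragraph", "Plans start at $10 per month."),
        ("heading", "Support"),
        ("paragraph", "Email support is available on all plans."),
    ])
    print(render_prompt_view(units))
    # Stand-in for the LLM response: a grouping plan made only of IDs.
    plan = '{"chunks": [["u0", "u1"], ["u2", "u3"]]}'
    for chunk in assemble_chunks(units, plan):
        print("---")
        print(chunk)
```

In this sketch, the model's output is a short list of IDs rather than rewritten passages. That is one plausible source of the token savings the abstract reports. Because each plan is validated against the known IDs, a faulty grouping decision surfaces as an explicit, inspectable error rather than as silently altered chunk text.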

Authors

Uday Allu

Sonu Kedia

Tanmay Odapally