Retrieval-Augmented Generation (RAG) systems combine large language models with information retrieval to ground responses in factual data. This study evaluates the contribution of each RAG component in a medical question answering system through a comprehensive ablation analysis. We designed a hierarchical RAG architecture with six key components, including hierarchical intent classification, query rewriting, two-stage retrieval (dense retrieval with FAISS followed by cross-encoder reranking using Clinical-Longformer), and specialist routing. We ran ablation studies across seven configurations on 476 medical questions from the MedQA benchmarks. Each configuration was evaluated independently with GPT-4o mini as an LLM judge on four metrics (context relevance, completeness, faithfulness, and correctness), each rated on a 1-5 Likert scale and scored in a separate evaluation call to minimize inter-metric bias. Statistical significance was assessed with paired t-tests, with effect sizes computed as Cohen’s d. The full system achieved an overall score of 3.64/5.0. The ablation identified two critical components: reranking (removal: -0.24 overall, P
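The abstract reports paired t-tests with Cohen's d but does not say which variant of d was used or how the overall score is aggregated. The sketch below assumes that per-question scores for the full system and for one ablated configuration are aligned by question, uses the paired-samples variant d_z (mean of the differences divided by their standard deviation), and treats the "overall" score as an unweighted mean of the four metrics. All three points, and the function names `overall_score` and `paired_t_and_cohens_d`, are illustrative assumptions rather than the authors' method.

```python
import math
from statistics import mean, stdev

# Hypothetical metric keys mirroring the four judge metrics named in the abstract.
METRICS = ("context_relevance", "completeness", "faithfulness", "correctness")


def overall_score(scores: dict[str, float]) -> float:
    """Assumed aggregate: unweighted mean of the four 1-5 Likert scores."""
    return mean(scores[m] for m in METRICS)


def paired_t_and_cohens_d(full: list[float], ablated: list[float]) -> tuple[float, float, int]:
    """Paired t statistic and Cohen's d_z for per-question scores.

    full[i] and ablated[i] must score the same question i.
    Returns (t, d_z, degrees_of_freedom); a negative t means the ablation hurt.
    """
    if len(full) != len(ablated) or len(full) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - f for f, a in zip(full, ablated)]  # ablated minus full, per question
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))  # paired t statistic
    d_z = mean_diff / sd_diff  # paired effect size; equals t / sqrt(n)
    return t, d_z, n - 1


if __name__ == "__main__":
    # Toy scores for five questions, invented for illustration only.
    full_scores = [4, 4, 3, 5, 4]
    ablated_scores = [3, 4, 3, 4, 3]
    t, d_z, df = paired_t_and_cohens_d(full_scores, ablated_scores)
    print(f"t({df}) = {t:.3f}, d_z = {d_z:.3f}")
```

The stdlib has no t distribution, so the sketch stops at the test statistic. For a two-sided p-value, `scipy.stats.ttest_rel(ablated, full)` computes the test on `ablated - full` and so should give the same t as this sketch.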
Emekci et al. studied this question.
www.synapsesocial.com/papers/69b8f11edeb47d591b8c5ff9
DOI: https://doi.org/10.34248/bsengineering.1849342
Hakan Emekci
Daniel Quillan Roxas
Black Sea Journal of Engineering and Science
TED University