This study investigates the feasibility and reliability of Retrieval-Augmented Generation (RAG) for medical question answering. To address the hallucination and lack of traceability that large language models (LLMs) exhibit in medical contexts, a medical knowledge base was constructed from the cancer-related subset of a Kaggle healthcare dataset. Experiments were conducted with the Qwen-Plus and Qwen-Flash models. Evaluation covered three dimensions: answer accuracy, source traceability, and refusal capability, with additional analysis of how the retrieval quantity (top-k) affects performance. The results show that RAG significantly improves the semantic consistency of model responses, outperforming the baseline model on the BERTScore F1 metric. It also performs strongly on refusal rate and attribution accuracy, highlighting its advantages in mitigating hallucinations and enhancing interpretability. Furthermore, the findings indicate that a retrieval quantity of k=5 yields the best overall performance. This study validates the potential of RAG in medical question answering and provides empirical support for building trustworthy medical AI systems.
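To make the roles of top-k retrieval, source attribution, and refusal concrete, the following is a minimal sketch and not the study's implementation. It assumes a toy bag-of-words similarity in place of a real embedding model, and the names `embed`, `cosine`, `retrieve`, `answer_or_refuse`, and the `min_score` threshold are all hypothetical. The idea shown is only this: retrieve the k most similar passages, refuse when nothing is similar enough, and otherwise return the indices of the supporting passages so an answer can be traced to its sources.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=5):
    """Return up to k (score, passage_index) pairs, most similar first."""
    q = embed(query)
    scored = [(cosine(q, embed(p)), i) for i, p in enumerate(passages)]
    # Sort by descending score; break ties by passage index for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:k]


def answer_or_refuse(query, passages, k=5, min_score=0.2):
    """Refuse when no retrieved passage clears min_score (hypothetical threshold);
    otherwise report which passages would be cited as sources."""
    hits = retrieve(query, passages, k)
    sources = [i for score, i in hits if score >= min_score]
    if not sources:
        return {"refused": True, "sources": []}
    return {"refused": False, "sources": sources}


if __name__ == "__main__":
    kb = [
        "chemotherapy is a common treatment for breast cancer",
        "lung cancer risk increases with smoking",
        "regular exercise supports heart health",
    ]
    print(answer_or_refuse("what is a treatment for breast cancer", kb))
    print(answer_or_refuse("quantum computing error correction", kb))
```

In a full pipeline the passages listed in `sources` would be placed in the LLM prompt and cited in the answer. Varying `k` here corresponds to the retrieval-quantity analysis described in the abstract, though the threshold and similarity function are illustrative choices only.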
Chengwei Cao (Mon,) studied this question.
www.synapsesocial.com/papers/69df2c62e4eeef8a2a6b1834 — DOI: https://doi.org/10.1051/itmconf/20268401001/pdf