Abstract

Large Language Models (LLMs) are increasingly used to support cancer patients and clinicians in decision-making. This systematic review investigates how LLMs are integrated into oncology and evaluated by researchers. We conducted a comprehensive search across PubMed, Web of Science, Scopus, and the ACM Digital Library through May 2024, identifying 56 studies covering 15 cancer types. The meta-analysis results suggested that LLMs were commonly used to summarize, translate, and communicate clinical information, but performance varied: the average overall accuracy was 76.2%, with average diagnostic accuracy lower at 67.4%, revealing gaps in the clinical readiness of this technology. Most evaluations relied heavily on quantitative datasets and automated methods without human graders, emphasizing “accuracy” and “appropriateness” while rarely addressing “safety”, “harm”, or “clarity”. Current limitations for LLMs in cancer decision-making, such as limited domain knowledge and dependence on human oversight, demonstrate the need for open datasets and standardized evaluations to improve reliability.
Yuexing Hao
Zhiwen Qiu
Jason Holmes
npj Digital Medicine
Massachusetts Institute of Technology
Cornell University
Mayo Clinic in Arizona
Hao et al. studied this question.
www.synapsesocial.com/papers/689a02c9e6551bb0af8cceb4 — DOI: https://doi.org/10.1038/s41746-025-01824-7