What question did this study set out to answer?

The study aims to evaluate the effectiveness of three large language models in answering patient questions about HCC treatment.

April 10, 2026

Comparative Evaluation of Large Language Models for Patient Education in Interventional Oncology

Key Points

Three publicly available large language models (LLMs) were compared on common patient questions about interventional radiology treatment of hepatocellular carcinoma (HCC).
The research team developed a standardized set of ten questions covering procedure indications, risks, benefits, and outcomes.
Each of the three LLMs was prompted to generate responses to the questions.
Two attending interventional radiologists independently scored the responses on a numerical scale from 0 to 100.
Models were compared in each domain using one-way ANOVA.
The LLMs provided readable, generally accurate responses.
No statistically significant differences were found among the models in any evaluated domain (p > 0.05).
Qualitative analysis revealed inconsistencies in how the models addressed procedure subtypes, techniques, and clinical nuances.

Abstract

To evaluate responses from three publicly available Large Language Models (LLMs) to common patient questions regarding the treatment of hepatocellular carcinoma (HCC) by interventional radiology, focusing on embolization and ablation. A standardized set of ten questions addressing procedure indications, risks, benefits, and outcomes was developed by the research team. Three LLMs, ChatGPT 4o Mini (OpenAI), Gemini (Google), and Copilot (Microsoft), were prompted to generate responses to the questions. Two attending interventional radiologists independently evaluated the responses using a web-based survey instrument, assessing accuracy, comprehensiveness, readability, compassion, and overall quality on a numerical scale from 0 to 100. Comparisons between models in each domain were made using one-way ANOVA, and the survey provided opportunities for qualitative comments. The LLMs were found to provide readable, generally accurate responses, with no statistically significant difference between models in any of the evaluated domains (p > 0.05). Qualitative analysis revealed inconsistencies in how the LLMs addressed procedure subtypes, techniques, and clinical nuances of HCC treatment. While LLMs show promise as an adjunct tool for preprocedural patient education, current limitations highlight the necessity for professional oversight. Future studies incorporating patient feedback are essential to assess their impact on comprehension and satisfaction.
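
For readers unfamiliar with the statistical comparison mentioned in the abstract, the following is a minimal sketch of a one-way ANOVA F statistic for a single rating domain. It is not the authors' analysis code. The function name one_way_anova_f, the model labels, and every score below are hypothetical placeholders chosen for illustration, not data from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists.

    Each inner list holds the ratings given to one model in one domain.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical 0-100 ratings for one domain; NOT values from the study.
placeholder_scores = {
    "model_a": [80, 85, 90, 75],
    "model_b": [82, 84, 86, 88],
    "model_c": [78, 88, 89, 81],
}

f_stat, df_b, df_w = one_way_anova_f(list(placeholder_scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

Turning the F statistic into a p-value requires the F distribution, which the Python standard library does not provide. A statistics package such as SciPy (scipy.stats.f_oneway) returns both values directly.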

Authors

Harrison Blume

DE Williams

Arvind Dev

Journals

Digestive Disease Interventions

Institutions

University of California, Los Angeles

Albert Einstein College of Medicine

Montefiore Medical Center
