May 20, 2026

Assessing the performance of three large language models in thyroid cancer tumour board decision-making

Abstract

Background: Large language models (LLMs) have the potential to augment clinical decision-making, but their role in supporting thyroid cancer multidisciplinary tumour board meetings (MDTs) remains uncertain. This study evaluated the performance of three LLMs (ChatGPT, MetaAI, and DeepSeek) in reproducing the management decisions of a regional thyroid cancer MDT.

Methods: One hundred thyroid cancer cases were reviewed by a regional MDT comprising consultant endocrine surgeons, a consultant radiologist, and a consultant histopathologist. MDT outcomes served as the reference standard. Each case was then submitted identically to the three LLMs using a standardised prompt that referenced up-to-date ATA, BTA and UICC guidelines. A 4-point concordance scale was applied: 3 = full agreement; 2 = acceptable alternative; 1 = third-line approach; 0 = discordant/overtreatment.

Results: ChatGPT achieved the highest number of fully concordant outputs (75 cases scored 3, 19 scored 2, 2 scored 1, and 4 scored 0), with 94% of its responses scoring 2 or 3. DeepSeek produced 64 cases scoring 3, 28 scoring 2, 3 scoring 1, and 5 scoring 0 (92% scoring 2 or 3). MetaAI produced 61 cases scoring 3, 28 scoring 2, 4 scoring 1, and 7 scoring 0 (89% scoring 2 or 3). Despite identical prompts and guideline references, the models varied in their outputs. ChatGPT demonstrated the strongest overall concordance; DeepSeek and MetaAI showed similarly strong performance with slightly higher discordance rates.

Conclusion: All three LLMs demonstrated high concordance with consultant-led MDT decisions in thyroid cancer management. While none can yet be considered reliable for autonomous clinical use, these findings highlight a promising future role for LLMs in MDT decision support and workflow streamlining once governance, validation, and regulatory frameworks mature.
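The reported percentages can be rechecked directly from the per-score counts in the Results. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the dictionary layout and the function name `acceptable_rate` are assumptions introduced here for illustration, and "acceptable" is taken to mean a score of 2 or 3, as in the abstract.

```python
# Per-model case counts for each concordance score (3, 2, 1, 0),
# copied from the Results section of the abstract.
counts = {
    "ChatGPT": {3: 75, 2: 19, 1: 2, 0: 4},
    "DeepSeek": {3: 64, 2: 28, 1: 3, 0: 5},
    "MetaAI": {3: 61, 2: 28, 1: 4, 0: 7},
}


def acceptable_rate(score_counts):
    """Percentage of cases scoring 2 or 3 (hypothetical helper name)."""
    total = sum(score_counts.values())
    acceptable = score_counts[3] + score_counts[2]
    return 100 * acceptable / total


for model, score_counts in counts.items():
    total = sum(score_counts.values())
    rate = acceptable_rate(score_counts)
    print(f"{model}: {total} cases, {rate:.0f}% scoring 2 or 3")
```

Each model's counts sum to the 100 cases reviewed, so the percentage of responses scoring 2 or 3 equals the sum of the first two counts.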

Authors

A White

D Scott-Coombes

N Patel

Journals

British Journal of Surgery

Institutions

Cardiff and Vale University Health Board

Morriston Hospital

Swansea Bay University Health Board
