Abstract
Background: Large language models (LLMs) offer exciting potential to augment clinical decision-making, but their role in supporting thyroid cancer multidisciplinary tumour board meetings (MDTs) remains uncertain. This study evaluated how well three LLMs (ChatGPT, MetaAI, and DeepSeek) reproduced the management decisions of a regional thyroid cancer MDT.
Methods: One hundred thyroid cancer cases were reviewed by a regional MDT comprising consultant endocrine surgeons, a consultant radiologist, and a consultant histopathologist. MDT outcomes served as the reference standard. Each case was then submitted in identical form to the three LLMs using a standardised prompt that referenced up-to-date ATA, BTA and UICC guidelines. A 4-point concordance scale was applied: 3 = full agreement; 2 = acceptable alternative; 1 = third-line approach; 0 = discordant/overtreatment.
Results: ChatGPT achieved the highest number of fully concordant outputs (score 3: 75 cases; score 2: 19; score 1: 2; score 0: 4), with 94% of its responses scoring 2 or 3. DeepSeek produced 64 cases at score 3, 28 at score 2, 3 at score 1, and 5 at score 0 (92% scoring 2 or 3). MetaAI produced 61 cases at score 3, 28 at score 2, 4 at score 1, and 7 at score 0 (89% scoring 2 or 3). Despite identical prompts and guideline references, the models' outputs differed. ChatGPT demonstrated the strongest overall concordance; DeepSeek and MetaAI performed similarly well, with slightly higher discordance rates.
Conclusion: All three LLMs demonstrated high concordance with consultant-led MDT decisions in thyroid cancer management. While none can yet be considered reliable for autonomous clinical use, these findings highlight a promising future role for LLMs in MDT decision support and workflow streamlining once governance, validation, and regulatory frameworks mature.
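The percentages quoted in the Results follow directly from the reported score counts. The short Python sketch below recomputes them as a check on the arithmetic. The function name `concordance_rate` and the threshold of 2 (treating scores 2 and 3 as acceptable, as the abstract does) are illustrative choices made here, not part of the authors' method.

```python
# Score counts per model as reported in the abstract (key = concordance score).
counts = {
    "ChatGPT": {3: 75, 2: 19, 1: 2, 0: 4},
    "DeepSeek": {3: 64, 2: 28, 1: 3, 0: 5},
    "MetaAI": {3: 61, 2: 28, 1: 4, 0: 7},
}


def concordance_rate(score_counts, threshold=2):
    """Percentage of cases scoring at or above `threshold` (hypothetical helper)."""
    total = sum(score_counts.values())
    acceptable = sum(n for score, n in score_counts.items() if score >= threshold)
    return 100 * acceptable / total


# Each model's counts should sum to the 100 cases reviewed by the MDT.
totals = {model: sum(c.values()) for model, c in counts.items()}
rates = {model: concordance_rate(c) for model, c in counts.items()}

for model in counts:
    print(f"{model}: n={totals[model]}, scoring 2 or 3 = {rates[model]:.0f}%")
```

Because each model was scored on exactly 100 cases, the percentage equals the sum of the score-3 and score-2 counts.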
A White
D Scott-Coombes
N Patel
British Journal of Surgery
Cardiff and Vale University Health Board
Morriston Hospital
Swansea Bay University Health Board
White et al. studied this question.
www.synapsesocial.com/papers/6a0d4fbff03e14405aa9b341 — DOI: https://doi.org/10.1093/bjs/znag045.001