Background/Aim: Adjuvant treatment decisions in hormone receptor-positive (HR+), HER2-negative early-stage breast cancer are frequently guided by multigene assays; however, limited access to genomic testing remains a significant challenge, particularly in resource-limited settings. This study aimed to evaluate the concordance between adjuvant treatment recommendations generated by large language models (ChatGPT-4o and ChatGPT-o3) and those of an experienced medical oncologist for patients with HR+/HER2- early-stage breast cancer when genomic assay results were unavailable. Patients and Methods: Clinical and pathological data from 411 patients with HR+/HER2- early-stage breast cancer were provided to ChatGPT-4o and ChatGPT-o3. Each model generated an adjuvant treatment recommendation, either chemotherapy plus endocrine therapy (CT+ET) or endocrine therapy alone (ET), based on ESMO and NCCN guidelines. These recommendations were compared with those of a medical oncologist. Agreement was assessed using Fleiss' kappa (all three evaluators) and Cohen's kappa (pairwise), and differences among evaluators were analyzed using Cochran's Q test. Results: Overall agreement among the clinician and the two models was substantial (κ=0.67). Moderate agreement was observed between the clinician and ChatGPT-4o (κ=0.60) and between the clinician and ChatGPT-o3 (κ=0.55). Agreement between the two language models was almost perfect (κ=0.88). Of the two models, ChatGPT-4o aligned more closely with the clinician's judgment. Conclusion: Large language models showed substantial overall concordance with clinician decision-making in adjuvant therapy planning for HR+/HER2- early-stage breast cancer in the absence of genomic testing. These findings suggest that such models may serve as supportive decision-making tools rather than independent decision-makers, particularly in settings with limited access to multigene assays.
Karabuğa et al. (Mon,) studied this question.
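For readers who want to see how the agreement statistics named in the abstract are computed, the following is a minimal sketch in plain Python. It is not the authors' analysis code (the abstract does not state which software was used), and the function names and the ten-patient example are hypothetical, chosen only to illustrate Cohen's kappa for a pair of evaluators, Fleiss' kappa for all three, and Cochran's Q for a binary CT+ET versus ET decision.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` has one row per item, one label per rater."""
    n_items, n_raters = len(ratings), len(ratings[0])
    category_totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreement_sum += (sum(v * v for v in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_items
    p_e = sum((v / (n_items * n_raters)) ** 2 for v in category_totals.values())
    if p_e == 1.0:
        return 1.0  # only one category was ever used
    return (p_bar - p_e) / (1.0 - p_e)


def cochran_q(binary_ratings):
    """Cochran's Q statistic; rows are items, columns are raters, entries 0/1."""
    k = len(binary_ratings[0])
    col_totals = [sum(col) for col in zip(*binary_ratings)]
    row_totals = [sum(row) for row in binary_ratings]
    grand = sum(row_totals)
    denom = k * grand - sum(r * r for r in row_totals)
    if denom == 0:
        return 0.0  # every item was rated unanimously
    return (k - 1) * (k * sum(c * c for c in col_totals) - grand**2) / denom


if __name__ == "__main__":
    # Hypothetical toy data for ten patients (NOT from the study).
    clinician = ["CT+ET", "CT+ET", "ET", "ET", "ET", "CT+ET", "ET", "ET", "CT+ET", "ET"]
    model_a = ["CT+ET", "CT+ET", "ET", "ET", "CT+ET", "CT+ET", "ET", "ET", "ET", "ET"]
    model_b = ["CT+ET", "CT+ET", "ET", "CT+ET", "CT+ET", "CT+ET", "ET", "ET", "ET", "ET"]

    rows = list(zip(clinician, model_a, model_b))
    binary = [[int(label == "CT+ET") for label in row] for row in rows]

    print("Cohen (clinician vs model_a):", round(cohen_kappa(clinician, model_a), 3))
    print("Fleiss (all three):", round(fleiss_kappa(rows), 3))
    print("Cochran's Q:", round(cochran_q(binary), 3))
```

Under the null hypothesis that all evaluators recommend CT+ET at the same rate, Cochran's Q is compared against a chi-square distribution with k-1 degrees of freedom, where k is the number of evaluators (here, two degrees of freedom for three evaluators). The sketch returns only the statistic; obtaining a p-value would need a chi-square routine from a statistics library.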