Objective: This study aims to analyze and compare the performance of four leading large language models (LLMs) in answering oral and maxillofacial surgery questions from the Turkish Dentistry Specialization Education Entrance Exam.
Material and Methods: A total of 123 oral and maxillofacial surgery questions without figures or graphs, published between 2012 and 2021, were analyzed. The study evaluated the performance of ChatGPT 4, ChatGPT 3.5, Gemini 1.5, and Copilot. The correct response rates of the LLMs were compared by the year in which the questions were asked and by oral and maxillofacial surgery topic.
Results: The highest correct response rate was obtained with ChatGPT 4 (91.06%), followed by Copilot (86.99%), ChatGPT 3.5 (82.11%), and Gemini 1.5 (79.67%). However, no statistically significant difference in correct response rates was observed among the four LLMs (p=0.059). All LLMs correctly answered 66.66% of orofacial infection questions, 80% of orthognathic surgery questions, and 100% of orofacial pain questions. ChatGPT 4 and Copilot answered 100% of dental implantology questions correctly.
Conclusion: The LLMs examined in the study exhibited acceptable correct response rates (79.67% to 91.06%), and their performances were similar to each other. The results demonstrate the potential of LLMs as educational support tools in oral and maxillofacial surgery education.
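The reported percentages can be converted back into counts of correct answers out of 123, which makes the comparison between models easier to check. The sketch below does that and then runs a Pearson chi-square test on the resulting 4 x 2 table. The abstract does not say which statistical test the author used, so the choice of test here is an assumption, as are the reconstructed counts and all function names. Because every model answered the same 123 questions, a paired test could also be argued for; this is an illustration, not a reproduction of the study's analysis.

```python
import math

TOTAL_QUESTIONS = 123  # number of questions, from the abstract

# Correct-response rates in percent, as reported in the abstract.
reported_rates = {
    "ChatGPT 4": 91.06,
    "Copilot": 86.99,
    "ChatGPT 3.5": 82.11,
    "Gemini 1.5": 79.67,
}


def reconstruct_counts(rates, total):
    """Convert percentages to whole-number correct counts (an assumption:
    the abstract reports only rates, not the raw counts)."""
    return {name: round(rate / 100 * total) for name, rate in rates.items()}


def chi_square_statistic(correct_counts, total):
    """Pearson chi-square for a k x 2 table (correct vs. incorrect) in which
    every model answered the same number of questions."""
    k = len(correct_counts)
    expected_correct = sum(correct_counts.values()) / k
    expected_wrong = total - expected_correct
    stat = 0.0
    for correct in correct_counts.values():
        wrong = total - correct
        stat += (correct - expected_correct) ** 2 / expected_correct
        stat += (wrong - expected_wrong) ** 2 / expected_wrong
    return stat


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees
    of freedom (closed form; only valid for df = 3, i.e. four models)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


counts = reconstruct_counts(reported_rates, TOTAL_QUESTIONS)
statistic = chi_square_statistic(counts, TOTAL_QUESTIONS)
p_value = chi_square_sf_df3(statistic)

print(counts)
print(f"chi2 = {statistic:.3f}, df = 3, p = {p_value:.3f}")
```

Under these assumptions the rates correspond to 112, 107, 101, and 98 correct answers, and the test gives a p-value just above 0.05, in line with the non-significant result reported in the abstract.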
Ömer Ekici
Turkiye Klinikleri Journal of Dental Sciences
www.synapsesocial.com/papers/69a7673bbadf0bb9e87e01bf — DOI: https://doi.org/10.5336/dentalsci.2025-110603