March 3, 2026 | Open Access

Performance of ChatGPT 4, ChatGPT 3.5, Gemini 1.5, and Copilot in Solving Oral and Maxillofacial Surgery Questions Asked in the Turkish Dentistry Specialization Education Entrance Exam: Comparison Study

Key Points

ChatGPT 4 achieved the highest correct response rate (91.06%), followed by Copilot (86.99%), ChatGPT 3.5 (82.11%), and Gemini 1.5 (79.67%).
Copilot and ChatGPT 4 both answered 100% of the dental implantology questions correctly.
The analysis covered 123 oral and maxillofacial surgery questions, without figures or graphs, published between 2012 and 2021.
No statistically significant difference in correct response rates was found among the 4 models (p=0.059).

Abstract

Objective: The study aims to analyze and compare the performance of 4 leading large language models (LLMs) in answering oral and maxillofacial surgery questions posed in the Turkish Dentistry Specialization Education Entrance Exam.

Material and Methods: A total of 123 oral and maxillofacial surgery questions, without figures or graphs, published between 2012 and 2021, were analyzed. The study evaluated the performance of ChatGPT 4, ChatGPT 3.5, Gemini 1.5, and Copilot. The correct answer rates of the LLMs were compared by the year in which the questions were asked and by oral and maxillofacial surgery topic.

Results: The highest correct response rate was obtained with ChatGPT 4 (91.06%), followed by Copilot (86.99%), ChatGPT 3.5 (82.11%), and Gemini 1.5 (79.67%). However, no statistically significant difference in correct response rates was observed among the 4 LLMs (p=0.059). All LLMs correctly answered 66.66% of orofacial infection questions, 80% of orthognathic surgery questions, and 100% of orofacial pain questions. ChatGPT 4 and Copilot answered 100% of dental implantology questions correctly.

Conclusion: The LLMs examined in the study exhibited acceptable correct response rates (79.67% to 91.06%), and their performances were similar to each other. The results demonstrate the potential of LLMs to be used as educational support instruments in oral and maxillofacial surgery education.
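To make the reported rates and p-value easier to interpret, the following is a minimal Python sketch that back-calculates the number of correct answers from the percentages and runs a Pearson chi-square test on the resulting 4 x 2 table. The abstract does not name the statistical test that was used, so the choice of test here is an assumption, as are the back-calculated counts and all function and variable names. Because every model answered the same 123 questions, a paired test (for example Cochran's Q) may be more appropriate, but it would require per-question results that the abstract does not provide.

```python
import math

N_QUESTIONS = 123  # number of questions stated in the abstract

# Correct response rates (%) as reported in the abstract.
reported_pct = {
    "ChatGPT 4": 91.06,
    "Copilot": 86.99,
    "ChatGPT 3.5": 82.11,
    "Gemini 1.5": 79.67,
}

def correct_count(pct, n=N_QUESTIONS):
    """Back-calculate a whole number of correct answers (assumption)."""
    return round(pct / 100 * n)

def chi_square_statistic(correct, n):
    """Pearson chi-square for a k x 2 table with n questions per model.

    Treats the models as independent samples, which is an assumption.
    """
    k = len(correct)
    exp_correct = sum(correct) / k   # expected correct answers per model
    exp_wrong = n - exp_correct      # expected wrong answers per model
    return sum(
        (c - exp_correct) ** 2 / exp_correct
        + ((n - c) - exp_wrong) ** 2 / exp_wrong
        for c in correct
    )

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees
    of freedom (closed form, valid only for df = 3)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

counts = {model: correct_count(pct) for model, pct in reported_pct.items()}
stat = chi_square_statistic(list(counts.values()), N_QUESTIONS)
p_value = chi2_sf_df3(stat)  # df = (4 models - 1) * (2 outcomes - 1) = 3

print(counts)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

Under these assumptions the statistic is about 7.44 and the p-value lands close to the reported 0.059, which is consistent with, but does not confirm, a chi-square comparison having been used in the study.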

Authors

Ömer Ekici

Journals

Turkiye Klinikleri Journal of Dental Sciences
