The accuracy and consistency of artificial intelligence (AI)-based chatbots, and hence their dependability in dental education, remain questionable. This study aimed to evaluate the performance of four chatbots in answering multiple-choice questions (MCQs) in operative dentistry. Drawing on operative dentistry textbooks, a three-member panel of experts developed 150 MCQs, which a fourth expert screened to yield a final set of 110 MCQs. These questions were input into GPT-4o, Grok 3, Gemini Advanced and Claude 3.7 Sonnet in two rounds one week apart. Performance was measured as the proportion of correct answers. Inter- and intra-chatbot consistencies were analysed using the McNemar test and Cohen’s Kappa. In the first round, Grok 3 and Gemini Advanced answered 86.4% of the MCQs correctly, while GPT-4o and Claude 3.7 Sonnet answered 85.5% correctly. In the second round, GPT-4o and Claude 3.7 Sonnet improved, answering 87.3% and 91.8% correctly, respectively. Intra-chatbot consistency ranged from fair (Kappa = 0.33) for Claude 3.7 Sonnet to substantial for GPT-4o. Inter-chatbot consistency (Kappa) ranged from 0.34 to 0.54 in the first round and from 0.44 to 0.66 in the second round. The assessed chatbots showed promising performance in answering MCQs in operative dentistry and improved over time. They can be used as adjuncts in operative dentistry education, provided their inherent limitations are carefully considered. Determining the accuracy and, consequently, the dependability of the most widely used AI-based chatbots in responding to dental queries is essential for dental students. Dental students must interpret chatbots’ responses with caution and use them as supplementary tools alongside standard resources such as textbooks and guidance from mentors.
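The two consistency statistics named in the abstract can be illustrated with a minimal sketch. It assumes each chatbot response is coded 1 (correct) or 0 (incorrect) and that the two sequences are paired by question. The function names `cohen_kappa` and `mcnemar_exact` and the sample data are hypothetical, the exact binomial form of the McNemar test is an assumption, and none of this is taken from the authors' analysis.

```python
from math import comb


def cohen_kappa(a, b):
    """Cohen's kappa for two paired sequences of 0/1 outcomes."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    # Observed agreement: share of questions with the same outcome.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each sequence's marginal proportion of 1s.
    pa1 = sum(a) / n
    pb1 = sum(b) / n
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if p_e == 1:
        return 1.0  # both sequences are constant and identical
    return (p_o - p_e) / (1 - p_e)


def mcnemar_exact(a, b):
    """Two-sided exact (binomial) McNemar p-value for paired 0/1 outcomes."""
    # Only discordant pairs carry information about a change in accuracy.
    n01 = sum(x == 0 and y == 1 for x, y in zip(a, b))
    n10 = sum(x == 1 and y == 0 for x, y in zip(a, b))
    n = n01 + n10
    if n == 0:
        return 1.0
    k = min(n01, n10)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: one chatbot's correctness on 10 questions in two rounds.
round1 = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
round2 = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1]

kappa = cohen_kappa(round1, round2)
p_value = mcnemar_exact(round1, round2)
print(f"kappa = {kappa:.2f}, McNemar p = {p_value:.3f}")
```

Applied to one chatbot's two rounds, this gives an intra-chatbot comparison. Applied to two chatbots' answers within the same round, it gives an inter-chatbot comparison. Kappa measures agreement beyond chance, while the McNemar test asks whether the proportion of correct answers differs between the paired sets.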
Thilla Sekar Vinothkumar
Syed Nahid Basheer
Sabari Murugesan
BMC Oral Health
Primary Health Care
Jazan University
Vinothkumar et al. studied this question.
www.synapsesocial.com/papers/69d8948f6c1944d70ce05729 — DOI: https://doi.org/10.1186/s12903-026-08276-9