March 3, 2026 | Open Access

Assessing the Efficacy of Artificial Intelligence Platforms in Answering Dental Caries Multiple-Choice Questions: A Comparative Study of ChatGPT and Google Gemini Language Models

Key points

Gemini significantly outperformed ChatGPT on dental caries multiple-choice questions across all seven examination formats.
Gemini achieved higher passing rates and mean scores at every question count examined, making it the more reliable of the two models.
Statistical analyses indicated an interaction between LLM type and question count that influenced performance outcomes.
Educators should be cautious in relying on AI for assessments, as both models struggled with complex dental caries content.

Abstract

Objective: This study aimed to compare the accuracy of two large language models (LLMs), ChatGPT (version 3.5) and Google Gemini (formerly Bard), in answering dental caries-related multiple-choice questions (MCQs) using a simulated student examination framework across seven examination lengths.

Materials and Methods: A total of 125 validated dental caries MCQs were extracted from the Dental Decks and Oxford University Press question banks. Seven examination groups were constructed with varying question counts (25, 35, 45, 55, 65, 75, and 85 questions). For each group, 100 simulated examinations were generated per LLM (ChatGPT and Gemini), resulting in 1400 simulated examinations in total (7 groups × 100 simulations × 2 LLMs). Each simulated student received a unique randomized subset of questions. Each LLM answered the MCQs using a standardized prompt to minimize ambiguity. Outcomes included mean score, passing rate (score ≥60%), and performance differences between LLMs. Statistical analyses included independent t-tests, one-way ANOVA within each LLM, and two-way ANOVA examining interactions between LLM type and question count.

Results: Across all seven examination formats, Gemini significantly outperformed ChatGPT (p …).

Conclusions: Gemini demonstrated superior accuracy and higher passing rates compared to ChatGPT in all simulated examination formats. While both LLMs struggled with complex caries-related content, Gemini provided more reliable performance across question quantities. Educators should exercise caution in relying on LLMs for automated assessment or self-study, and future research should evaluate human-AI hybrid models and LLM performance across broader dental domains.
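The simulated examination framework described in the Materials and Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the fixed random seed, and the idea of representing each LLM's answers as a list of booleans (one per question in the 125-item pool) are assumptions made for illustration. Only the pool size, the seven question counts, the 100 simulations per group, and the 60% passing threshold come from the abstract.

```python
import random

# Values taken from the abstract.
POOL_SIZE = 125
EXAM_LENGTHS = (25, 35, 45, 55, 65, 75, 85)
SIMULATIONS_PER_GROUP = 100
PASS_THRESHOLD = 0.60


def simulate_exams(correct, n_questions, n_sims, rng):
    """Return fractional scores (0.0 to 1.0) for n_sims simulated exams.

    `correct` is a hypothetical list of booleans, one per pool question,
    recording whether the model answered that question correctly.
    Each simulated student gets a random subset drawn without replacement.
    """
    scores = []
    for _ in range(n_sims):
        subset = rng.sample(range(len(correct)), n_questions)
        n_right = sum(1 for i in subset if correct[i])
        scores.append(n_right / n_questions)
    return scores


def summarize(scores, threshold=PASS_THRESHOLD):
    """Return the mean score and the share of exams at or above the threshold."""
    n_passed = sum(1 for s in scores if s >= threshold)
    return {
        "mean_score": sum(scores) / len(scores),
        "passing_rate": n_passed / len(scores),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed is an assumption, for repeatability
    # Placeholder correctness vector, NOT data from the study.
    placeholder = [i % 5 != 0 for i in range(POOL_SIZE)]
    for length in EXAM_LENGTHS:
        scores = simulate_exams(placeholder, length, SIMULATIONS_PER_GROUP, rng)
        print(length, summarize(scores))
```

Running the same loop once per LLM, with that model's own correctness vector, yields the 7 × 100 × 2 = 1400 simulated examinations reported in the abstract. The per-group score lists are then the natural input for the t-tests and ANOVAs the study describes.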
