This study aims to evaluate and compare patient education materials generated by five widely used chatbots (ChatGPT-5.2, ChatGPT-5.2 Plus, Gemini 3.0, Gemini 3.0 Plus and DeepSeek V3.2) in response to questions about endodontic instruments broken in root canals. Twenty-two questions were formulated by two endodontists, each with eight years of experience in instrument removal procedures, based on their clinical expertise and on educational materials from the American Association of Endodontists (AAE). The questions were posed to the chatbots over five days, at three times each day (morning, afternoon, and evening). Two blinded evaluators independently rated the accuracy of the responses on a 1–5 scale, and scoring disagreements were resolved through evidence-based discussion. The coefficient of variation (CV) was calculated to evaluate the consistency of repeated responses from each chatbot. Readability was evaluated using the Flesch-Kincaid Reading Ease Score, Flesch-Kincaid Grade Level, Gunning Fog Score, and SMOG Index. Accuracy differed significantly among the chatbots (p < 0.05), with ChatGPT-5.2 less accurate than the other models (p < 0.001). Accuracy was also higher on day 2 than on the other days (p < 0.001). Consistency differed significantly among models (p < 0.05), with Gemini 3.0 Plus and DeepSeek V3.2 showing higher consistency than ChatGPT-5.2. Readability analysis indicated that ChatGPT-5.2 Plus generated more readable responses, whereas the Gemini models and DeepSeek V3.2 required higher reading grade levels. Large language model (LLM)-based chatbots showed model-dependent differences in accuracy, consistency, and readability. While Gemini 3.0, Gemini 3.0 Plus and DeepSeek V3.2 performed better in accuracy and consistency, ChatGPT-5.2 and ChatGPT-5.2 Plus provided more readable content, highlighting the need for cautious and selective use of these tools in patient education.
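The abstract names its consistency and readability metrics without stating their formulas. The sketch below applies the standard published formulas, assuming word, sentence, syllable, and polysyllabic-word counts are already available (syllable counting is tool-specific and omitted). It is not the authors' analysis pipeline; the function names and sample data are hypothetical, and computing CV with the sample standard deviation is an assumption, since the abstract does not specify it.

```python
import math
import statistics


def coefficient_of_variation(scores):
    """CV (%) of repeated accuracy scores: standard deviation / mean * 100.

    Assumption: sample standard deviation (n - 1 denominator). Needs at least
    two scores and a non-zero mean (true for a 1-5 scale).
    """
    return statistics.stdev(scores) / statistics.mean(scores) * 100


def flesch_reading_ease(words, sentences, syllables):
    # Flesch Reading Ease: higher values mean easier text.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    # Flesch-Kincaid Grade Level: approximate US school grade required.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    # complex_words: words of three or more syllables (simplified definition).
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog_index(sentences, polysyllables):
    # SMOG: polysyllable count normalized to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


if __name__ == "__main__":
    # Hypothetical: one question asked 15 times (5 days x 3 sessions), scored 1-5.
    scores = [4, 5, 4, 4, 5, 3, 4, 5, 4, 4, 5, 4, 4, 4, 5]
    print(f"CV   = {coefficient_of_variation(scores):.1f}%")

    # Hypothetical counts for a single chatbot response.
    w, s, syl, poly = 120, 8, 190, 14
    print(f"FRE  = {flesch_reading_ease(w, s, syl):.1f}")
    print(f"FKGL = {flesch_kincaid_grade(w, s, syl):.1f}")
    print(f"Fog  = {gunning_fog(w, s, poly):.1f}")
    print(f"SMOG = {smog_index(s, poly):.1f}")
```

A lower CV means more stable scores across the 15 repeated queries. For the readability indices, a higher Reading Ease score and lower grade-level outputs (FKGL, Fog, SMOG) indicate more accessible text, which is the direction in which the abstract reports ChatGPT-5.2 Plus outperforming the other models.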
Sümbüllü et al. DOI: https://doi.org/10.1186/s12903-026-08327-1 (www.synapsesocial.com/papers/69df2ae6e4eeef8a2a6afe70)
Meltem Sümbüllü, Elham Othman Adam, Emine Araz Altun
BMC Oral Health
Atatürk University; Istanbul Medeniyet University