This study aimed to evaluate the intra-model repeatability of three artificial intelligence-based chatbots (ChatGPT-4.0, Microsoft Copilot, and Claude 3.5) in composite shade selection and their agreement with a dental specialist. Acrylic resin maxillary central incisor teeth (n = 10) representing different VITA Classical shades were photographed together with A1, A2, and A3 composite shade tabs under standardized illumination. Each artificial intelligence model selected a shade from the photographs, and the selection was repeated on five different days using identical images and prompts. The dental specialist's visual shade selection was determined by consensus between two calibrated evaluators. CIE L*, a*, and b* values of the acrylic teeth and composite shade tabs were obtained by photometric analysis, and color differences were calculated using the CIEDE2000 formula. Intra-model repeatability was assessed using Fleiss’ kappa coefficient, and agreement with the dental specialist was evaluated using Cohen’s kappa statistic. Intra-model repeatability differed among the models: ChatGPT-4.0 demonstrated fair repeatability (κ = 0.33), Claude 3.5 showed moderate repeatability (κ = 0.45), and Microsoft Copilot exhibited poor repeatability (κ = −0.12). Trial-level agreement with the dental specialist varied across repeated assessments, with ChatGPT-4.0 generally demonstrating higher agreement than the other models, whereas Microsoft Copilot showed consistently low agreement. Under standardized conditions, the artificial intelligence chatbots showed variable repeatability and limited agreement with expert evaluation in composite shade selection.
Özdemir et al. studied this question.
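The two agreement statistics named above can be illustrated with a minimal sketch. It assumes that, for Fleiss' kappa, the five repeated days of one model are treated as the "raters" of each tooth, and that Cohen's kappa compares one trial of a model against the specialist consensus; the abstract does not spell out these details. The function names and all shade labels in the demo are hypothetical and are not data from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each holding the same number of labels.

    Assumption: each inner list holds the repeated selections for one tooth
    (e.g. five days), so the repetitions play the role of raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Per-item agreement: fraction of rating pairs that agree on that item.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall share of each category.
    shares = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(s * s for s in shares)

    if p_e == 1.0:  # only one category was ever used; kappa is undefined
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)

    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(rater_a) | set(rater_b)) / (n * n)

    if p_e == 1.0:  # both raters used a single identical label throughout
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical selections for three teeth over five days (not study data).
    model_days = [
        ["A1", "A1", "A2", "A1", "A1"],
        ["A3", "A3", "A3", "A2", "A3"],
        ["A2", "A2", "A2", "A2", "A2"],
    ]
    print("Fleiss' kappa:", round(fleiss_kappa(model_days), 3))

    # Hypothetical single trial versus specialist consensus (not study data).
    model_trial = ["A1", "A3", "A2"]
    specialist = ["A1", "A2", "A2"]
    print("Cohen's kappa:", round(cohen_kappa(model_trial, specialist), 3))
```

Both functions return values at or below 1, with 0 indicating chance-level agreement and negative values indicating agreement below chance, which is how a result such as κ = −0.12 can arise.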