Background: Public interest in the use of anesthesia during circumcision has increased, yet the reliability of freely available artificial intelligence (AI) chatbots in addressing such medical questions remains unclear. This study aimed to compare the accuracy, safety, and citation reliability of three widely used AI chatbots (ChatGPT, Gemini, and DeepSeek) when responding to common public queries related to circumcision anesthesia.

Methods: Five high-interest questions were derived from global Google Trends data and submitted to each chatbot using two different input formats: unstructured lay-language queries and structured prompts explicitly based on current clinical guidelines. All generated responses were independently reviewed by a urologist and an anesthesiologist and scored for guideline concordance, citation accuracy, and the presence of potentially harmful information.

Results: Across both query formats, DeepSeek produced responses that were more closely aligned with established guidelines than those of ChatGPT and Gemini (P<0.05). Under structured prompting, DeepSeek also demonstrated higher citation accuracy than ChatGPT (P=0.049). Importantly, none of the evaluated responses contained advice deemed unsafe or clinically harmful. The use of structured, guideline-oriented prompts was associated with a consistent improvement in response quality across all evaluated AI platforms.

Conclusion: Freely accessible AI chatbots show heterogeneous performance in providing information on circumcision anesthesia. Although these systems may offer supplementary educational value, their outputs vary in reliability and should be interpreted with caution. Expert clinical oversight remains essential to ensure patient safety and adherence to evidence-based guidelines.
Şahin et al. studied this question.