What question did this study set out to answer?

This study aims to assess the accuracy and safety of AI chatbots in providing information about circumcision anesthesia.

March 28, 2026 | Open Access

Guideline Concordance and Safety of AI Chatbots for Circumcision Anesthesia: A Comparative Study

Key Points

The study assessed the accuracy and safety of AI chatbots in providing information about circumcision anesthesia.
Five high-interest questions were identified from global Google Trends data.
Each question was submitted to three AI chatbots (ChatGPT, Gemini, and DeepSeek) using both unstructured and structured prompts.
A urologist and an anesthesiologist independently reviewed the responses for guideline concordance, citation accuracy, and safety.
DeepSeek's responses were more closely aligned with guidelines than those of ChatGPT and Gemini (P<0.05).
Under structured prompts, DeepSeek showed higher citation accuracy than ChatGPT (P=0.049).
No responses contained unsafe or harmful advice, but reliability varied across chatbots, so expert clinical oversight remains essential.

Abstract

Background: Public interest in the use of anesthesia during circumcision has increased, yet the reliability of freely available artificial intelligence (AI) chatbots in addressing such medical questions remains unclear. This study aimed to compare the accuracy, safety, and citation reliability of three widely used AI chatbots (ChatGPT, Gemini, and DeepSeek) when responding to common public queries related to circumcision anesthesia.

Methods: Five high-interest questions were derived from global Google Trends data and submitted to each chatbot using two input formats: unstructured lay-language queries and structured prompts explicitly based on current clinical guidelines. All generated responses were independently reviewed by a urologist and an anesthesiologist and scored for guideline concordance, citation accuracy, and the presence of potentially harmful information.

Results: Across both query formats, DeepSeek produced responses that were more closely aligned with established guidelines than those of ChatGPT and Gemini (P<0.05). Under structured prompting, DeepSeek also demonstrated higher citation accuracy than ChatGPT (P=0.049). Importantly, none of the evaluated responses contained advice deemed unsafe or clinically harmful. The use of structured, guideline-oriented prompts was associated with a consistent improvement in response quality across all evaluated AI platforms.

Conclusion: Freely accessible AI chatbots show heterogeneous performance in providing information on circumcision anesthesia. Although these systems may offer supplementary educational value, their outputs vary in reliability and should be interpreted with caution. Expert clinical oversight remains essential to ensure patient safety and adherence to evidence-based guidelines.
