Large language models (LLMs) have demonstrated promising outcomes in medical decision support; however, their efficacy in managing complex hepatobiliary conditions remains insufficiently examined. We developed a genetic neuro-symbolic LLM system that integrates multiple AI agents with neural-symbolic reasoning for cholangitis management and compared its performance against conventional LLMs and human experts.
This multi-center cross-sectional study included 30 case-based questions from American Board of Internal Medicine (ABIM) gastroenterology subspecialty examinations covering acute cholangitis. Questions were categorized into diagnosis (n = 10), treatment (n = 10), and complications/prognosis (n = 10). The performance of the genetic neuro-symbolic LLM system, orchestrated via LangGraph, was compared against Claude 4.5 Sonnet, ChatGPT 5.2, Gemini 2.0 Flash, 10 gastroenterology specialists, and 4 emergency medicine physicians from four tertiary centers in Turkey.
The genetic neuro-symbolic system achieved the highest overall accuracy (100%, 30/30), significantly outperforming Claude 4.5 Sonnet (90.0%), ChatGPT 5.2 (60.0%), Gemini 2.0 Flash (63.3%), gastroenterology experts (mean 95.7% ± 3.2%), and emergency medicine physicians (mean 84.2% ± 8.8%). The neuro-symbolic system demonstrated superior performance across all categories and cholangitis subtypes. Among human participants, gastroenterologists outperformed emergency physicians in treatment decisions (p = 0.012) and showed non-inferior performance to Gemini 2.0 Flash overall (p = 0.034).
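As a reading aid for the reported accuracies, the short sketch below converts between percentages and correct-answer counts on the 30-question set. Only the 30/30 count is stated in the abstract; the other counts are inferred from the reported percentages and should be treated as assumptions, and the function names are illustrative rather than taken from the study.

```python
# Illustrative arithmetic only. The abstract states 30/30 for the
# neuro-symbolic system; every other count is inferred from the
# reported percentage and is an assumption, not a reported figure.
TOTAL_QUESTIONS = 30

reported_accuracy_pct = {
    "Genetic neuro-symbolic system": 100.0,
    "Claude 4.5 Sonnet": 90.0,
    "ChatGPT 5.2": 60.0,
    "Gemini 2.0 Flash": 63.3,
}


def implied_correct(pct: float, total: int = TOTAL_QUESTIONS) -> int:
    """Nearest whole number of correct answers consistent with a percentage."""
    return round(pct / 100 * total)


def accuracy_pct(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


if __name__ == "__main__":
    for model, pct in reported_accuracy_pct.items():
        n = implied_correct(pct)
        print(f"{model}: {pct}% is consistent with {n}/{TOTAL_QUESTIONS}")
```

With only 30 questions, each additional correct answer shifts accuracy by about 3.3 percentage points, which is worth keeping in mind when comparing models that differ by one or two questions.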
The genetic neuro-symbolic LLM system demonstrated superior accuracy in cholangitis management compared to all conventional AI models and human experts. This proof-of-concept study suggests that multi-agent architectures with neural-symbolic reasoning may offer a promising direction for AI-assisted clinical decision support in complex hepatobiliary conditions, although prospective clinical validation is required before broader implementation claims can be warranted.
Ucdal et al. studied this question.