What question did this study set out to answer?

The study aims to assess the feasibility and performance of a multi-agent system in supporting early sepsis management in intensive care units.

May 9, 2026 · Open Access

Multi-Agent System for Early Sepsis Management Support: A Follow-up Evaluation Study

Key Points

The study aims to assess the feasibility and performance of a multi-agent system in supporting early sepsis management in intensive care units.
Evaluated a multi-agent system with sepsis management, antibiotic recommendation, and guideline compliance agents.
The system was powered by Palmyra-Med 70B with retrieval-augmented generation and was compared with GPT-3.5 Turbo and GPT-4o mini.
Performance was assessed with expert evaluations and quantitative metrics such as Cohen kappa and groundedness.
Guideline-compliant recommendations were generated, including treatment protocols for necrotizing fasciitis.
Hallucinations occurred in 3 of 10 cases, so recommendations require human oversight.
Agreement between the two expert raters was low (Cohen kappa, 0.26).
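
For readers unfamiliar with the agreement statistic cited above, the following is a minimal sketch of how Cohen kappa is computed for two raters. The function name and the ratings are hypothetical and are not the study's data or code; kappa compares observed agreement with the agreement expected by chance from each rater's label frequencies.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen kappa for two raters labelling the same items (illustrative helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary ratings (1 = acceptable, 0 = not acceptable) for 10 cases.
expert_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
expert_2 = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
print(round(cohen_kappa(expert_1, expert_2), 3))
```

Kappa accounts for chance, so two raters can agree on most items and still obtain a low value when their label frequencies make agreement likely by chance alone.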

Abstract

Objectives: This study evaluated the feasibility and performance of a multi-agent (MA) system designed to support early sepsis management in intensive care units. The system integrates three specialized agents (sepsis management, antibiotic recommendation, and guideline compliance) to provide evidence-based recommendations at T = 0 hours (before culture results), extending prior single-case findings across 10 diverse cases.

Methods: The MA system was powered by Palmyra-Med 70B (selected for its superior MedQA performance; average score, 85.9) and compared with GPT-3.5 Turbo and GPT-4o mini (all at a temperature of 0.25). It used retrieval-augmented generation (RAG) with ChromaDB, drawing on the 2021 Surviving Sepsis Campaign guidelines, more than 20 high-impact manuscripts and reviews on sepsis etiologies published 2018–2025, and other relevant sources. Eight cases from the MIMIC-IV demo and two cases from the literature were formatted as vignettes. RAG used the BAAI/bge-base-en-v1.5 embedding model with cosine similarity (threshold, 0.75) and top-5 chunks. Performance was assessed via TruLens (groundedness, approximately 0.62) and by two intensivists using a standardized questionnaire.

Results: The system generated guideline-compliant recommendations (e.g., prompt surgical debridement plus meropenem and vancomycin for necrotizing fasciitis). Hallucinations occurred in three of 10 cases (e.g., “altered mental status”). Expert agreement was quantified by a Cohen kappa of 0.26. Programmatic and expert assessments showed negligible correlation.

Conclusions: In this exploratory study, the MA system shows preliminary promise for early sepsis support but requires human oversight to mitigate hallucinations. Code is available on GitHub; further validation is needed.
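
The retrieval settings named in the Methods (cosine similarity, a threshold of 0.75, and top-5 chunks) can be illustrated with a minimal sketch. It is not the study's implementation: it uses plain Python lists in place of ChromaDB and the BAAI/bge-base-en-v1.5 embeddings, the function names are hypothetical, and it assumes the threshold is applied as a minimum similarity before the top-5 cut, which the abstract does not specify.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_u * norm_v)


def retrieve(query_vec, chunks, threshold=0.75, top_k=5):
    """Return up to top_k (score, text) pairs with similarity >= threshold.

    `chunks` is a list of (text, embedding) pairs. The order of operations
    (filter by threshold, then keep the best top_k) is an assumption.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [item for item in scored if item[0] >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:top_k]


# Toy two-dimensional "embeddings"; real embeddings have hundreds of dimensions.
toy_chunks = [
    ("guideline passage A", [1.0, 0.0]),
    ("guideline passage B", [1.0, 1.0]),   # similarity about 0.71, below threshold
    ("unrelated passage", [0.0, 1.0]),
    ("guideline passage C", [2.0, 0.5]),
]
for score, text in retrieve([1.0, 0.0], toy_chunks):
    print(f"{score:.2f}  {text}")
```

With a similarity threshold in place, a query can return fewer than five chunks, or none, when the knowledge base holds little relevant material.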
