Background: Clinical trial enrollment in oncology remains critically low, with fewer than 5% of eligible adults participating, in large part due to the complexity and labor intensity of eligibility screening. We prospectively evaluated a neuro-symbolic, multi-agent artificial intelligence (AI) platform integrating domain-specific large language model (LLM) agents, an oncology-specific knowledge graph, a real-time recommendation engine, and human-in-the-loop review to determine whether automated extraction and reasoning can safely improve trial identification, efficiency, and equity at scale.

Methods: Consecutive patients (N = 3804; Eastern Cooperative Oncology Group [ECOG] performance status 0-2; balanced for cancer type incidence) with metastatic or progressive malignancies were screened over a 12-month period. A multi-agent architecture comprising OncoAgents (LLM-based extraction and reasoning agents), OncoGraph (oncology knowledge graph), OncoRecommend (prioritization engine), and OncoSet (expert-curated corpus) carried out automated data extraction, harmonization, and trial matching over 157 367 clinical pages (86.5 M tokens). Two oncologists independently produced a gold standard of trial eligibility labels (Cohen's κ = 0.92). The primary unit of analysis was the patient-trial pair. Baselines included manual screening, GPT-4 zero-shot prompting, GPT-4 chain-of-thought, and frontier GPT-4o extraction/matching benchmarks. Outcomes included sensitivity, specificity, precision, F1 score, calibration of eligibility confidence scores, time-to-recommendation, fairness across demographic subgroups, and operational burden.

Results: The multi-agent neuro-symbolic system achieved an F1 score of 0.82 (95% confidence interval 0.81-0.83). In comparison, the GPT-4 zero-shot baseline achieved an F1 of 0.47, and the GPT-4 chain-of-thought baseline achieved an F1 of 0.67. Per-patient screening time decreased from a median of 120 min (manual review) to 30 min total (15 min automated processing + 15 min clinical review). Across the cohort, the system processed 157 000 pages, screened 23 912 candidate patient-trial pairs, and produced 17 912 oncologist-confirmed matches, with a median time-to-recommendation of <7 days. No demographic subgroup exceeded a 10-percentage-point F1 gap; the largest observed difference was 7 points, between White and Black/African American patients. Ablation experiments showed that both knowledge graph grounding and multi-agent decomposition contributed materially to performance and efficiency. Eligibility confidence scores exhibited reasonable calibration in the clinically relevant operating range.

Conclusions: A neuro-symbolic, multi-agent architecture that couples LLM-based extraction with ontology-grounded, deterministic eligibility reasoning improved the accuracy, throughput, and timeliness of oncology clinical trial matching versus LLM-only baselines, while preserving clinician oversight and maintaining modest subgroup performance gaps. These results support scalable, equity-aware deployment of AI-assisted trial screening in routine oncology practice.
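For readers less familiar with the reported outcomes, the following is a minimal sketch of how the headline metrics (precision, sensitivity, specificity, F1, Cohen's κ, and the largest subgroup F1 gap) are conventionally computed over patient-trial pairs. It is not the authors' evaluation code: the class `Confusion`, the functions `cohens_kappa` and `max_f1_gap`, and all counts in the usage lines are hypothetical and chosen only for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts over patient-trial pairs, scored against gold-standard labels.

    Hypothetical helper for illustration; assumes no denominator is zero.
    """
    tp: int  # system says eligible, gold standard says eligible
    fp: int  # system says eligible, gold standard says ineligible
    fn: int  # system says ineligible, gold standard says eligible
    tn: int  # system says ineligible, gold standard says ineligible

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity (recall).
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)


def cohens_kappa(both_yes: int, both_no: int, only_a: int, only_b: int) -> float:
    """Chance-corrected agreement between two raters on binary labels."""
    n = both_yes + both_no + only_a + only_b
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + only_a) / n  # rater A's rate of "eligible"
    b_yes = (both_yes + only_b) / n  # rater B's rate of "eligible"
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


def max_f1_gap(by_group: dict[str, Confusion]) -> float:
    """Largest difference in F1 between any two subgroups."""
    scores = [c.f1() for c in by_group.values()]
    return max(scores) - min(scores)


if __name__ == "__main__":
    # Made-up counts, NOT data from the study.
    groups = {
        "group_a": Confusion(tp=80, fp=20, fn=20, tn=80),
        "group_b": Confusion(tp=70, fp=30, fn=30, tn=70),
    }
    for name, c in groups.items():
        print(name, round(c.f1(), 3))
    print("max F1 gap:", round(max_f1_gap(groups), 3))
    print("kappa:", round(cohens_kappa(45, 45, 5, 5), 3))
```

A gap computed this way is expressed on the 0-1 scale, so the abstract's "7 points" would correspond to a value of 0.07.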
A. Loaiza-Bonilla
C. Yost
S. Kurnaz
ESMO Real World Data and Digital Oncology
Creighton University
Lynn University
Phoenix (United States)
Loaiza-Bonilla et al. studied this question.
www.synapsesocial.com/papers/69d895206c1944d70ce06220 — DOI: https://doi.org/10.1016/j.esmorw.2026.100706