What is the clinical evidence from this study?

Study design: Observational (retrospective). Population: Patients with cancer (n=827; the test set was limited to breast, lung, and pancreatic cancers). Intervention: TRIAGE AI system. Comparator: Expert clinical research coordinator (CRC) adjudication. Primary outcome: Sensitivity, specificity, PPV, NPV, and Cohen's κ for trial-level eligibility using a predefined trial match threshold of 0.40 (Sensitivity 78.3%, Specificity 98.5%, PPV 92.6%, NPV 94.8%, Cohen's κ 0.68).

What does this research mean for the field?

An artificial intelligence-enabled system (TRIAGE) determined clinical trial eligibility for cancer patients from real-world electronic health records with 78.3% sensitivity and 98.5% specificity at the predefined 0.40 match threshold. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

This study aims to validate an AI system for matching cancer patients to clinical trials based on real-world data.

May 29, 2026

Expert validation of an artificial intelligence–enabled trial matching solution using real-world data from patients with cancer.

Key Result

The TRIAGE AI system accurately determined trial-level eligibility from real-world EHR data, achieving 78.3% sensitivity, 98.5% specificity, and 92.6% PPV at the predefined 0.40 match threshold.

Key Points

This study aims to validate an AI system for matching cancer patients to clinical trials based on real-world data.
Retrospective study using data from a large community-academic hybrid cancer center.
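TRIAGE was trained on 629 patients and 4,094 patient-trial pairs using real-world enrollment, then compared against manual CRC adjudication on a stratified random sample of 100 patient-trial pairs from the 198-patient test set.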
Evaluation of trial-level eligibility using sensitivity, specificity, and predictive values with defined match thresholds.
In the training set, TRIAGE showed 72% sensitivity and 90% specificity for trial enrollment.
In the test set at the 0.40 threshold, sensitivity was 78.3% and specificity 98.5%.
Criterion-level agreement was 93.1% across 1,770 adjudications and rose to 94.2% after structured re-adjudication; 10 of 25 (40%) initial CRC discordances were overturned in favor of the AI decision.

Study Design

Type

Observational (n=827)

Multicenter

Structured PICO

Does an AI-enabled trial matching solution (TRIAGE) accurately predict clinical trial eligibility compared to expert clinical research coordinator adjudication in patients with cancer?

Population

827 patients with cancer (629 in the training set; 198 in the test set, with breast, lung, or pancreatic cancer) from a large community-academic hybrid cancer center

Intervention

Artificial intelligence (AI) system - Trial Recommendations using Intelligent Assessment to Guide Eligibility and Enrollment (TRIAGE) - using a large language model (LLM) and machine learning

Comparator

Real-world enrollment using longitudinal electronic health records (EHRs), full versioned trial protocols, and expert clinical research coordinator (CRC) adjudication

Outcome

Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Cohen’s κ for trial-level eligibility using a predefined trial match threshold of 0.40, and accuracy for criterion-level decisions (surrogate)

An AI-enabled trial matching system accurately determined clinical trial eligibility from real-world EHR data, demonstrating high specificity and strong agreement with expert coordinators.

Main Result

Effect estimate: Sensitivity 78.3%, Specificity 98.5%, PPV 92.6%, NPV 94.8%, Cohen's κ 0.68 (0.40 match threshold)
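For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, PPV, NPV, and Cohen's κ follow from a 2x2 table of AI calls against a reference call. It is a minimal illustration, not the study's analysis code: the function name and the counts are hypothetical, and it uses the plain unweighted formulas, which may differ from how the study handled its stratified sample.

```python
def binary_agreement_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy and agreement metrics for a binary AI call vs. a reference call.

    tp: AI eligible,   reference eligible
    fp: AI eligible,   reference ineligible
    fn: AI ineligible, reference eligible
    tn: AI ineligible, reference ineligible
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # raw agreement
    # Agreement expected by chance, from the marginal totals of both raters.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts chosen for easy arithmetic; these are NOT study data.
example = binary_agreement_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because κ corrects raw agreement for chance, it can sit well below the sensitivity and specificity figures when one class dominates.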

Abstract

1501

Background: Under-enrollment in cancer trials remains a major barrier for advancing cancer research in part because few patients are offered clinical trials and pre-screening is manual and laborious. One solution involves artificial intelligence (AI)-enabled centralized screening of patients, but this requires robust validation to facilitate trust and adoption.

Methods: This is a retrospective study from a large community-academic hybrid cancer center comparing the performance of an AI system - Trial Recommendations using Intelligent Assessment to Guide Eligibility and Enrollment (TRIAGE) - to real-world enrollment using longitudinal electronic health records (EHRs), full versioned trial protocols, and expert clinical research coordinator (CRC) adjudication at the trial and criterion levels. The training set comprised 629 patients and 4,094 patient-trial pairs. A large language model (LLM)-only solution was used to predict successful patient enrollment based only on answers to individual eligibility criteria. Next, a machine learning-based approach was used to train the model on top of LLM responses to yield robust, consistent decision rules across patients, with priorities set by the real-world behavior of CRCs. The test set comprised 198 patients with breast, lung, and pancreatic cancers and 793 patient-trial pairs. A stratified random sample of 100 patient-trial pairs (83 patients, 21 trials) underwent manual CRC adjudication. The primary outcomes were sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Cohen’s κ for trial-level eligibility using a predefined trial match threshold of 0.40, and accuracy for criterion-level decisions. Performance evaluations were based on a binary classification: eligible or potentially eligible vs. ineligible.

Results: In the training set, TRIAGE demonstrated 72% sensitivity and 90% specificity for trial enrollment. The table summarizes performance metrics in the test set for trial-level eligibility across 3 trial match thresholds. Criterion-level evaluation across 1,770 adjudications showed 93.1% raw agreement, improving to 94.2% after structured re-adjudication; 10/25 (40%) initial CRC discordances were overturned in favor of the AI decision.

Conclusions: TRIAGE accurately determined trial-level eligibility from real-world EHR data with high performance and strong criterion-level agreement for oncology protocols. The system surfaced potential missed enrollment opportunities and supports adjustable trial-level decision thresholds. Prospective studies of TRIAGE implementation into research workflows are ongoing.

Trial Match Threshold     Sensitivity   Specificity   PPV     NPV     Cohen’s κ
0.13 (max Sensitivity)    98.7%         97.6%         91.2%   99.7%   0.84
0.40 (study threshold)    78.3%         98.5%         92.6%   94.8%   0.68
0.62 (max Specificity)    39.3%         100%          100%    87.0%   0.45
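The threshold trade-off reported in the abstract can be illustrated with a small sketch: each patient-trial pair gets a match score, pairs at or above the threshold are called eligible, and sensitivity and specificity are recomputed at each cut-off. Everything here is an assumption made for illustration. The scores and labels are invented, the helper name `sens_spec` is hypothetical, and the abstract does not say whether TRIAGE treats a score equal to the threshold as a match.

```python
# Hypothetical match scores and reference eligibility labels; NOT study data.
SCORES = [0.05, 0.10, 0.20, 0.35, 0.45, 0.50, 0.55, 0.90]
LABELS = [False, False, True, False, True, True, False, True]


def sens_spec(scores, labels, threshold):
    """Return (sensitivity, specificity) when score >= threshold means 'eligible'.

    The >= comparison is an assumption; the abstract does not specify it.
    """
    calls = [score >= threshold for score in scores]
    tp = sum(call and label for call, label in zip(calls, labels))
    tn = sum(not call and not label for call, label in zip(calls, labels))
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, tn / negatives


# The three cut-offs mirror the thresholds reported in the abstract.
for threshold in (0.13, 0.40, 0.62):
    sensitivity, specificity = sens_spec(SCORES, LABELS, threshold)
    print(f"threshold {threshold:.2f}: "
          f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```

On this toy data, raising the threshold trades sensitivity for specificity. That is the general pattern an adjustable threshold gives a site: a lower cut-off for broad pre-screening, a higher one when coordinator time is scarce.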
