What question did this study set out to answer?

This study asks whether a model using comprehensive triage-time data, rather than vital signs alone, can better detect undertriage among emergency department patients initially triaged as low acuity.

June 3, 2026 · Open Access

Beyond Vital Signs: A Machine Learning Model Using Comprehensive Triage-Time Data to Detect Undertriage in Emergency Department Patients

Key Points

Retrospective cohort study of 10,792 adult patients triaged as Korean Triage and Acuity Scale (KTAS) level 4 or 5.
Comparison of five nested feature sets using logistic regression (LR) and gradient-boosting classifiers (GBC).
Evaluation of model performance using AUROC, sensitivity, specificity, and decision curve analysis.
The full triage-time model achieved a GBC AUROC of 0.72 (95% CI 0.68–0.76).
At a 5% operating threshold, sensitivity was 0.79 (79% of reclassifications captured) while 53% of patients were flagged.
Decision curve analysis indicated a positive net clinical benefit across thresholds of 3–20%, outperforming vital-signs-only models.
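
The threshold figures in the last two points can be turned into rough per-patient rates. The sketch below is a back-of-envelope calculation, not an analysis from the paper. It assumes the overall cohort prevalence of 8.7% also holds in the test partition, treats the reported sensitivity (0.79) and flag rate (53%) as exact, and uses the standard decision-curve definition of net benefit, TP/N - FP/N × p_t/(1 - p_t). The function and variable names are illustrative.

```python
def net_benefit(tp_rate, fp_rate, threshold):
    """Standard decision-curve net benefit, per patient.

    tp_rate and fp_rate are true/false positives divided by the total
    number of patients; threshold is the risk threshold p_t.
    """
    return tp_rate - fp_rate * threshold / (1.0 - threshold)


# Figures quoted in the summary; applying the cohort prevalence to the
# test partition is an assumption made for this illustration.
prevalence = 0.087    # reclassified patients (941 of 10,792)
sensitivity = 0.79    # at the 5% operating threshold
flag_rate = 0.53      # share of patients flagged at that threshold
threshold = 0.05

tp_rate = sensitivity * prevalence      # flagged and later reclassified
fp_rate = flag_rate - tp_rate           # flagged but not reclassified
ppv = tp_rate / flag_rate
specificity = 1.0 - fp_rate / (1.0 - prevalence)

nb_model = net_benefit(tp_rate, fp_rate, threshold)
# "Treat all" flags everyone: every case is a TP, every non-case an FP.
nb_treat_all = net_benefit(prevalence, 1.0 - prevalence, threshold)

print(f"implied PPV          {ppv:.3f}")
print(f"implied specificity  {specificity:.3f}")
print(f"net benefit, model   {nb_model:.4f}")
print(f"net benefit, all     {nb_treat_all:.4f}")
```

Under these assumptions, roughly 13 of every 100 flagged patients would be reclassified, and the model's net benefit at the 5% threshold (about 0.044) sits above the treat-all strategy (about 0.039). That agrees in direction with the reported decision curve result, though the exact values depend on the test-partition prevalence, which the summary does not give.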

Abstract

Undertriage, the misclassification of acutely ill patients into low-acuity triage categories, is a persistent patient safety concern, and prior machine learning approaches restricted to vital signs have yielded modest predictive performance. We hypothesized that this ceiling reflects feature restriction rather than an inherent predictive barrier. In this retrospective cohort study of 10,792 adult patients (age ≥ 18) initially triaged as Korean Triage and Acuity Scale (KTAS) level 4 or 5 across two tertiary academic centers during 2025, the primary outcome was triage reclassification, defined as a change from initial KTAS 4/5 to final KTAS 1–3 (n = 941; 8.7%). Five nested feature sets of increasing breadth were compared using logistic regression (LR) and gradient-boosting classifiers (GBC). Calibration (slope, intercept, Brier score); sensitivity, specificity, and positive and negative predictive values at operating thresholds of 3%, 5%, and 10%; and decision-curve net benefit were evaluated on a held-out test partition. The National Early Warning Score (NEWS) alone yielded an AUROC of 0.58, whereas the full triage-time panel (Set E; 43 features) achieved a GBC AUROC of 0.72 (95% CI 0.68–0.76; 5-fold CV 0.73 ± 0.02) and an AUPRC of 0.23, approximately doubling the NEWS baseline (0.12). The model was well calibrated, with a Brier score of 0.075, a calibration slope of 0.85 (95% CI 0.70–1.01), and an intercept of −0.30 (95% CI −0.65 to 0.07); both intervals included the ideal values of 1 and 0, indicating that predicted probabilities can be interpreted as approximate absolute event likelihoods. At a 5% operating threshold, sensitivity was 0.79, capturing 79% of reclassifications while flagging 53% of the cohort. Decision curve analysis demonstrated positive net clinical benefit across thresholds of 3–20%, exceeding both a vital-signs-only model and the treat-all/treat-none baselines.
Feature importance analysis identified pain score, onset-to-arrival time, heart rate, systolic blood pressure, and age as the dominant predictors. Contextual variables routinely documented at triage, particularly pain score and onset-to-arrival time, together with heart rate and systolic blood pressure, form a discriminative composite that exceeds the performance of vital-signs-only models in the KTAS 4/5 subpopulation. The resulting model is well calibrated and provides positive net clinical benefit across the 3–20% threshold range, supporting its potential role as a secondary screening flag for low-acuity patients warranting clinician re-review. External validation in independent cohorts is needed before clinical deployment.
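
Calibration slope and intercept are commonly obtained by regressing the observed outcome on the logit of the predicted probability. The abstract does not say how the authors computed them, so the following is a generic sketch rather than their method. It fits the two-parameter logistic recalibration model by Newton-Raphson in plain Python and also computes a Brier score. The helper names (`brier_score`, `calibration_fit`) and the toy inputs are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def brier_score(y, p):
    """Mean squared difference between outcomes and predicted probabilities."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def calibration_fit(y, p, iters=50, tol=1e-10):
    """Fit y ~ sigmoid(a + b * logit(p)) by Newton-Raphson.

    Returns (intercept a, slope b). Ideal calibration gives a = 0, b = 1.
    """
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(iters):
        # Score vector (g) and information matrix (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system h * delta = g.
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Toy check: predictions on a grid, with "observed" rates generated from a
# known miscalibration (intercept -0.30, slope 0.85, echoing the reported
# point estimates). Fractional outcomes stand in for expected event rates
# so the example is deterministic; real data would use 0/1 outcomes.
preds = [i / 100 for i in range(2, 60)]
observed = [sigmoid(-0.30 + 0.85 * logit(p)) for p in preds]
intercept, slope = calibration_fit(observed, preds)
print(f"intercept {intercept:.2f}, slope {slope:.2f}")

# Reference point: a constant prediction equal to an assumed 8.7% prevalence.
outcomes = [1] * 87 + [0] * 913
constant = brier_score(outcomes, [0.087] * len(outcomes))
print(f"Brier score of constant 8.7% prediction: {constant:.4f}")
```

The fit recovers the intercept and slope used to generate the toy data. Some authors instead estimate the intercept with the slope fixed at 1 (calibration-in-the-large), which can give a different value. The constant-prediction Brier score (about 0.079 at 8.7% prevalence) is a reference point for reading the reported 0.075, since at low event rates even a no-skill prediction scores well.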
