Undertriage, the misclassification of acutely ill patients into low-acuity triage categories, is a persistent patient safety concern, and prior machine learning approaches restricted to vital signs have yielded modest predictive performance. We hypothesized that this ceiling reflects feature restriction rather than an inherent predictive barrier. In this retrospective cohort study of 10,792 adult patients (age ≥ 18) initially triaged as Korean Triage and Acuity Scale (KTAS) level 4 or 5 across two tertiary academic centers during 2025, the primary outcome was triage reclassification, defined as a change from initial KTAS 4/5 to final KTAS 1–3 (n = 941; 8.7%). Five nested feature sets of increasing breadth were compared using logistic regression (LR) and gradient-boosting classifiers (GBC). Discrimination (AUROC, AUPRC); calibration (slope, intercept, Brier score); sensitivity, specificity, and positive and negative predictive values at operating thresholds of 3%, 5%, and 10%; and decision-curve net benefit were evaluated on a held-out test partition. The National Early Warning Score (NEWS) alone yielded an AUROC of 0.58, whereas the full triage-time panel (Set E; 43 features) achieved a GBC AUROC of 0.72 (95% CI 0.68–0.76; 5-fold CV 0.73 ± 0.02) and an AUPRC of 0.23, approximately double the NEWS baseline (0.12). The model was well calibrated, with a Brier score of 0.075, a calibration slope of 0.85 (95% CI 0.70–1.01), and an intercept of −0.30 (95% CI −0.65 to 0.07); both intervals included the ideal values of 1 and 0, respectively, indicating that predicted probabilities can be interpreted as approximate absolute event likelihoods. At a 5% operating threshold, the model captured 79% of reclassifications (sensitivity 0.79) while flagging 53% of the cohort. Decision curve analysis demonstrated positive net clinical benefit across thresholds of 3–20%, exceeding both a vital-signs-only model and the treat-all/treat-none baselines.
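For readers unfamiliar with decision-curve analysis, the following is a minimal sketch of the standard net-benefit formula, NB = TP/N − (FP/N) × pt/(1 − pt), where pt is the operating threshold. It is not the study's code; all function names are hypothetical. The worked figures at the end are a back-of-envelope check that assumes the test partition has the same event prevalence as the full cohort (941/10,792), which the abstract does not state.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging every patient whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # A false positive is weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_from_rates(sensitivity: float, flagged_fraction: float,
                           prevalence: float, threshold: float) -> float:
    """Same quantity, computed from summary rates instead of patient-level data."""
    tp_rate = sensitivity * prevalence      # true positives per patient
    fp_rate = flagged_fraction - tp_rate    # remaining flags are false positives
    return tp_rate - fp_rate * threshold / (1.0 - threshold)


def net_benefit_treat_all(prevalence: float, threshold: float) -> float:
    """Net benefit of the 'flag everyone' reference strategy."""
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Back-of-envelope check with the reported summary figures at the 5% threshold.
# ASSUMPTION: test-partition prevalence equals the cohort prevalence (941 / 10,792).
prevalence = 941 / 10792
model_nb = net_benefit_from_rates(sensitivity=0.79, flagged_fraction=0.53,
                                  prevalence=prevalence, threshold=0.05)
treat_all_nb = net_benefit_treat_all(prevalence, threshold=0.05)
```

Under that assumption, `model_nb` comes to roughly 0.045 and `treat_all_nb` to roughly 0.039, which agrees in direction with the reported finding that the model exceeds the treat-all baseline at this threshold. The treat-none strategy has a net benefit of 0 by definition.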
Feature importance analysis identified pain score, onset-to-arrival time, heart rate, systolic blood pressure, and age as the dominant predictors. Contextual variables routinely documented at triage—particularly pain score and onset-to-arrival time—together with heart rate and systolic blood pressure form a discriminative composite that exceeds the performance of vital-signs-only models in the KTAS 4/5 subpopulation. The resulting model is well calibrated and provides positive net clinical benefit across the 3–20% threshold range, supporting its potential role as a secondary screening flag for low-acuity patients warranting clinician re-review. External validation in independent cohorts is needed before clinical deployment.
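The calibration metrics named above can be illustrated with a short sketch. It computes the Brier score and fits a logistic recalibration model, outcome ~ a + b × logit(predicted probability), by Newton-Raphson, where b is the calibration slope and a the intercept. This is an assumption-laden illustration, not the authors' implementation: the function names are hypothetical, and the intercept here is estimated jointly with the slope, whereas some conventions estimate it with the slope fixed at 1. The abstract does not say which convention was used.

```python
import math
from typing import Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def _logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid infinite logits
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(y_true: Sequence[int], y_prob: Sequence[float],
                                max_iter: int = 50) -> Tuple[float, float]:
    """Fit y ~ a + b * logit(p) by Newton-Raphson; returns (a, b)."""
    x = [_logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * xi     # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break  # degenerate data (e.g. a single distinct prediction)
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


# Toy data: predictions of 0.1 and 0.9 with observed event rates of 0.2 and 0.8,
# i.e. overconfident predictions, which should give a slope below 1.
y = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
p = [0.1] * 10 + [0.9] * 10
intercept, slope = calibration_intercept_slope(y, p)
```

On this toy data the fitted slope is below 1, the usual signature of predictions that are too extreme. A slope of 1 with an intercept of 0 corresponds to the ideal values referred to in the text.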
Cha et al. (Mon,) studied this question.