Background: Emergency triage systems using machine learning traditionally rely on structured tabular data (vital signs), creating a “contextual blind spot” that ignores diagnostic information embedded in unstructured clinical narratives. Hybrid AI models that fuse tabular and text data may improve predictive discrimination, but the magnitude and conditions under which fusion adds value remain unclear. Methods: Five databases (PubMed, Scopus, Web of Science, IEEE Xplore, ACM Digital Library) were searched from 1 January 2015 to 15 December 2025. Eligible studies employed Hybrid AI models integrating structured and unstructured emergency department data with quantitative baseline comparisons. Twenty-five studies (N ≈ 4.8 million encounters) met inclusion criteria. We extracted marginal performance gains (ΔAUC), calibration metrics, and demographic reporting. Synthesis followed SWiM principles with subgroup meta-regression testing our novel “Complexity Gradient” hypothesis. Results: Hybrid models demonstrated superior discrimination compared to tabular baselines, with effect magnitude dependent on clinical task complexity. Low-complexity tasks (tachycardia prediction) showed minimal gains (median ΔAUC + 0.036, IQR: 0.02–0.05), while high-complexity tasks (hypoxia, sepsis) demonstrated substantial improvement (median ΔAUC + 0.111, IQR: 0.09–0.13). Meta-regression confirmed complexity significantly moderated effect size (R2 = 0.42, p = 0.003). Only 12% (3/25) of studies reported calibration metrics (Brier scores: 0.089–0.142). Zero studies stratified performance by race/ethnicity; 88% (22/25) failed to report training data demographics. Discussion: The complexity gradient framework explains when multimodal fusion adds predictive value: tasks where diagnostic signal resides in narrative features (temporality, negation) rather than physiological measurements. However, systematic absence of calibration reporting and fairness auditing prevents clinical deployment. Seventy-two percent of studies had high risk of bias in the analysis domain due to retrospective designs without temporal validation. Conclusions: Hybrid triage models show promise for complex diagnostic tasks but require mandatory calibration reporting and demographic performance stratification before clinical implementation. We propose minimum reporting standards including Brier scores, race-stratified metrics, and temporal validation protocols.
Building similarity graph...
Analyzing shared references across papers
Loading...
Junaid Ullah
R. Kanesaraj Ramasamay
Venushini Rajendran
BioMedInformatics
Multimedia University
Building similarity graph...
Analyzing shared references across papers
Loading...
Ullah et al. (Fri,) studied this question.
www.synapsesocial.com/papers/69db38534fe01fead37c6846 — DOI: https://doi.org/10.3390/biomedinformatics6020021