What question did this study set out to answer?

Evaluate the effectiveness and limitations of the HERMES-24 score for predicting outcomes in stroke patients treated in a late time window.

April 21, 2026. Open Access

Reader Response: Validation of the HERMES-24 Score for Outcome Prediction Post Large Vessel Occlusion Treatment in Later Time Window

Key Points

Reviewed methodological aspects of HERMES-24 score validation.
Discussed the importance of recalibrating the model for new populations.
Analyzed subgroup performance and validity of C statistics.
Identified limitations in calibration methods used for the score.
Stressed the need for recalibration due to worse outcomes in late-treated patients.
Pointed out potential issues of subgroup analyses leading to uncertainty in model performance.

Abstract

We read with great interest the article by Tanaka et al. [1]. This is a valuable contribution, as few prediction models in stroke have undergone external validation. The model’s simplicity and its discriminative performance in late-window patients are notable strengths. However, several methodological aspects warrant attention.

First, the authors assessed calibration by grouping scores and comparing them with observed outcome frequencies in a bar chart. This limits interpretation across the full risk spectrum. Converting scores into predicted probabilities and plotting a smooth calibration curve would better reveal miscalibration, thereby informing model updating [2].

Second, recalibration was not examined. Systematic over- or underestimation is common when applying models to new populations. Given that late-treated patients have worse outcomes than early-treated ones (e.g., mRS 0-2: 32.2% in AURORA and 46.6% in HERMES), recalibration is essential to avoid biased risk estimates in future patients [3].

Finally, subgroup analyses of discriminative performance are difficult to interpret. C statistics reflect both model validity and the heterogeneity of the validation cohort [4], making it unclear whether observed changes indicate true performance differences or heterogeneity in subgroups. Wide confidence intervals further suggest underpowered and uncertain results.

This study is commendable, and we hope these comments support further refinement of this promising model.
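The recalibration point can be illustrated with a minimal Python sketch of the simplest form of model updating, an intercept-only correction (often called calibration-in-the-large). This is not the method of the letter or of the original study, neither of which specifies one. The function names `recalibrate_intercept` and `apply_update` are hypothetical, and the sketch assumes the score has already been converted into predicted probabilities. The toy data are synthetic and only reuse the two marginal mRS 0-2 rates quoted above.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_intercept(pred_probs, outcomes, max_iter=50, tol=1e-10):
    """Estimate an intercept correction `a` on the logit scale.

    Fits the logistic model  logit(P(y=1)) = a + logit(p_original)
    by Newton-Raphson, keeping the original linear predictor as a
    fixed offset. Assumes outcomes contain both 0s and 1s.
    """
    offsets = [logit(p) for p in pred_probs]
    a = 0.0
    for _ in range(max_iter):
        mu = [expit(a + x) for x in offsets]
        gradient = sum(y - m for y, m in zip(outcomes, mu))
        information = sum(m * (1.0 - m) for m in mu)
        step = gradient / information
        a += step
        if abs(step) < tol:
            break
    return a


def apply_update(pred_probs, a):
    """Shift every original prediction by `a` on the logit scale."""
    return [expit(a + logit(p)) for p in pred_probs]


if __name__ == "__main__":
    # Synthetic illustration (an assumption, not study data): a model that
    # predicts the early-window rate (46.6%) for everyone, applied to a
    # cohort with the late-window observed rate (32.2%).
    preds = [0.466] * 1000
    observed = [1] * 322 + [0] * 678
    a = recalibrate_intercept(preds, observed)
    updated = apply_update(preds, a)
    print(f"intercept correction: {a:.3f}")
    print(f"mean updated prediction: {sum(updated) / len(updated):.3f}")
```

A negative correction means the original model overestimates the probability of a good outcome in the new cohort. An intercept shift leaves the ranking of patients unchanged, so it affects calibration but not the C statistic. A fuller update would also re-estimate the calibration slope.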

Authors

Xi Li (ORCID: 0009-0006-7899-8239)

Bob Roozenbeek

Hester Lingsma

Institutions

Neurology, Inc

Faculty of Public Health
