Accurate, real-time estimation of core body temperature (CBT) during physical activity is essential for monitoring heat strain and mitigating the risk of heat-related illness in hot environments. Although numerous data-driven algorithms using wearable sensors have been proposed, their practical reliability remains unclear because of substantial methodological heterogeneity and the absence of standardized evaluation. This study combined a systematic review with a standardized quantitative benchmark. The review identified 38 studies that estimated CBT from non-invasive inputs. Of these, 14 eligible models, including Kalman filter–based methods, statistical models, and machine-learning approaches, were re-implemented and evaluated with identical preprocessing and evaluation settings on two independent datasets: Dataset 1 (treadmill walking, ) and Dataset 2 (cycling, ). The benchmark revealed notable differences between originally reported performance and performance reproduced under standardized conditions. For the widely used heart-rate–based extended Kalman filter, the root mean square error (RMSE) increased from typically reported values of 0.21–0.41 °C to 0.41 °C on Dataset 1 and 0.66 °C on Dataset 2. Incorporating skin temperature improved tracking accuracy in some configurations, but the gains depended strongly on measurement site and dataset. Sensitivity for detecting elevated CBT (threshold: 38.0 °C) varied markedly across methods, particularly for the cycling protocol. In conclusion, no single CBT estimation approach consistently outperformed the others across all settings. Heart-rate–only models provided a stable baseline under limited sensing conditions, whereas multimodal approaches offered conditional benefits in more controlled scenarios. This work establishes a standardized benchmark framework to support fair comparison, method selection, and future development of wearable CBT estimation technologies.
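The heart-rate–based extended Kalman filter mentioned above treats CBT as a hidden state that drifts slowly over time and is observed indirectly through heart rate. The sketch below shows that single-state structure under stated assumptions: a random-walk time update and a quadratic observation model mapping CBT to heart rate, linearized at each step. The class `HREKFParams` and function `estimate_cbt_from_hr` are hypothetical names, and the default coefficients are illustrative values in the form commonly reported for this model class. They are not taken from this study and should be verified against the original model before any real use.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class HREKFParams:
    """Hypothetical parameters for a single-state HR-driven EKF (illustrative defaults)."""
    a: float = 1.0              # state transition: CBT modeled as a random walk
    gamma: float = 0.022 ** 2   # process noise variance (degC^2 per step)
    b0: float = -7887.1         # observation model: HR = b2*CT^2 + b1*CT + b0
    b1: float = 384.4286
    b2: float = -4.5714
    sigma: float = 18.88 ** 2   # observation noise variance (bpm^2)
    ct0: float = 37.1           # initial CBT estimate (degC), an assumption
    v0: float = 0.0             # initial estimate variance


def estimate_cbt_from_hr(hr: Sequence[float],
                         p: Optional[HREKFParams] = None) -> List[float]:
    """Return one CBT estimate (degC) per heart-rate sample (bpm)."""
    p = p or HREKFParams()
    ct, v = p.ct0, p.v0
    estimates: List[float] = []
    for hr_obs in hr:
        # Time update: propagate the state and inflate its variance.
        ct_pred = p.a * ct
        v_pred = p.a ** 2 * v + p.gamma
        # Predicted HR and the Jacobian dHR/dCT of the quadratic model.
        hr_pred = p.b2 * ct_pred ** 2 + p.b1 * ct_pred + p.b0
        c = 2 * p.b2 * ct_pred + p.b1
        # Kalman gain, then correct the prediction with the HR innovation.
        k = v_pred * c / (c ** 2 * v_pred + p.sigma)
        ct = ct_pred + k * (hr_obs - hr_pred)
        v = (1 - k * c) * v_pred
        estimates.append(ct)
    return estimates
```

For example, `estimate_cbt_from_hr([150.0] * 60)` yields a series that rises steadily from the assumed 37.1 °C starting point, because every heart-rate sample exceeds the model's predicted heart rate. The small process noise relative to the observation noise is what keeps the estimate smooth despite noisy heart-rate input.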
• Systematic review of core temperature estimation during physical activity in hot environments.
• Standardized benchmark across two controlled heat-exposure datasets reveals strong dataset dependence.
• Results inform deployment of wearable heat-strain monitoring in occupational and built-environment settings.
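The benchmark scores each model with RMSE against measured CBT and with sensitivity for detecting elevated CBT at the 38.0 °C threshold. A minimal sketch of both metrics follows. It assumes "elevated" means at or above the threshold and that sensitivity is computed sample by sample; the study may instead use an event-based definition. The function names `rmse` and `sensitivity` are hypothetical.

```python
import math
from typing import Sequence


def rmse(estimated: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean square error between estimated and measured CBT (degC)."""
    if len(estimated) != len(measured) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = sum((e - m) ** 2 for e, m in zip(estimated, measured))
    return math.sqrt(sq / len(estimated))


def sensitivity(estimated: Sequence[float], measured: Sequence[float],
                threshold: float = 38.0) -> float:
    """Share of samples with measured CBT >= threshold that the estimate also flags.

    Assumes sample-wise scoring and an inclusive threshold; returns NaN when
    the measured series never reaches the threshold (no positives to detect).
    """
    if len(estimated) != len(measured):
        raise ValueError("inputs must be of equal length")
    positives = [(e, m) for e, m in zip(estimated, measured) if m >= threshold]
    if not positives:
        return math.nan
    hits = sum(1 for e, _ in positives if e >= threshold)
    return hits / len(positives)
```

Returning NaN rather than 0 when there are no positive samples keeps short or low-intensity trials, where CBT never reaches 38.0 °C, from being scored as total detection failures.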
Yuanzhe Zhao
Weihao Li
Jeroen HM Bergmann
Building and Environment
University of Oxford
University of Southern Denmark
Beihang University
Zhao et al. studied this question.
www.synapsesocial.com/papers/69d892886c1944d70ce03f6b — DOI: https://doi.org/10.1016/j.buildenv.2026.114591