Abstract
Machine learning (ML) represents a potential paradigm shift in weather forecasting, but critical questions remain about its ability to predict high‐impact weather events. This study evaluates five state‐of‐the‐art ML models (Aurora, GraphCast, PanguWeather, FourCastNetV2, and FourCastNet) in forecasting U.S. West Coast atmospheric rivers (ARs), comparing them with the high‐performing, physics‐based high‐resolution system (HRES) of the European Center for Medium‐Range Weather Forecasts. Analysis of 152 daily forecast cycles (November 2023–March 2024) reveals significant performance differences between the systems. While ML models often achieve lower variable‐specific root mean square error (RMSE), HRES has superior AR detection skill for the first four forecast days. Beyond day four, PanguWeather matches HRES skill, while the other ML models lag slightly. Aurora consistently exhibits the lowest AR detection performance despite strong variable‐specific RMSE, highlighting a disconnect between RMSE and the ability to predict AR events. These findings underscore the need for phenomenon‐specific metrics when assessing ML‐based numerical weather prediction models and considering them for operational implementation.
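The disconnect between RMSE and event detection can be illustrated with a toy example. This is a minimal sketch and not the study's method. It assumes a hypothetical 1-D transect of integrated vapor transport (IVT), a 250 kg m^-1 s^-1 IVT threshold as the AR criterion, and the critical success index (CSI) as the detection score. None of these specifics are taken from the paper. A smoothed forecast that never reaches the threshold can still score a lower RMSE than a sharp forecast that is slightly displaced, because RMSE penalizes the displaced peak twice (once where it is missing and once where it is misplaced).

```python
import math
from typing import Sequence

# Assumed AR criterion: IVT >= 250 kg m^-1 s^-1. This is an illustrative
# choice, not necessarily the detection criterion used in the study.
IVT_THRESHOLD = 250.0


def rmse(forecast: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean square error between two equal-length series."""
    if len(forecast) != len(truth):
        raise ValueError("series must have equal length")
    squared = [(f - t) ** 2 for f, t in zip(forecast, truth)]
    return math.sqrt(sum(squared) / len(squared))


def csi(forecast: Sequence[float], truth: Sequence[float],
        threshold: float = IVT_THRESHOLD) -> float:
    """Critical success index for threshold exceedance:
    hits / (hits + misses + false alarms)."""
    hits = misses = false_alarms = 0
    for f, t in zip(forecast, truth):
        f_event, t_event = f >= threshold, t >= threshold
        if f_event and t_event:
            hits += 1
        elif t_event:
            misses += 1
        elif f_event:
            false_alarms += 1
    denom = hits + misses + false_alarms
    # Undefined when neither series contains an event.
    return hits / denom if denom else float("nan")


# Hypothetical IVT transect (kg m^-1 s^-1) through an AR core.
truth = [100, 150, 300, 400, 300, 150, 100]
# Sharp forecast with the right intensity, displaced by one grid point.
displaced = [100, 100, 150, 300, 400, 300, 150]
# Smoothed forecast that hedges toward the mean and never reaches the threshold.
smooth = [150, 180, 220, 240, 220, 180, 150]

for name, fc in [("displaced", displaced), ("smooth", smooth)]:
    print(f"{name:9s} RMSE={rmse(fc, truth):6.1f}  CSI={csi(fc, truth):.2f}")
```

In this toy case the smoothed forecast has the lower RMSE (about 80 versus 100) yet a CSI of 0, while the displaced forecast detects most of the AR (CSI of 0.5). A variable-specific error metric alone would therefore rank the forecast that misses the event as the better one.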
Isaac Davis
Aneesh C. Subramanian
Timothy B. Higgins
Geophysical Research Letters
University of California, San Diego
University of Colorado Boulder
Scripps Institution of Oceanography
www.synapsesocial.com/papers/6994055d4e9c9e835dfd638b — DOI: https://doi.org/10.1029/2025gl117609