What question did this study set out to answer?

The study set out to improve the accuracy of ultra-short-term solar radiation forecasting by correcting the systematic biases and accumulating errors that operational systems exhibit during transitional weather conditions.

April 8, 2026
Open Access

Physics-constrained multimodal vision transformer for ultra-short-term solar radiation forecasting error correction

Key Points

Developed an integrated error correction framework
Combined multimodal vision transformers with physics-constrained neural networks
Extracted features from visible, infrared, and water vapor satellite channels
Validated using four years of Himawari-8 satellite observations and ground measurements from 47 meteorological stations
Achieved an 18.7% reduction in RMSE compared to the best-performing baselines
Reduced systematic bias from 12.7 W/m² to 1.2 W/m²
Processed 21 forecasts per second on consumer-grade hardware
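
The two headline metrics above, RMSE and systematic bias, can be illustrated with a minimal sketch. It assumes the standard textbook definitions (root-mean-square error and mean bias error); the summary does not state the exact formulas used in the study. The sample values and the simple bias subtraction are illustrative only and are not the study's data or method.

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between forecasts and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def mean_bias(pred, obs):
    """Mean bias error: positive means the forecast overestimates on average."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

def relative_reduction(before, after):
    """Fractional reduction of an error metric, e.g. 0.187 for 18.7%."""
    return (before - after) / before

# Illustrative values only (W/m^2), not data from the study.
observed = [500.0, 620.0, 310.0, 450.0]
raw_forecast = [520.0, 640.0, 330.0, 470.0]  # every value is 20 W/m^2 too high

# Naive correction (an assumption for illustration): subtract the mean bias.
offset = mean_bias(raw_forecast, observed)
corrected = [p - offset for p in raw_forecast]

# Relative reduction implied by the reported bias figures.
bias_drop = relative_reduction(12.7, 1.2)
```

Applied to the reported figures, the drop from 12.7 W/m² to 1.2 W/m² corresponds to roughly a 90.6% reduction in systematic bias.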

Abstract

Accurate ultra-short-term solar radiation forecasting is critical for renewable energy integration and power grid stability, yet operational systems exhibit systematic biases and accumulating errors during transitional weather conditions. This research develops an integrated error correction framework that combines multimodal vision transformers with physics-constrained neural networks to refine satellite-based solar forecasts. The proposed architecture extracts complementary spatiotemporal features from visible, infrared, and water vapor satellite channels through hierarchical cross-modal attention mechanisms, while enforcing fundamental physical principles, including energy conservation and radiative transfer constraints, as soft regularization terms during training. Experimental validation using four years of Himawari-8 geostationary satellite observations and ground measurements from 47 meteorological stations demonstrates that the framework achieves an 18.7% reduction in RMSE compared to the best-performing baselines, with systematic bias reduced from 12.7 W/m² to 1.2 W/m². Ablation studies reveal synergistic interactions between multimodal fusion and physics-aware learning, with the combined approach delivering gains that exceed those of the individual components. The model maintains computational efficiency suitable for operational deployment, processing 21 forecasts per second on consumer-grade hardware while respecting radiative energy consistency across all predictions.
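
The idea of enforcing physical principles as soft regularization terms can be sketched as a data-fit loss plus a weighted penalty for physically implausible predictions. The abstract does not give the form of the study's constraints, so everything here is an assumption: the penalty is a hypothetical stand-in that only checks that predicted irradiance stays between zero and a per-sample upper bound, and the names `physics_penalty`, `total_loss`, `upper_bound`, and `weight` are invented for illustration.

```python
def mse(pred, obs):
    """Mean squared error: the data-fit term of the loss."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

def physics_penalty(pred, upper_bound):
    """Mean squared violation of 0 <= pred <= upper_bound.

    Hypothetical stand-in for a physical constraint: irradiance cannot be
    negative and is assumed not to exceed a given per-sample upper bound.
    """
    total = 0.0
    for p, ub in zip(pred, upper_bound):
        below = max(0.0, -p)      # amount by which the prediction is negative
        above = max(0.0, p - ub)  # amount by which it exceeds the bound
        total += below ** 2 + above ** 2
    return total / len(pred)

def total_loss(pred, obs, upper_bound, weight=0.1):
    """Soft-constrained loss: violations are penalized, not forbidden."""
    return mse(pred, obs) + weight * physics_penalty(pred, upper_bound)

# Illustrative values only (W/m^2), not data from the study.
pred = [-10.0, 400.0, 900.0]
obs = [0.0, 380.0, 800.0]
upper = [50.0, 700.0, 850.0]
loss = total_loss(pred, obs, upper)
```

Because the constraint enters as a penalty rather than a hard limit, the optimizer can trade a small violation against a better data fit; the `weight` parameter controls that trade-off.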
