Accurate ultra-short-term solar radiation forecasting is critical for integrating renewable energy and maintaining power grid stability, yet operational forecasting systems exhibit systematic biases and errors that accumulate during transitional weather conditions. This research develops an integrated error correction framework that combines multimodal vision transformers with physics-constrained neural networks to refine satellite-based solar forecasts. The proposed architecture extracts complementary spatiotemporal features from visible, infrared, and water vapor satellite channels through hierarchical cross-modal attention, and it enforces fundamental physical principles, including energy conservation and radiative transfer constraints, as soft regularization terms during training. Experimental validation on four years of Himawari-8 geostationary satellite observations and ground measurements from 47 meteorological stations shows that the framework reduces RMSE by 18.7% relative to the best-performing baseline and reduces systematic bias from 12.7 W/m² to 1.2 W/m². Ablation studies reveal synergistic interactions between multimodal fusion and physics-aware learning: the combined approach delivers larger gains than either component alone. The model is efficient enough for operational deployment, processing 21 forecasts per second on consumer-grade hardware, and all of its predictions remain consistent with radiative energy constraints.
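To illustrate what "physical principles as soft regularization terms" can mean in practice, the sketch below adds penalties to a mean-squared-error loss whenever predicted surface irradiance is negative or exceeds the extraterrestrial irradiance on a horizontal plane, a simple energy-conservation bound. This is a hypothetical minimal form, not the paper's actual loss: the function names, the penalty weight `lam`, and the approximate solar constant of 1361 W/m² are all assumptions made for illustration, and the radiative transfer constraints mentioned above are not modeled.

```python
import numpy as np

# Assumed approximate total solar irradiance at the top of the atmosphere (W/m^2).
SOLAR_CONSTANT = 1361.0


def toa_horizontal_irradiance(cos_zenith):
    """Hypothetical helper: extraterrestrial irradiance on a horizontal plane.

    Returns zero when the sun is below the horizon (cos_zenith <= 0).
    """
    return SOLAR_CONSTANT * np.clip(cos_zenith, 0.0, None)


def physics_regularized_loss(pred, target, cos_zenith, lam=0.1):
    """Hypothetical loss: MSE plus soft penalties for physically impossible irradiance.

    The penalties discourage, but do not forbid, predictions outside the range
    0 <= irradiance <= top-of-atmosphere horizontal irradiance.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    upper = toa_horizontal_irradiance(np.asarray(cos_zenith, dtype=float))

    # Data-fit term.
    mse = np.mean((pred - target) ** 2)

    # Squared violation of the lower bound (negative irradiance).
    below = np.maximum(-pred, 0.0)
    # Squared violation of the upper bound (more energy than arrives at the top of the atmosphere).
    above = np.maximum(pred - upper, 0.0)
    penalty = np.mean(below ** 2 + above ** 2)

    return mse + lam * penalty


# Usage: a prediction of -10 W/m^2 against a target of 0 incurs both the MSE and the penalty.
loss = physics_regularized_loss([-10.0], [0.0], [0.5])
```

Because the constraint enters as a weighted penalty rather than a hard clip, the network can still learn from gradients near the physical bounds; the trade-off is that `lam` must be tuned, and small violations remain possible unless outputs are also clipped at inference time.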
Jiang et al. (Sun,) studied this question.