Recent studies have made notable progress in remote sensing image understanding by fusing spatial- and frequency-domain features to exploit their complementary strengths. Two key limitations remain: frequency modeling is rigid because it relies on static constraints, which limits adaptability, and spatial–frequency fusion often generalizes poorly and is unstable across tasks and network depths. Our experiments show that the relative importance of low- and high-frequency components varies across feature hierarchies and training stages, indicating that the useful frequency information is task-dependent and stage-dependent. Motivated by these observations, we propose the Frequency–Spatial Self-Calibrated Network (FSSC-Net), a task-driven framework for adaptive frequency modeling and collaborative spatial–frequency fusion. FSSC-Net incorporates a lightweight, plug-and-play self-calibrated frequency modeling mechanism comprising a Dynamic Frequency Selection Module and a Task-Guided Calibration Fusion Module. This mechanism modulates frequency responses adaptively via soft masks, enabling dynamic extraction of task-relevant low- and high-frequency components and alignment between spatial- and frequency-domain features. We also present a systematic analysis of frequency importance across tasks and training stages, providing quantitative evidence that task-calibrated frequency modeling is necessary. Extensive experiments on multiple benchmarks show that FSSC-Net consistently outperforms state-of-the-art methods, with strong task adaptability and robust cross-task generalization.
Yuan et al. studied this question.
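The idea of modulating frequency responses with soft masks can be illustrated with a minimal NumPy sketch. It is not the FSSC-Net implementation: the function names, the sigmoid-shaped radial mask, and the `cutoff`, `sharpness`, and `gate` parameters are all assumptions made for illustration. In the abstract's description the masks are adaptive and task-driven, whereas here they are fixed by hand.

```python
import numpy as np


def radial_frequency_grid(height, width):
    """Distance of each FFT bin from the zero frequency, in cycles per pixel."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)


def soft_low_pass_mask(radius, cutoff, sharpness):
    """Sigmoid mask: close to 1 below the cutoff, close to 0 above it.

    A hard mask would be a step at `cutoff`; the soft version changes
    smoothly, which is what would make it trainable in a real network.
    """
    return 1.0 / (1.0 + np.exp(sharpness * (radius - cutoff)))


def split_frequencies(feature, cutoff=0.15, sharpness=40.0):
    """Split a 2-D feature map into soft low- and high-frequency parts.

    `cutoff` and `sharpness` are hypothetical fixed values standing in for
    quantities that an adaptive model would predict.
    """
    radius = radial_frequency_grid(*feature.shape[-2:])
    low_mask = soft_low_pass_mask(radius, cutoff, sharpness)
    spectrum = np.fft.fft2(feature)
    # The two masks sum to 1, so the two parts sum back to the input.
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * (1.0 - low_mask)).real
    return low, high


def fuse(spatial, low, high, gate):
    """Residual fusion: `gate` in [0, 1] weights low against high frequencies."""
    return spatial + gate * low + (1.0 - gate) * high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.standard_normal((32, 32))
    low, high = split_frequencies(feature)
    fused = fuse(feature, low, high, gate=0.7)
    print(fused.shape)
```

Because the low- and high-frequency masks are complementary, the two components reconstruct the input exactly, and the `gate` value only changes how they are reweighted before being added back to the spatial branch. A task-calibrated variant would replace the fixed `cutoff`, `sharpness`, and `gate` with values predicted from the features.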