Accurate semantic segmentation of high-resolution remote sensing imagery is crucial for applications such as land cover mapping, urban development monitoring, and disaster response. However, such imagery poses inherent challenges, including complex spatial structures, significant intra-class variability, and diverse object scales, which demand models capable of capturing rich contextual information at both local and global scales. To address these issues, we propose ArgusNet, a novel segmentation framework that enhances multi-scale representations through a series of dedicated fusion mechanisms. At the core of ArgusNet lies the integration of Adaptive Windowed Additive Attention (AWAA) and 2D Selective Scan (SS2D). Specifically, AWAA extends additive attention into a window-based structure with a dynamic routing mechanism, enabling multi-perspective local feature interaction via multiple global query vectors. Furthermore, we introduce a decoder optimization strategy that incorporates three-stage feature fusion and a Macro Guidance Module (MGM) to improve spatial detail preservation and semantic consistency. Experiments on benchmark remote sensing datasets demonstrate that ArgusNet achieves segmentation performance competitive with or better than state-of-the-art methods, particularly in scenarios requiring fine-grained object delineation and robust multi-scale contextual understanding.
Ren et al. (Thu,) studied this question.
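
To make the idea of window-based additive attention with multiple global query vectors more concrete, the following is a minimal NumPy sketch. It is an illustration under stated assumptions, not the ArgusNet implementation: the abstract does not specify the AWAA equations, so the scoring form follows generic additive attention, the routing step is modeled as a simple per-token softmax over the global queries, and all names (`Wq`, `Wk`, `w_a`, `W_route`, the number of query vectors `M`) are hypothetical. SS2D and the decoder modules are not covered.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(x, win):
    # (H, W, C) -> (num_windows, win*win, C); H and W must be divisible by win.
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def windowed_additive_attention(x, Wq, Wk, w_a, W_route, win):
    """Hypothetical sketch of additive attention computed inside each window.

    x:       (H, W, C) feature map
    Wq, Wk:  (C, C) query/key projections (assumed)
    w_a:     (M, C) M scoring vectors, one per global query (assumed)
    W_route: (C, M) routing projection (assumed form of "dynamic routing")
    """
    C = x.shape[-1]
    tokens = window_partition(x, win)            # (N, T, C)
    Q = tokens @ Wq                              # (N, T, C)
    K = tokens @ Wk                              # (N, T, C)

    # Score every token against each of the M scoring vectors,
    # then normalize over the tokens of each window.
    alpha = softmax(Q @ w_a.T / np.sqrt(C), axis=1)      # (N, T, M)

    # One global query vector per window and per scoring vector.
    g = np.einsum("ntm,ntc->nmc", alpha, Q)              # (N, M, C)

    # Each token mixes the M global queries with its own routing weights.
    route = softmax(tokens @ W_route, axis=-1)           # (N, T, M)
    q_mix = np.einsum("ntm,nmc->ntc", route, g)          # (N, T, C)

    # Element-wise interaction with the keys plus a residual query term.
    return Q + K * q_mix

rng = np.random.default_rng(0)
C, M, win = 16, 3, 4
x = rng.standard_normal((8, 8, C))
out = windowed_additive_attention(
    x,
    Wq=rng.standard_normal((C, C)) * 0.1,
    Wk=rng.standard_normal((C, C)) * 0.1,
    w_a=rng.standard_normal((M, C)),
    W_route=rng.standard_normal((C, M)),
    win=win,
)
print(out.shape)  # one row of tokens per window
```

In this sketch the cost per window is linear in the number of tokens, since no token-to-token attention matrix is formed; whether the actual AWAA shares this property depends on details not given in the abstract.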