Remote sensing object detection is fundamental to Earth observation, yet remains challenging when relying on a single sensing modality. While optical imagery provides rich spatial and textural details, it is highly sensitive to illumination and adverse weather; conversely, Synthetic Aperture Radar (SAR) offers robust all-weather acquisition but suffers from speckle noise and limited semantic interpretability. To address these limitations, we leverage the potential of foundation models for optical–SAR object detection via a novel gated–guided fusion approach. By integrating transferable and generalizable representations from foundation models into the detection pipeline, we enhance semantic expressiveness and cross-environment robustness. Specifically, a gated–guided fusion mechanism is designed to selectively merge cross-modal features with foundational priors, enabling the network to prioritize informative cues while suppressing unreliable signals in complex scenes. Furthermore, we propose a dual-stream architecture incorporating attention mechanisms and State Space Models (SSMs) to simultaneously capture local and long-range dependencies. Extensive experiments on the large-scale M4-SAR dataset demonstrate that our method achieves state-of-the-art performance, significantly improving detection accuracy and robustness under challenging sensing conditions.
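To make the gating idea concrete, the following is a minimal NumPy sketch of one common way to build a gated fusion step. It is not the authors' implementation, because the abstract does not specify the exact formulation. The function name `gated_guided_fusion`, the parameters `w_gate` and `b_gate`, the feature shapes, and the choice of a sigmoid gate computed from concatenated optical, SAR, and foundation-model prior features are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_guided_fusion(optical, sar, prior, w_gate, b_gate):
    """Blend optical and SAR features with a gate guided by a prior.

    Hypothetical shapes (assumed for this sketch, not taken from the paper):
      optical, sar, prior: (N, C) feature vectors for N spatial locations
      w_gate: (3 * C, C) gate projection weights
      b_gate: (C,) gate bias
    """
    # The gate sees both modalities plus the foundation-model prior.
    context = np.concatenate([optical, sar, prior], axis=-1)
    gate = sigmoid(context @ w_gate + b_gate)
    # Convex combination: gate near 1 favors optical, near 0 favors SAR.
    fused = gate * optical + (1.0 - gate) * sar
    return fused, gate


# Toy inputs with random values; real features would come from backbones.
rng = np.random.default_rng(0)
n, c = 4, 8
optical = rng.normal(size=(n, c))
sar = rng.normal(size=(n, c))
prior = rng.normal(size=(n, c))
w_gate = rng.normal(scale=0.1, size=(3 * c, c))
b_gate = np.zeros(c)

fused, gate = gated_guided_fusion(optical, sar, prior, w_gate, b_gate)
```

Because the gate lies strictly between 0 and 1, each fused value falls between the corresponding optical and SAR values. In this sketch, that is how an unreliable channel in one modality would be down-weighted rather than discarded outright.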
Qianyin Jiang
Jianshang Liao
Qiuyu Lin
ISPRS International Journal of Geo-Information
Nanjing University of Posts and Telecommunications
Guangzhou Maritime College
Jiang et al. studied this question.
www.synapsesocial.com/papers/69d895be6c1944d70ce06d56 — DOI: https://doi.org/10.3390/ijgi15040160