Ship classification in synthetic aperture radar (SAR) imagery is essential for maritime surveillance, but it remains challenging because of limited resolution, insufficient textural detail, and the difficulty of fusing multimodal information effectively. Existing methods either rely on handcrafted features with limited adaptability or employ simplistic fusion strategies that fail to fully exploit the complementary guidance across modalities. To address these issues, we propose a multimodal fusion network based on a learnable feature preprocessing front-end (LFPF-MFN), which integrates polarimetric, textural, and geometric information in an end-to-end learnable manner. Specifically, LFPF-MFN uses a learnable preprocessing front-end to embed scattering features and enhanced textural features. Geometric information from the Automatic Identification System (AIS) is incorporated through textual embedding, and the modalities are fused via a bidirectional cross-attention mechanism. Extensive experiments on the OpenSARShip 2.0 dataset demonstrate that the proposed method achieves state-of-the-art performance on both three-class and six-class classification tasks, validating the effectiveness of each designed module and the superiority of the multimodal fusion strategy.
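To make the fusion step concrete, the following is a minimal sketch of generic bidirectional cross-attention between two token sets, written with NumPy. It is not the LFPF-MFN implementation: the abstract does not specify the design, so the single attention head, the token counts, the shared embedding width `d`, the residual connections, and the mean-pool-and-concatenate readout are all assumptions made for illustration, and every function and variable name here is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention where `queries` attend to `context`."""
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


def bidirectional_cross_attention(img_tokens, txt_tokens, params):
    """Run attention in both directions and fuse the results (assumed readout)."""
    # Direction 1: image tokens query the text (AIS-derived) tokens.
    img_att, w_img_txt = cross_attention(img_tokens, txt_tokens, *params["img_from_txt"])
    # Direction 2: text tokens query the image tokens.
    txt_att, w_txt_img = cross_attention(txt_tokens, img_tokens, *params["txt_from_img"])
    # Assumed design: residual connection, mean pooling, then concatenation.
    img_out = img_tokens + img_att
    txt_out = txt_tokens + txt_att
    fused = np.concatenate([img_out.mean(axis=0), txt_out.mean(axis=0)])
    return fused, w_img_txt, w_txt_img


# Hypothetical sizes: 16 image tokens, 5 text tokens, embedding width 8.
rng = np.random.default_rng(0)
d = 8
img_tokens = rng.normal(size=(16, d))
txt_tokens = rng.normal(size=(5, d))


def make_projections():
    # Random d x d matrices standing in for learned query/key/value weights.
    return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))


params = {"img_from_txt": make_projections(), "txt_from_img": make_projections()}
fused, w_img_txt, w_txt_img = bidirectional_cross_attention(img_tokens, txt_tokens, params)
print(fused.shape)  # (16,)
```

In this sketch each modality is updated with information drawn from the other, so neither acts only as a passive context. A trained model would learn the projection matrices and would typically use multiple heads and normalization layers, which are omitted here for brevity.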
Wang et al. (Sun,) studied this question.