What question did this study set out to answer?

The study aims to improve anti-UAV detection accuracy by leveraging both visible and infrared imagery.

March 28, 2026 · Open Access

CSFADet: Dual-Modal Anti-UAV Detection via Cross-Spectral Feature Alignment and Adaptive Multi-Scale Refinement

Key Points

The study aims to improve anti-UAV detection accuracy by leveraging both visible and infrared imagery.
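Proposes CSFADet, a dual-modal detection framework built from four tightly integrated modules.
Uses a Cross-Spectral Feature Alignment (CSFA) module for early-stage spectral calibration.
Adds three further modules for texture enhancement (DTEM), bidirectional cross-attention (DCAM), and information refinement (DIRM).
Evaluates the framework on the public Anti-UAV300 benchmark.
Achieves 91.4% mean Average Precision (mAP) at an IoU threshold of 0.5.
Reaches 58.7% mAP over the 0.5:0.95 IoU threshold range.
Outperforms fifteen representative detectors spanning single-stage, two-stage, YOLO-family, and Transformer-based categories.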
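
The two accuracy figures in the Key Points differ only in how strictly a predicted box must overlap the ground truth. The sketch below is not taken from the paper. It assumes axis-aligned boxes in (x1, y1, x2, y2) form and the common COCO-style reading of 0.5:0.95 as ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The helper names `iou` and `matched_thresholds` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style threshold grid: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, gt_box, thresholds=IOU_THRESHOLDS):
    """Return the thresholds at which the prediction counts as a match."""
    overlap = iou(pred_box, gt_box)
    return [t for t in thresholds if overlap >= t]


gt = (0, 0, 10, 10)
pred = (0, 0, 9, 8)  # overlaps 72 of the 100 units covered by the union
print(iou(pred, gt))
print(matched_thresholds(pred, gt))
```

A prediction like this one counts as correct at the 0.5 threshold but fails the stricter thresholds, which is why a score over the 0.5:0.95 range is typically lower than the score at 0.5 alone.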

Abstract

Anti-unmanned aerial vehicle (Anti-UAV) detection is critical for airspace security, yet existing single-modality approaches suffer from severe performance degradation under adverse illumination, thermal crossover, and extreme scale variation. In this paper, we propose CSFADet, a dual-modal detection framework that jointly exploits visible and infrared imagery through four tightly integrated modules. First, a Cross-Spectral Feature Alignment (CSFA) module performs early-stage spectral calibration by computing cross-modal query–value attention maps, generating modality-aware channel descriptors that re-weight and concatenate the two spectral streams. Second, a Dual-path Texture Enhancement Module (DTEM) enriches fine-grained spatial details via cascaded convolutions with residual connections. Third, a Dual-path Cross-Attention Module (DCAM) introduces a feature-shrinking token generation strategy followed by symmetric cross-attention branches with learnable scaling factors, Squeeze-and-Excitation recalibration, and a 1×1 convolution fusion head, enabling deep bidirectional interaction between modalities. Fourth, a Dual-path Information Refinement Module (DIRM) embeds Adaptive Residual Groups (ARGs) that cascade Multi-modal Spatial Attention Blocks (MSABs) with channel and dynamic spatial attention, culminating in a Multi-scale Scale-aware Fusion Refinement (MSFR) unit that employs three parallel multi-head attention branches with a Scale Reasoning Gate and Channel Fusion Layer to produce scale-discriminative enhanced features. Experiments on the public Anti-UAV300 benchmark show that CSFADet achieves 91.4% mAP@0.5 and 58.7% mAP@0.5:0.95, surpassing fifteen representative detectors spanning single-stage, two-stage, YOLO-family, and Transformer-based categories. Ablation studies confirm the complementary contributions of each module, and heatmap visualizations verify the model’s capacity to focus on small, distant UAV targets under challenging conditions.
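
The abstract describes DCAM as symmetric cross-attention branches with learnable scaling factors but gives no equations. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes single-head scaled dot-product attention without learned projections, treats each modality as a short list of token vectors, and adds the attended features back as a scaled residual. It omits the token-shrinking step, the Squeeze-and-Excitation recalibration, and the 1×1 convolution fusion head. The function names and the `gamma_vis` and `gamma_ir` parameters are hypothetical stand-ins for the learnable scaling factors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query token attends to every token of the other modality.

    Keys and values are both the raw context tokens (assumption: no
    learned projection matrices, to keep the sketch short).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in context]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, context))
                         for j in range(dim)])
    return attended


def symmetric_cross_attention(vis, ir, gamma_vis=0.1, gamma_ir=0.1):
    """Two mirrored branches: visible attends to infrared and vice versa.

    gamma_vis and gamma_ir stand in for learnable scaling factors; here
    they are plain floats that scale the residual update.
    """
    vis_from_ir = cross_attention(vis, ir)
    ir_from_vis = cross_attention(ir, vis)
    vis_out = [[x + gamma_vis * a for x, a in zip(tok, att)]
               for tok, att in zip(vis, vis_from_ir)]
    ir_out = [[x + gamma_ir * a for x, a in zip(tok, att)]
              for tok, att in zip(ir, ir_from_vis)]
    return vis_out, ir_out


vis_tokens = [[1.0, 0.0], [0.0, 1.0]]
ir_tokens = [[2.0, 2.0]]
print(symmetric_cross_attention(vis_tokens, ir_tokens, gamma_vis=0.5))
```

In this sketch, setting a scaling factor to zero leaves that modality's features unchanged, so a branch can contribute as little or as much of the other modality as training finds useful.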
