Small-object detection in remote sensing imagery faces two challenges that existing lightweight detectors fail to address jointly: the irreversible loss of high-frequency boundary cues during repeated downsampling, and feature smearing between neighboring instances caused by uniform multi-scale fusion. This paper presents SFD-Net, a spatial–frequency adaptive network that explicitly targets both limitations in aerial imagery. The model consists of a backbone and a spatial–frequency adaptive neck. In the backbone, wavelet-based downsampling reduces aliasing while preserving high-frequency information. Direction-sensitive aggregation is incorporated to better capture oriented structural patterns. In the neck, asymmetric, scale-dependent feature routing enhances shallow boundary cues, improves instance separation in crowded regions, and limits interference from deep semantic features. Experiments on the VisDrone-DET2019, UAVDT, SIMD, and NWPU VHR-10 datasets show that SFD-Net achieves a favorable balance between detection accuracy and computational cost. On the SIMD dataset in particular, SFD-Net reaches 82.2% mAP@0.5 and 66.7% mAP@0.5:0.95 with only 3.4 M parameters and 8.3 GFLOPs. These results indicate that the proposed method is an effective, parameter-efficient solution for small-object detection in remote sensing, especially in resource-constrained deployment scenarios.
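To illustrate why wavelet-based downsampling can halve resolution without discarding high-frequency content, the following is a minimal sketch of a single-level 2D Haar transform in NumPy. It is not the SFD-Net implementation: the abstract does not state which wavelet or layer design is used, so the Haar basis, the function names `haar_downsample` and `haar_upsample`, and the sub-band names are all assumptions made for illustration (sub-band naming conventions also vary between libraries).

```python
import numpy as np


def haar_downsample(x):
    """Single-level 2D Haar transform of an (H, W) array with even H and W.

    Returns four (H/2, W/2) sub-bands: one low-pass band (ll) and three
    detail bands (lh, hl, hh). Illustrative sketch only.
    """
    if x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("height and width must be even")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # scaled block average (low-pass)
    lh = (a - b + c - d) / 2.0  # left column minus right column
    hl = (a + b - c - d) / 2.0  # top row minus bottom row
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_upsample(ll, lh, hl, hh):
    """Invert haar_downsample, recovering the original (H, W) array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.result_type(ll, lh, hl, hh))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x
```

Unlike strided convolution or pooling, which keep one value per 2x2 block, this transform keeps four values per block, so the input can be reconstructed exactly from the sub-bands. In a detector, the four sub-bands would typically be stacked along the channel axis before the next layer; how SFD-Net combines them is not specified here.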
Zhao et al. studied this question.