In oriented object detection from drone imagery, many existing RGB-infrared (RGB-IR) fusion methods derive modality weights from input statistics alone, without regard for downstream detection objectives. We present SGFNet, a Semantic-Guided Fusion Network that feeds detection-level semantics back into the fusion stage through learned importance masks. SGFNet comprises three modules: (1) a Frequency-aware Disentanglement Module (FDM) that separates high-frequency textures from low-frequency thermal structures through Laplacian and Gaussian filtering; (2) a Semantic-Guided Module (SGM) that generates P5-level semantic masks to steer fusion toward detection-critical regions; and (3) an Adaptive Geometric Convolution (AGC) whose rotation-aware sampling matches receptive fields to arbitrarily oriented objects. On the DroneVehicle benchmark (28,439 RGB-IR pairs, five vehicle categories), SGFNet achieves 82.0% mAP@0.5, surpassing the runner-up DMM by 3.2 percentage points while lowering mean angular error from 7.4° to 6.2° (−16%). Ablation analysis attributes the largest single-module gain (+1.7 pp) to the semantic feedback path.
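The frequency split that the abstract attributes to the FDM can be illustrated with a minimal, dependency-free sketch. It rests on several assumptions not stated in the abstract: the Laplacian is applied to a grayscale RGB channel and the Gaussian to the IR channel, the filters act on raw 2-D images rather than learned feature maps, and the kernel size, `sigma`, and padding mode are arbitrary choices. All function names are hypothetical, and this is not the authors' implementation.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Return a normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _clamp(i, n):
    """Clamp index i to [0, n - 1], which gives replicate (edge) padding."""
    return max(0, min(n - 1, i))


def gaussian_blur(image, sigma=1.0, radius=2):
    """Low-pass filter: separable Gaussian blur over a 2-D list of floats."""
    k = gaussian_kernel_1d(sigma, radius)
    h, w = len(image), len(image[0])
    # Horizontal pass.
    tmp = [[sum(k[j + radius] * image[y][_clamp(x + j, w)]
                for j in range(-radius, radius + 1))
            for x in range(w)] for y in range(h)]
    # Vertical pass over the horizontally blurred result.
    return [[sum(k[j + radius] * tmp[_clamp(y + j, h)][x]
                 for j in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]


def laplacian(image):
    """High-pass filter: 4-neighbour discrete Laplacian, replicate padding."""
    h, w = len(image), len(image[0])
    return [[image[_clamp(y - 1, h)][x] + image[_clamp(y + 1, h)][x]
             + image[y][_clamp(x - 1, w)] + image[y][_clamp(x + 1, w)]
             - 4.0 * image[y][x]
             for x in range(w)] for y in range(h)]


def split_frequencies(rgb_gray, ir):
    """Assumed pairing: high-frequency texture from RGB, low-frequency
    structure from IR. The abstract does not specify this pairing."""
    return laplacian(rgb_gray), gaussian_blur(ir)
```

On a flat image the Laplacian response is zero everywhere and the Gaussian output equals the input, so only edges and textures survive in the high-frequency branch while smooth thermal gradients pass through the low-frequency branch.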
Liang Zhang
Yueqiu Jiang
Wei Yang
Zhang et al. studied this question.
www.synapsesocial.com/papers/69a52dbff1e85e5c73bf0de8 — DOI: https://doi.org/10.3390/electronics15051003