UAVs are widely used for all-weather, round-the-clock security inspections in urban and industrial areas. However, systems that rely on visible light alone fail at night or in adverse weather, while infrared-only methods are limited by thermal noise, low spatial resolution, and high false alarm rates. Multispectral images make object detection more reliable and robust by providing complementary target features. To exploit this complementarity, this study proposes a frequency-based cross-attention transformer (FCAT) for multispectral object detection. The approach captures cross-modal complementary features, learns and integrates global contextual information through the cross-attention mechanism, and greatly increases multispectral object detection accuracy. Spatial-domain features are mapped to the frequency domain via the Fourier transform, and the scaled dot-product attention is estimated with element-wise products, which removes the limitation of traditional spatial-domain matrix multiplication and effectively reduces the computational cost of the model. In addition, to address the lack of all-weather cross-spectral data for object detection from the UAV perspective, this study independently builds a multi-scene multi-time climate visible–infrared dataset (OPVM-VIRD) containing 20,025 target instances. Experiments on the OPVM-VIRD, M3FD, and FLIR datasets show that the proposed approach outperforms prevailing state-of-the-art multispectral object detection algorithms on the public benchmarks, and the FCAT model achieves an mAP50 of 94.7% on the custom-built dataset, 10.8% higher than ICAF. The FCAT has 85.26 M parameters, significantly fewer than mainstream models such as ICAF.
Therefore, the FCAT is a multispectral object detection strategy with strong generalization ability, and it has important application value for the all-day, all-weather security patrol of cities and industrial parks by UAVs.
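The abstract says that features are mapped to the frequency domain and that attention is estimated with element-wise products, but it does not give the exact FCAT formulation. The sketch below is therefore not the authors' method. It only illustrates the correlation theorem that such a design can build on: the similarity between two feature maps over every circular spatial shift can be obtained from an element-wise product of their spectra, at O(N log N) cost instead of the O(N^2) cost of an explicit sum per shift. The function names and the 8 x 8 random maps standing in for visible and infrared features are assumptions made for illustration.

```python
import numpy as np


def correlation_fft(a, b):
    """Circular cross-correlation of two real (H, W) maps via the FFT.

    Correlation theorem: multiplying one spectrum element-wise by the
    complex conjugate of the other replaces the sum over spatial shifts.
    """
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    return np.real(np.fft.ifft2(fa * np.conj(fb)))


def correlation_direct(a, b):
    """Reference version: one explicit spatial sum per shift (dy, dx)."""
    h, w = a.shape
    out = np.empty((h, w))
    for dy in range(h):
        for dx in range(w):
            # shifted[y, x] == a[(y + dy) % h, (x + dx) % w]
            shifted = np.roll(a, shift=(-dy, -dx), axis=(0, 1))
            out[dy, dx] = np.sum(shifted * b)
    return out


# Hypothetical stand-ins for one channel of visible and infrared features.
rng = np.random.default_rng(0)
visible = rng.standard_normal((8, 8))
infrared = rng.standard_normal((8, 8))

scores_fft = correlation_fft(visible, infrared)
scores_ref = correlation_direct(visible, infrared)

# Largest disagreement between the two routes (floating-point error only).
print(np.max(np.abs(scores_fft - scores_ref)))
```

Both routes produce the same score map up to floating-point error; only the frequency-domain route avoids the nested loop over shifts, which is the kind of saving the abstract attributes to replacing spatial-domain matrix multiplication.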
Kewei Li
Ziyi Zhong
Ziyue Luo
Li et al. studied this question.
www.synapsesocial.com/papers/69ada8cfbc08abd80d5bc175 — DOI: https://doi.org/10.3390/rs18050826