Existing lightweight Convolutional Neural Network (CNN) detectors deployed on Unmanned Aerial Vehicle (UAV) platforms struggle with small object recognition and fail to capture long-range spatial dependencies, while standard Vision Transformer (ViT) architectures suffer from quadratic computational complexity that prohibits real-time inference on embedded hardware. This paper bridges this gap by proposing an integrated framework that adapts the ViT for UAV-based real-time object detection through edge computing infrastructure. Our work presents three key contributions: (1) a hierarchical attention mechanism with shifted windows that reduces complexity from O(n²) to O(n), (2) a dynamic token pruning strategy that adaptively discards uninformative background tokens based on attention variance, and (3) a dual-mode edge-UAV collaborative architecture enabling seamless switching between autonomous onboard processing and server-assisted computation. The lightweight ViT variant achieves a 68% reduction in floating-point operations (FLOPs) while preserving 94.3% relative accuracy. Through systematic optimization combining mixed-precision quantization, structured pruning, and operator fusion, we obtain an 11.2× inference speedup over baseline implementations. Experiments on our collected aerial dataset demonstrate 73.9% mAP@0.5:0.95 at 39.2 frames per second (FPS) on an NVIDIA Jetson Xavier NX, surpassing YOLOv5s by 4.7% in accuracy under identical real-time constraints. Notably, small object detection improves by 7.4% Average Precision (AP) compared to CNN baselines. Week-long field trials on a DJI Matrice 300 RTK validate sustained performance across varying illumination, platform vibration, and intermittent network connectivity, confirming practical viability for time-critical applications including search and rescue, disaster response, and infrastructure inspection.
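The abstract's second contribution, dynamic token pruning keyed to attention variance, can be illustrated with a minimal sketch. The paper's exact criterion is not given here, so the specifics below are assumptions for illustration only: variance of per-head attention mass is used as the informativeness score, and a fixed keep ratio selects the highest-variance tokens.

```python
import numpy as np

def prune_tokens(tokens, attn, keep_ratio=0.5):
    """Illustrative token pruning: keep tokens whose received attention
    varies most across heads (hypothetical criterion, not the paper's).

    tokens : (N, D) array of token embeddings
    attn   : (H, N) array of attention mass each token receives per head
    """
    variance = attn.var(axis=0)                       # (N,) variance across heads
    k = max(1, int(round(keep_ratio * len(tokens))))  # how many tokens survive
    keep = np.argsort(variance)[-k:]                  # indices of top-k variance
    keep.sort()                                       # restore spatial order
    return tokens[keep], keep

# Toy example: 8 tokens, 4-dim embeddings, 3 attention heads.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
attn = rng.random(size=(3, 8))
kept, idx = prune_tokens(tokens, attn, keep_ratio=0.5)
```

Discarding low-variance (uniformly attended, typically background) tokens shrinks the sequence length N entering later attention layers, which is where the FLOP savings reported in the abstract would come from.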
Wenyao Zhu, Ken Chen (Lishui University)
Scientific Reports
DOI: https://doi.org/10.1038/s41598-026-37938-5