What question did this study set out to answer?

The research aims to improve excavator posture recognition accuracy while optimizing computational resources on edge devices.

April 10, 2026 | Open Access

A Lightweight Improved RT-DETR for Stereo-Vision-Based Excavator Posture Recognition

Key Points

Designed a new backbone network based on Reparameterized Vision Transformer.
Introduced lightweight Dynamic Upsamplers for better information retention.
Implemented a Cross-Attention Fusion Module for enhanced local feature extraction.
Developed a Multi-Scale Fusion Network to boost feature representation.
Achieved a mean average precision (mAP) of 94.29% for small object detection, 7.96% higher than baseline RT-DETR.
Reduced model parameters by 34.95%.
Improved mAP by 8.62% to 12.75% compared to YOLO-series models.
Outperformed the compared methods in both detection accuracy and computational efficiency.

Abstract

In intelligent excavator applications, traditional excavator posture recognition methods face two major challenges: limited recognition accuracy and insufficient computing resources on edge devices. To address these issues, this study proposes an excavator posture recognition method based on an improved Real-Time Detection Transformer (RT-DETR). First, a new backbone network is designed based on the Reparameterized Vision Transformer to improve feature utilization efficiency while reducing computational demands. Next, the overall architecture is optimized by introducing lightweight Dynamic Upsamplers, which reduce information loss during upsampling and enhance multi-scale feature fusion. In addition, a Cross-Attention Fusion Module is adopted to strengthen local feature extraction while retaining the global modeling capability of the Transformer, thereby improving the discrimination between foreground and background. Finally, a Multi-Scale Fusion Network is introduced to further enhance the multi-scale feature representation ability of RT-DETR. Experimental results show that the proposed method achieves a mean average precision (mAP) of 94.29% for small object detection, which is 7.96% higher than that of the baseline RT-DETR, while reducing the number of model parameters by 34.95%. Compared with YOLO-series models, the proposed method improves mAP by 8.62% to 12.75%. These results indicate that the proposed method outperforms existing methods in both detection accuracy and computational efficiency and provides an efficient and feasible solution for real-time excavator posture recognition.
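The abstract does not describe the backbone's internals, so the following is only a minimal sketch of the general idea behind structural reparameterization: a convolution followed by batch normalization at training time can be folded into a single convolution for inference, which removes a layer without changing the output. The function names and all numeric values are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math

def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution to one pixel; x is a list of input channels."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization using stored running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]

def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN scale and shift into the conv weights and bias."""
    fused_w, fused_b = [], []
    for row, b, g, bt, m, s in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(s + eps)
        fused_w.append([scale * w for w in row])
        fused_b.append(scale * (b - m) + bt)
    return fused_w, fused_b

# Hypothetical parameters (2 input channels, 2 output channels)
weight = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [0.9, 1.2]
x = [1.0, 2.0]

# Two layers at training time versus one fused layer at inference time
two_step = batch_norm(conv1x1(x, weight, bias), gamma, beta, mean, var)
fused_w, fused_b = fuse_conv_bn(weight, bias, gamma, beta, mean, var)
one_step = conv1x1(x, fused_w, fused_b)

# The two paths should agree up to floating-point rounding
max_diff = max(abs(a - b) for a, b in zip(two_step, one_step))
```

The fused layer stores one weight matrix and one bias instead of the convolution plus four normalization vectors, which is one way a reparameterized backbone can reduce inference cost while producing the same outputs.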

Authors

Yunlong Hou

Ke Wu

Yuhan Zhang

Journals

Mathematics

Institutions

North China University of Science and Technology
