With the advancement of intelligent manufacturing and Industry 4.0, surface defect detection plays a critical role in ensuring product quality and production safety. Existing detection models struggle with small sample sizes, complex textures, and multi-scale defects. To address these limitations, this paper proposes a high-performance industrial defect detection model built on the RT-DETR framework that incorporates semantic guidance and hierarchical attention mechanisms. Specifically, a Semantic-Guided Query Enhancement Module strengthens the contextual awareness of query vectors through multi-source semantic paths and a residual feedback structure. In addition, a Hierarchical Attention Fusion Structure builds interaction graphs among multi-scale features to achieve cross-scale semantic alignment and structural consistency modeling. Experiments on three benchmark industrial defect datasets (NEU-DET, DAGM2007, and PCB-DET) demonstrate the effectiveness of the proposed method: it achieves mAP@0.5 scores of 78.9%, 84.7%, and 87.4%, respectively, outperforming the best baseline models by 1.2% to 3.0%. On the more stringent mAP@0.5:0.95 metric, the method achieves 44.3%, 48.1%, and 52.3%, respectively, significantly surpassing mainstream models such as YOLOv8 and BMA-YOLO. Furthermore, Grad-CAM visualizations show that the model focuses more accurately and fits boundaries more closely in regions with complex textures and sparse targets. Overall, the proposed architecture improves semantic perception, scale robustness, and generalization in industrial defect detection while maintaining real-time efficiency.
Huang et al. studied this question.
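To illustrate why mAP@0.5:0.95 is the more stringent of the two reported metrics, the following minimal sketch computes the intersection over union (IoU) of one predicted box against one ground-truth box and lists the IoU thresholds at which the prediction would count as a match. It assumes the common COCO-style convention of ten thresholds from 0.50 to 0.95 in steps of 0.05. The box coordinates and the helper names `iou` and `thresholds_passed` are hypothetical and are not taken from the paper.

```python
# Illustrative only: box values and helper names are assumptions, not the paper's code.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95 (assumed convention).
IOU_THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def thresholds_passed(pred, gt):
    """Return the IoU thresholds at which `pred` counts as a match for `gt`."""
    overlap = iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


gt_box = (10, 10, 50, 50)    # hypothetical ground-truth defect box
pred_box = (14, 12, 52, 50)  # hypothetical, slightly shifted prediction

print(f"IoU = {iou(pred_box, gt_box):.3f}")
print("matched at thresholds:", thresholds_passed(pred_box, gt_box))
```

A prediction like this one counts as correct under mAP@0.5 but fails the strictest thresholds, so loosely fitted boxes lower the averaged mAP@0.5:0.95 score even when the mAP@0.5 score is unaffected.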