March 3, 2026 | Open Access

Implementation of an intelligent vision approach for automated structural defect detection

Key Points

The integrated framework achieves 96.5% average precision in automated defect detection, enhancing structural health monitoring.
The framework's performance was validated on a dataset of 300,000 annotated images.
The approach combines YOLOv10 for detection, DeepLabV3+ for segmentation, and a CLIP model refined with Graph Attention Networks, improving explainability in defect detection.
Practical deployment through the BuildCaption application aims to facilitate faster defect reporting for field inspectors.

Abstract

Automated defect detection is central to structural health monitoring (SHM), which safeguards public safety and the integrity of infrastructure. Visual inspections have limitations: they are subjective, labor-intensive, and of limited interpretive value. Although deep learning architectures such as YOLO and CNNs have substantially advanced defect localization, they produce mainly geometric outputs devoid of contextual explanation, creating an explainability gap with respect to actionable engineering insights. To address this gap, we propose a new integrated framework for real-time defect detection that combines YOLOv10, DeepLabV3+ for pixel-wise segmentation, and a fine-tuned CLIP model refined with Graph Attention Networks (GAT) for generating domain-specific natural language descriptions. Unlike generic vision-language models, the GAT-enhanced model produces engineering-specific captions. Extensive experiments on a carefully curated dataset of 300,000 annotated structural defect images show that the proposed framework achieves state-of-the-art performance: 96.5% average precision in detection, 95.1% intersection-over-union in segmentation, and a BLEU-4 score of 0.86 in captioning, at a latency of 0.3 s per image on off-the-shelf GPU hardware. Ablation studies also establish the merit of the GAT-enhanced local features and multi-scale semantic guidance modules. Practical deployment is planned in BuildCaption, a responsive web application that allows field inspectors to upload images and receive detailed defect reports that include detection, segmentation, and contextual descriptions. The result is an automated workflow that links fast visual investigation with explainable, actionable insights for SHM.
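
For readers unfamiliar with the segmentation metric quoted above, intersection-over-union (IoU) for a single binary defect mask can be sketched as follows. This is an illustrative example only, not the authors' evaluation code: the function name `mask_iou`, the list-of-lists mask representation, and the convention of scoring two empty masks as 1.0 are all assumptions made here.

```python
from typing import Sequence

# Assumed representation: a mask is a list of rows, each pixel 0 (background) or 1 (defect).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Return |pred AND target| / |pred OR target| for two same-shape binary masks."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel marked as defect in both masks
            if p or t:
                union += 1  # pixel marked as defect in at least one mask

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 2x3 masks: 2 pixels overlap, 4 pixels are covered in total.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(mask_iou(pred, target))  # 0.5
```

A dataset-level figure such as the 95.1% reported above would typically be an aggregate of per-image or per-class values like this one; the abstract does not state which averaging scheme was used.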

Authors

Hafsa Matich

Hajar Mousannif

Journals

Discover Artificial Intelligence

Institutions

Cadi Ayyad University
