Automated defect detection forms the backbone of structural health monitoring (SHM), which safeguards public safety and the integrity of infrastructure. Manual visual inspections have limitations: they are subjective, labor-intensive, and of limited interpretive value. Deep learning architectures such as YOLO and CNNs have considerably advanced defect localization, but their outputs are mainly geometric and lack contextual explanations. This leaves an explainability gap between detection results and actionable engineering insights. To address this gap, we propose a new integrated framework for real-time defect detection that combines YOLOv10 for detection, DeepLabV3+ for pixel-wise segmentation, and a fine-tuned CLIP model refined with Graph Attention Networks (GAT) for generating domain-specific natural language descriptions. Unlike generic vision-language models, this GAT-enhanced model produces engineering-specific captions. Extensive experiments on a carefully curated dataset of 300,000 annotated structural defect images show that the proposed framework achieves state-of-the-art performance: 96.5% average precision in detection, 95.1% intersection-over-union in segmentation, and a BLEU-4 score of 0.86 in captioning, at a latency of 0.3 s per image on off-the-shelf GPU hardware. Ablation studies also confirm the contribution of the GAT-enhanced local features and the multi-scale semantic guidance modules. Practical deployment is planned in BuildCaption, a responsive web application that lets field inspectors upload images and receive detailed defect reports combining detection, segmentation, and contextual descriptions. Together, these components provide an automated workflow that revolutionizes SHM by linking fast visual investigation with explainable, actionable insights.
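For reference, the segmentation metric above, intersection-over-union (IoU), scores a predicted defect mask P against a ground-truth mask T as |P ∩ T| / |P ∪ T|. The sketch below is a minimal illustration, assuming binary masks given as nested lists of 0/1 values. The function name `mask_iou` and the convention of returning 1.0 when both masks are empty are our own choices. The abstract does not say how the reported 95.1% is aggregated (per image, per class, or over the whole dataset).

```python
from typing import Sequence

# A binary mask as rows of 0/1 (or bool) pixel values; an assumption for illustration.
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two same-shaped binary masks."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += int(p and t)  # pixel marked as defect in both masks
            union += int(p or t)          # pixel marked as defect in either mask
    if union == 0:
        # Hypothetical convention: two empty masks count as perfect agreement.
        return 1.0
    return intersection / union


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(mask_iou(pred, target))  # 2 shared pixels / 4 pixels in the union
```

A percentage such as 95.1% would then be this ratio multiplied by 100, typically averaged over many masks.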
Hafsa Matich
Hajar Mousannif
Discover Artificial Intelligence
Cadi Ayyad University
Matich et al. studied this question.
www.synapsesocial.com/papers/69a76112c6e9836116a2ea11 — DOI: https://doi.org/10.1007/s44163-026-00961-6