What question did this study set out to answer?

To improve the performance of vision-language models in zero-shot anomaly detection by enhancing feature representation.

April 17, 2026 | Open Access

BAG-CLIP: Bifurcated Attention Graph-Enhanced CLIP for Zero-Shot Industrial Anomaly Detection

Key Points

Aims to improve vision-language models for zero-shot anomaly detection by enhancing feature representation.
Proposes BAG-CLIP, which uses a Bifurcated Self-Attention (BSA) module to process global semantics and spatial details in separate paths.
Implements a Self-Attention Graph (SAG) module to capture the topological structure of complex morphological anomalies.
Conducts extensive experiments on five industrial datasets, comparing against 11 state-of-the-art methods.
Outperforms the second-best methods in image-level AUROC by 3.7% and 2.8% on the EPED and MPDD datasets, respectively.
Achieves superior performance in zero-shot anomaly detection and segmentation compared to existing methods.
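
The graph idea behind the Self-Attention Graph module can be pictured with a small sketch: treat each patch feature as a node, connect it to its most similar nodes, and average features over each neighborhood. This is only an illustration under stated assumptions, not the paper's implementation: cosine-similarity k-nearest neighbors stands in for the self-attention-based graph construction, the aggregation is a parameter-free mean rather than a learned graph convolution, and the names `build_knn_graph`, `aggregate`, and `patches` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_knn_graph(features, k):
    """For each node, list the indices of its k most similar other nodes.

    Assumption: cosine similarity replaces the attention scores the
    paper uses to build the graph dynamically.
    """
    neighbors = []
    for i, fi in enumerate(features):
        scored = [(cosine(fi, fj), j) for j, fj in enumerate(features) if j != i]
        scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first
        neighbors.append([j for _, j in scored[:k]])
    return neighbors


def aggregate(features, neighbors):
    """One parameter-free aggregation step: mean over a node and its neighbors.

    Assumption: a learned graph convolution would also apply a weight matrix.
    """
    out = []
    for i, fi in enumerate(features):
        group = [fi] + [features[j] for j in neighbors[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


# Four toy "patch features": two point along x, two along y.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = build_knn_graph(patches, k=1)
smoothed = aggregate(patches, graph)
print(graph)
print(smoothed)
```

In this toy input, each patch links to the other patch pointing in the same direction, so aggregation pulls similar patches toward a shared representation while leaving the two groups apart.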

Abstract

While vision-language models (VLMs) have been widely applied to zero-shot anomaly detection (ZSAD), their performance remains limited by an inability to distinguish fine-grained normal and abnormal textures and by inadequate detection of complex morphological anomalies. To address these limitations, this paper proposes BAG-CLIP (Bifurcated Attention Graph-Enhanced CLIP), a dual-path, graph-enhanced ZSAD method. A Bifurcated Self-Attention (BSA) module decouples visual features, processing global semantics and spatial details separately to mitigate the inherent conflict between abstract semantic representation and precise spatial localization. A Self-Attention Graph (SAG) module models the topological structure of complex morphological anomalies: it dynamically constructs topological relationships among visual features and uses graph convolutions to aggregate neighborhood information, thereby enhancing the model’s representational capacity for diverse and complex morphological anomalies. Extensive experiments are conducted on five industrial datasets that cover complex transmission line backgrounds as well as general industrial scenarios, and the method is evaluated against 11 state-of-the-art (SOTA) methods. On the EPED (Electrical Power Equipment Dataset) and MPDD datasets, BAG-CLIP outperforms the second-best methods in image-level AUROC (Area Under the Receiver Operating Characteristic Curve) by 3.7% and 2.8%, respectively. Compared with existing methods, BAG-CLIP achieves superior performance in both zero-shot anomaly detection and segmentation.
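
For readers unfamiliar with the headline metric, image-level AUROC can be read as the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up toy values, not results from the paper, and the function name `auroc` is illustrative.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (anomalous, normal) pairs ranked correctly.

    labels: 1 for anomalous images, 0 for normal images.
    scores: anomaly score per image (higher means more anomalous).
    Ties between an anomalous and a normal score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: two normal and two anomalous images.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
value = auroc(labels, scores)
print(value)  # 0.75: three of the four anomalous/normal pairs are ordered correctly
```

The pairwise form is quadratic in the number of images, which is fine for illustration; evaluation code typically uses a rank-based computation that gives the same value.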
