Abstract Vision Transformers have revolutionized computer vision with unprecedented performance, yet their attention mechanisms remain largely opaque, hindering both trust and optimization in critical applications 1, 2. Despite significant advances in transformer interpretability, existing approaches provide limited structural insights into how attention patterns encode visual information, creating a fundamental barrier to understanding these powerful models 3, 4. We address this challenge by introducing a novel, comprehensive graph-theoretic framework for analyzing Vision Transformer attention mechanisms. Our approach treats attention matrices as directed graphs and applies Tarjan’s Strongly Connected Components algorithm to reveal hidden structural patterns within attention flows 5. Through systematic evaluation of 22 pre-trained models across ViT, DeiT, and CLIP architectures, we discover that Strongly Connected Components with meaningful connectivity directly correspond to semantically coherent visual features—providing a mathematical foundation for understanding attention-based feature learning 6. This insight enables a novel optimization strategy: layers containing minimal meaningful connectivity can be selectively ablated through attention linearization, achieving 16-30% inference speedup with only 0-7% accuracy loss for large models. Our findings demonstrate that ViT models exhibit superior structural organization (92.6% effectiveness) compared to DeiT (51.1%) and CLIP (57.2%) variants, while patch32 configurations achieve optimal performance tradeoffs 7, 8. Beyond interpretability, this work establishes graph theory as a powerful lens for transformer analysis and provides practical tools for deploying efficient vision models without sacrificing semantic understanding 9.
Building similarity graph...
Analyzing shared references across papers
Loading...
Devansh Garg
Indian Institute of Technology Mandi
Building similarity graph...
Analyzing shared references across papers
Loading...
Devansh Garg (Mon,) studied this question.
www.synapsesocial.com/papers/68f5fcdc8d54a28a75cf2625 — DOI: https://doi.org/10.21203/rs.3.rs-7877883/v1
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: