What question did this study set out to answer?

The aim is to enhance the faithfulness of attribution methods in interpreting chest X-rays using Vision Transformers.

March 18, 2026 | Open Access

Faithful Attention Attribution in Vision Transformers for Chest X-Ray Interpretation

Key Points

The aim is to enhance the faithfulness of attribution methods in interpreting chest X-rays using Vision Transformers.
Feature-Gradient Attribution extends TransMM's principle from attention space to feature space.
Sparse Autoencoders (SAEs) trained on residual streams decompose activations into sparse, interpretable per-patch features.
Feature-gradient scores yield per-patch gates that modulate TransMM's attention maps before relevance propagation.
Evaluation covers three datasets (chest X-rays, endoscopy, natural images), two architectures (finetuned ViT-B/16 and CLIP ViT-B/32), and three faithfulness metrics.
Improvements are statistically significant (p<0.001) on all three metrics for one dataset and on two of three metrics for the other two.
SaCo improves by 10.5-34.8% and Faithfulness Correlation by 9.7-43.0%.
Pixel Flipping improves by 1.8-10.8%.
No degradation relative to TransMM is observed on any metric-dataset combination.

Abstract

Vision Transformers (ViTs) achieve strong performance in natural and medical imaging, yet their decision processes remain opaque. This is especially problematic in high-stakes settings like chest X-ray interpretation. TransMM is among the strongest attribution methods for ViTs, combining attention with class-specific gradients to highlight influential image patches. We ask whether injecting semantic structure from Sparse Autoencoders (SAEs) can further improve the faithfulness of such attributions.

We introduce Feature-Gradient Attribution, which extends TransMM’s principle from attention space to feature space. SAEs are trained on residual streams to decompose activations into sparse, interpretable features, providing per-patch feature activations. We project gradients onto the SAE feature basis and compute feature-gradient scores that capture both which learned features are present and how they influence the target logit. These scores yield per-patch gates that modulate TransMM’s attention maps before relevance propagation, forming a lightweight, semantically informed correction.

Across three datasets (chest X-rays, endoscopy, natural images), two architectures (finetuned ViT-B/16 and contrastively pre-trained CLIP ViT-B/32), and three complementary faithfulness metrics, our method improves attribution faithfulness consistently. Improvements are statistically significant (p<0.001) on all three metrics for one dataset and on two of three metrics for the remaining datasets. We observe gains of 10.5-34.8% on SaCo and 9.7-43.0% on Faithfulness Correlation, with Pixel Flipping improving by 1.8-10.8%. Notably, we never observe degradation relative to TransMM on any metric–dataset combination.
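The gating step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the ReLU SAE encoder, the max-abs normalisation, the exponential gate, the `strength` parameter, and applying the gate to attention columns are all assumptions made for illustration, as are the function names. The relevance update follows the commonly stated TransMM-style rule (head-averaged positive part of attention times its gradient, added to the running relevance).

```python
import numpy as np

def feature_gradient_gates(resid, grad, W_enc, b_enc, W_dec, strength=1.0):
    """Per-patch gates from SAE features and gradients (illustrative only).

    resid: (P, D) residual-stream activations for P patches
    grad:  (P, D) gradient of the target logit w.r.t. resid
    W_enc: (D, F), b_enc: (F,), W_dec: (F, D) hypothetical SAE weights
    """
    # Sparse feature activations (assumes a plain ReLU SAE encoder).
    acts = np.maximum(resid @ W_enc + b_enc, 0.0)            # (P, F)
    # Chain rule: if resid is approximated by acts @ W_dec, then the
    # gradient w.r.t. the features is grad @ W_dec.T.
    grad_feat = grad @ W_dec.T                                # (P, F)
    # Feature-gradient score: which features are present, weighted by
    # how they push the target logit.
    scores = (acts * grad_feat).sum(axis=1)                   # (P,)
    scale = np.abs(scores).max()
    if scale > 0:
        scores = scores / scale
    # Assumed gate shape: positive, equal to 1 when the score is 0.
    return np.exp(strength * scores)

def gated_transmm_update(R, attn, attn_grad, gates):
    """One TransMM-style relevance update with per-token gates.

    R: (T, T) running relevance, attn / attn_grad: (H, T, T), gates: (T,)
    """
    # Gradient-weighted attention, positive part, averaged over heads.
    A_bar = np.maximum(attn * attn_grad, 0.0).mean(axis=0)   # (T, T)
    # Assumption: scale the attention paid to each token by its gate.
    A_bar = A_bar * gates[None, :]
    return R + A_bar @ R
```

With a class token in front of the patches, one option is to give it a gate of 1 (`np.concatenate([[1.0], gates])`) so that only patch tokens are modulated. Setting every gate to 1 recovers the ungated update.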

Authors

Julius Šula

Institutions

TU Wien
