May 22, 2024 · Open Access

Vision Transformer with Sparse Scan Prior

Abstract

In recent years, Transformers have achieved remarkable progress in computer vision tasks. However, their global modeling often comes with substantial computational overhead, in stark contrast to the human eye's efficient information processing. Inspired by the human eye's sparse scanning mechanism, we propose a Sparse Scan Self-Attention mechanism (S³A). This mechanism predefines a series of Anchors of Interest for each token and employs local attention to efficiently model the spatial information around these anchors, avoiding both redundant global modeling and excessive focus on local information. This approach mirrors the human eye's functionality and significantly reduces the computational load of vision models. Building on S³A, we introduce the Sparse Scan Vision Transformer (SSViT). Extensive experiments demonstrate the outstanding performance of SSViT across a variety of tasks. Specifically, on ImageNet classification, without additional supervision or training data, SSViT achieves top-1 accuracies of 84.4%/85.7% with 4.4G/18.2G FLOPs. SSViT also excels in downstream tasks such as object detection, instance segmentation, and semantic segmentation. Its robustness is further validated across diverse datasets. Code will be available at https://github.com/qhfan/SSViT.
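
The abstract describes S³A only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a hypothetical function name (`sparse_scan_attention`), a fixed list of anchor offsets shared by all tokens, a square local window around each anchor, and identity query/key/value projections; none of these details are stated in the abstract. Each token attends only to the union of the small neighbourhoods around its anchors instead of to every token in the feature map.

```python
import numpy as np

def sparse_scan_attention(x, anchor_offsets, window=1):
    """Illustrative sparse-scan attention over a feature map x of shape (H, W, C).

    anchor_offsets: list of (dy, dx) offsets placing anchors relative to each
    query token (assumed layout; include (0, 0) so every query has a key).
    window: radius of the square local neighbourhood around each anchor.
    Queries, keys, and values are the raw features (no learned projections).
    """
    x = np.asarray(x, dtype=float)
    H, W, C = x.shape
    out = np.zeros_like(x)
    # Relative positions inside one local window.
    local = [(dy, dx) for dy in range(-window, window + 1)
                      for dx in range(-window, window + 1)]
    for i in range(H):
        for j in range(W):
            # Union of in-bounds positions around every anchor of this token.
            keys = sorted({(i + ay + dy, j + ax + dx)
                           for ay, ax in anchor_offsets
                           for dy, dx in local
                           if 0 <= i + ay + dy < H and 0 <= j + ax + dx < W})
            k = np.stack([x[p, q] for p, q in keys])   # (N, C) keys and values
            scores = k @ x[i, j] / np.sqrt(C)          # scaled dot products
            weights = np.exp(scores - scores.max())    # numerically stable softmax
            weights /= weights.sum()
            out[i, j] = weights @ k                    # weighted sum of values
    return out

# Example: a 3x3 grid of anchors spaced 2 pixels apart, each with a 3x3 window.
offsets = [(dy, dx) for dy in (-2, 0, 2) for dx in (-2, 0, 2)]
features = np.random.default_rng(0).normal(size=(6, 6, 4))
attended = sparse_scan_attention(features, offsets, window=1)
```

With this assumed layout, each token sees at most 9 anchors × 9 positions = 81 keys (fewer where neighbouring windows overlap or fall outside the map) regardless of image size, whereas global attention would see all H × W tokens.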

Cite this study

Fan et al. (2024) studied this question.

www.synapsesocial.com/papers/68e68e7db6db643587615cb3 — DOI: https://doi.org/10.48550/arxiv.2405.13335

Authors

Qihang Fan

Huaibo Huang

Mingrui Chen
