March 24, 2024 · Open Access

LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition

Abstract

The Vision Transformer (ViT) achieves high accuracy on high-resolution images, but such images carry significant spatial redundancy, which increases computational and memory requirements. To address this, we present the Localization and Focus Vision Transformer (LF-ViT), which reduces computation without sacrificing performance. In the Localization phase, a reduced-resolution image is processed; if a confident prediction cannot be made, our Neighborhood Global Class Attention (NGCA) mechanism is triggered to identify class-discriminative regions based on the initial findings. In the Focus phase, the corresponding region of the original image is then used to enhance recognition. LF-ViT shares the same parameters across both phases, enabling end-to-end optimization. In our experiments, LF-ViT reduces DeiT-S's FLOPs by 63% and doubles its throughput. Code for this project is available at https://github.com/edgeai1/LF-ViT.git.
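
The two-phase inference described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). Everything here is an assumption made for illustration: the function names, the `threshold`, `factor`, and `window` parameters, the toy model, and in particular the sliding-window sum, which is only a stand-in for NGCA, whose details the abstract does not give.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def downsample(image, factor):
    """Average-pool a 2D list; height and width must be divisible by factor."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [[sum(image[r * factor + dr][c * factor + dc]
                 for dr in range(factor) for dc in range(factor)) / factor ** 2
             for c in range(cols)]
            for r in range(rows)]

def select_region(scores, window):
    """Top-left corner of the window x window block with the largest summed
    score. A simple stand-in for NGCA, not the paper's mechanism."""
    best_total, best_pos = -math.inf, (0, 0)
    for r in range(len(scores) - window + 1):
        for c in range(len(scores[0]) - window + 1):
            total = sum(scores[r + dr][c + dc]
                        for dr in range(window) for dc in range(window))
            if total > best_total:
                best_total, best_pos = total, (r, c)
    return best_pos

def two_stage_predict(image, model, threshold=0.7, factor=2, window=1):
    """Localization pass on a downsampled image; Focus pass only if needed.
    `model` is a hypothetical callable returning (logits, score grid), where
    the score grid has the same shape as the model's input."""
    small = downsample(image, factor)
    logits, scores = model(small)
    probs = softmax(logits)
    if max(probs) >= threshold:
        # Confident enough: exit early without touching the full image.
        return probs.index(max(probs)), "localization"
    # Map the selected low-resolution block back to full-resolution pixels.
    r, c = select_region(scores, window)
    crop = [row[c * factor:(c + window) * factor]
            for row in image[r * factor:(r + window) * factor]]
    # The same model (shared parameters) processes the cropped region.
    logits, _ = model(crop)
    probs = softmax(logits)
    return probs.index(max(probs)), "focus"

def toy_model(img):
    """Hypothetical two-class model: class 1 evidence grows with brightness.
    Uses the input itself as the score grid."""
    flat = [v for row in img for v in row]
    return [0.0, 4.0 * sum(flat) / len(flat)], img

# A 4x4 image with a bright 2x2 block in the bottom-right corner.
image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]

print(two_stage_predict(image, toy_model, threshold=0.9))
print(two_stage_predict(image, toy_model, threshold=0.7))
```

With the higher threshold the low-resolution pass is not confident and the sketch falls through to the Focus pass; with the lower threshold it exits early. Raising or lowering the threshold is how a scheme like this trades accuracy against computation.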

Cite this study

Hu et al. (2024). LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition.

www.synapsesocial.com/papers/68e72968b6db6435876a3871 — DOI: https://doi.org/10.1609/aaai.v38i3.28001

Also consider

Synapse has enriched closely related papers. Consider them for comparative context:

Revisiting Unreasonable Effectiveness of Data in Deep Learning Era · 2017 · 2,274 citations
SkipNet: Learning Dynamic Routing in Convolutional Networks · 2018 · 620 citations
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning · 2022 · 8 citations

Authors

Youbing Hu

Yun Cheng

Anqi Lu

Institutions

Harbin Institute of Technology

Xidian University

Swiss Data Science Center
