What question did this study set out to answer?

This research aims to improve Vision Transformer performance on small datasets by reducing parameter counts and enhancing generalization.

May 2, 2026

An Integrated Approach to Vision Transformers: Leveraging Multiple Weight Selection for Small-Sized Dataset

Key Points

Introduced a novel architectural framework utilizing advanced decomposition strategies like Hydra attention and SimA attention.
Implemented multiple weight selection for optimized initialization from pretrained models.
Applied local patch interaction techniques to enhance local information capture.
Achieved a 52.93% reduction in parameters for the 6-layer model.
Demonstrated high performance across various small-scale datasets.
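
Hydra attention, named in the key points above, is generally described as a linear-cost alternative to softmax attention: queries and keys are L2-normalized, keys and values are combined into one global summary vector, and each token gates that summary with its own query. The sketch below illustrates only that general idea in dependency-free Python. The function names and the token-list layout are assumptions made for illustration, and the paper's actual block (which also involves linear angular, SimA, and class attention) may differ.

```python
import math


def l2_normalize(row):
    """Scale a vector to unit L2 norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm > 0 else list(row)


def hydra_attention(q, k, v):
    """Hydra-style attention over T tokens with D features each.

    q, k, v are lists of T vectors of length D (hypothetical layout).
    Cost is O(T * D), instead of O(T^2 * D) for softmax attention.
    """
    q = [l2_normalize(row) for row in q]
    k = [l2_normalize(row) for row in k]
    dim = len(v[0])
    # Global summary: elementwise k * v, summed over all tokens.
    kv = [sum(k_row[d] * v_row[d] for k_row, v_row in zip(k, v))
          for d in range(dim)]
    # Each token gates the shared summary with its normalized query.
    return [[q_row[d] * kv[d] for d in range(dim)] for q_row in q]


# Two tokens, two features.
out = hydra_attention(
    q=[[3.0, 4.0], [1.0, 0.0]],
    k=[[1.0, 0.0], [0.0, 2.0]],
    v=[[2.0, 5.0], [7.0, 3.0]],
)
```

Because no T-by-T attention matrix is ever formed, the token mixing itself adds no learned parameters beyond the projections that produce q, k, and v.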

Abstract

Vision Transformers have demonstrated remarkable performance in computer vision, rivaling traditional convolutional neural networks on large datasets. However, their potential is limited on small datasets because of excessive parameters, suboptimal generalization, and the high computational cost of the quadratic attention mechanism. To address these challenges, our work focuses on two key strategies: reducing redundant parameters and enhancing the model’s ability to generalize. To reduce parameter counts, we introduce a novel architectural framework that incorporates advanced decomposition strategies, including Hydra attention, linear angular attention, SimA attention, and class attention, which lower both parameter counts and computational complexity. These mechanisms are complemented by techniques such as local patch interaction and locality self-attention, which improve the model’s capacity to capture local information. For enhanced generalization, we propose a new technique called multiple weight selection. This method leverages a subset of weights from diverse pretrained models, providing an optimized initialization that accelerates convergence and improves performance on small datasets. Our approach achieves a 52.93% reduction in parameters for the 6-layer model while delivering high performance across various small-scale datasets. Together, these innovations underscore the potential of our methodology to enable Vision Transformers to excel in resource-constrained scenarios, setting the stage for future advancements in transformer-based architectures.
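
The abstract describes multiple weight selection only as taking a subset of weights from several pretrained models to initialize a smaller one. As a rough illustration of that idea, the sketch below keeps the leading block of each larger pretrained matrix and averages the blocks. Both the leading-block rule and the averaging step are assumptions chosen for simplicity, and `select_block` and `init_from_teachers` are hypothetical names. None of this is taken from the paper's procedure.

```python
def select_block(weight, rows, cols):
    """Keep the leading rows x cols block of a larger 2-D weight matrix."""
    return [row[:cols] for row in weight[:rows]]


def init_from_teachers(teachers, rows, cols):
    """Build a small initial weight matrix from several pretrained matrices.

    Assumption: selected blocks are combined by a plain average. The paper
    may select and combine weights differently.
    """
    blocks = [select_block(w, rows, cols) for w in teachers]
    n = len(blocks)
    return [[sum(b[i][j] for b in blocks) / n for j in range(cols)]
            for i in range(rows)]


# Two toy "pretrained" matrices of different sizes.
teacher_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
teacher_b = [[3.0, 2.0, 1.0, 0.0], [6.0, 5.0, 4.0, 0.0], [0.0, 0.0, 0.0, 0.0]]

student_init = init_from_teachers([teacher_a, teacher_b], rows=2, cols=2)
```

The smaller model then starts training from `student_init` rather than from random values, which is the sense in which the abstract speaks of an optimized initialization.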

Authors

Tien Dang

Khang Nguyen

Journals

Pattern Recognition and Image Analysis

Institutions

Vietnam National University Ho Chi Minh City

Ho Chi Minh City University of Science
