Vision Transformers have demonstrated remarkable performance in computer vision, rivaling traditional convolutional neural networks on large datasets. However, their potential is limited on small datasets because of excessive parameters, suboptimal generalization, and the high computational cost of the quadratic attention mechanism. To address these challenges, our work focuses on two key strategies: reducing redundant parameters and enhancing the model's ability to generalize. To reduce the parameter count, we introduce a novel architectural framework that incorporates advanced decomposition strategies, including Hydra attention, linear angular attention, SimA attention, and class attention, which lower both the parameter count and the computational complexity. These mechanisms are complemented by techniques such as local patch interaction and locality self-attention, which improve the model's capacity to capture local information. To enhance generalization, we propose a new technique called multiple weight selection. This method draws a subset of weights from diverse pretrained models, providing an initialization that accelerates convergence and improves performance on small datasets. Our approach achieves a 52.93% reduction in parameters for the 6-layer model while delivering high performance across various small-scale datasets. Together, these innovations underscore the potential of our methodology to enable Vision Transformers to excel in resource-constrained scenarios, setting the stage for future advances in transformer-based architectures.
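To illustrate how one of the attention variants named above avoids the quadratic cost, the following is a minimal NumPy sketch of Hydra attention as it is commonly formulated (as many heads as feature dimensions, with a cosine-similarity kernel). It is not taken from this paper: the function name `hydra_attention`, the `eps` parameter, and the tensor shapes are assumptions made for illustration, and the authors' actual implementation may differ.

```python
import numpy as np

def hydra_attention(q, k, v, eps=1e-12):
    """Illustrative Hydra-style attention (assumed formulation, not the paper's code).

    q, k, v: arrays of shape (num_tokens, dim).
    Cost is O(num_tokens * dim) rather than O(num_tokens^2 * dim).
    """
    # Cosine-similarity kernel: L2-normalize queries and keys along the feature axis.
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    # Aggregate keys and values once into a single global feature vector.
    kv = (k * v).sum(axis=-2, keepdims=True)  # shape (1, dim)
    # Each query gates the global vector elementwise.
    return q * kv  # shape (num_tokens, dim)

# Hypothetical usage: 5 tokens with 8 features each.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
out = hydra_attention(tokens, tokens, tokens)
```

Because the keys and values are summed over tokens before the queries are applied, no token-by-token attention matrix is ever formed, which is where the linear scaling in the number of tokens comes from.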
Tien Dang
Khang Nguyen
Pattern Recognition and Image Analysis
Vietnam National University Ho Chi Minh City
Ho Chi Minh City University of Science
Dang et al. (Mon,) studied this question.
www.synapsesocial.com/papers/69f593f271405d493affec17 — DOI: https://doi.org/10.1134/s1054661825700440