What type of study is this?

This is a quantitative study.

October 20, 2025 · Open Access

MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion

Key Points

MSVIT significantly outperforms existing spiking neural network models, showcasing its effectiveness in image analysis.
Experimental results demonstrate improvements in performance, bridging the gap between spiking neural networks and traditional architectures.
The use of multi-scale spiking attention enhances feature extraction across various image scales in spiking transformers.
The novel architecture provides a state-of-the-art solution in SNN-transformer hybrid models for efficient computing.

Abstract

The combination of Spiking Neural Networks (SNNs) with Vision Transformer architectures has garnered significant attention due to its potential for energy-efficient, high-performance computing. However, a substantial performance gap still exists between SNN-based and ANN-based transformer architectures. Existing methods propose spiking self-attention mechanisms that combine successfully with SNNs, but the overall architectures they propose are bottlenecked in extracting features across different image scales. In this paper, we address this issue and propose MSVIT, a novel spike-driven Transformer architecture that, for the first time, uses multi-scale spiking attention (MSSA) to enhance the capabilities of spiking attention blocks. We validate our approach on several mainstream datasets. The experimental results show that MSVIT outperforms existing SNN-based models, positioning it as a state-of-the-art solution among SNN-transformer architectures. The code is available at https://github.com/Nanhu-AI-Lab/MSViT.

Cite this study

Hua et al. (2025) studied this question.

www.synapsesocial.com/papers/68f58f68ece7a5b64f471312 — DOI: https://doi.org/10.48550/arxiv.2505.14719

Also consider

Synapse has enriched 5 closely related papers on similar research questions. Consider them for comparative context:

MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion· 2025 · 1 citation
SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks· 2024 · 2 citations
Vision Transformer with Sparse Scan Prior· 2024 · 2 citations
Spiking ViT: spiking neural networks with transformer—attention for steel surface defect classification· 2024 · 5 citations
SpikePool: Event-driven Spiking Transformer with Pooling Attention

Authors

Wei Hua

Chenlin Zhou

Jibin Wu
