This paper explores the capability of Mamba, a recently proposed architecture based on state space models (SSMs), as a competitive alternative to Transformer-based models. In the speech domain, well-designed Transformer-based models, such as the Conformer and E-Branchformer, have become the de facto standards. Extensive evaluations have demonstrated the effectiveness of these Transformer-based models across a wide range of speech tasks. In contrast, the evaluation of SSMs has been limited to a few tasks, such as automatic speech recognition (ASR) and speech synthesis. In this paper, we compare Mamba with state-of-the-art Transformer variants in various speech applications, including ASR, text-to-speech, spoken language understanding, and speech summarization. Experimental evaluations show that Mamba achieves performance comparable to or better than that of Transformer-based models, and demonstrate its efficiency in long-form speech processing.
Miyazaki et al. studied this question.
www.synapsesocial.com/papers/68e59c4cb6db643587536976 — DOI: https://doi.org/10.21437/interspeech.2024-994
Koichi Miyazaki
Yoshiki Masuyama
Masato Murata