September 1, 2024 (Open Access)

Exploring the Capability of Mamba in Speech Applications

Abstract

This paper explores the capability of Mamba, a recently proposed architecture based on state space models (SSMs), as a competitive alternative to Transformer-based models. In the speech domain, well-designed Transformer-based models, such as the Conformer and E-Branchformer, have become the de facto standards. Extensive evaluations have demonstrated the effectiveness of these Transformer-based models across a wide range of speech tasks. In contrast, the evaluation of SSMs has been limited to a few tasks, such as automatic speech recognition (ASR) and speech synthesis. In this paper, we compared Mamba with state-of-the-art Transformer variants in various speech applications, including ASR, text-to-speech, spoken language understanding, and speech summarization. Experimental evaluations revealed that Mamba achieves comparable or better performance than Transformer-based models, and demonstrated its efficiency in long-form speech processing.

Cite this study

Miyazaki et al. (2024) studied this question.

www.synapsesocial.com/papers/68e59c4cb6db643587536976 — DOI: https://doi.org/10.21437/interspeech.2024-994

Authors

Koichi Miyazaki

Yoshiki Masuyama

Masato Murata
