September 1, 2024 · Open Access

Using Large Language Model for End-to-End Chinese ASR and NER

Abstract

Mapping speech tokens to the same feature space as text tokens has become the paradigm for integrating speech modality into decoder-only large language models (LLMs). An alternative is to use an encoder-decoder architecture that incorporates speech features through cross-attention. In this work, we connect the Whisper encoder with ChatGLM3 and provide in-depth comparisons of these two approaches using Chinese automatic speech recognition (ASR) and named entity recognition (NER) tasks. We evaluate their performance using the F1 score and a fine-grained taxonomy of ASR-NER errors. Our experiments reveal that the encoder-decoder model outperforms the decoder-only model if the context is short, while the decoder-only model benefits from a long context as it fully exploits all layers of the LLM. Additionally, we obtain a state-of-the-art F1 score of 0.805 on the AISHELL-NER test set by using chain-of-thought NER which first infers long-form ASR transcriptions and then predicts NER labels.
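The abstract reports results as an F1 score but does not say how it is computed. The following is a minimal sketch, assuming entities are scored as exact (text, type) matches and micro-averaged over utterances; that scoring rule, the function name `entity_f1`, and the toy entities are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def entity_f1(references, hypotheses):
    """Micro-averaged F1 over (entity_text, entity_type) pairs.

    Assumption: an entity counts as correct only if both its text and
    its type match a reference entity in the same utterance.
    """
    tp = fp = fn = 0
    for ref, hyp in zip(references, hypotheses):
        ref_counts, hyp_counts = Counter(ref), Counter(hyp)
        # Multiset intersection gives the number of matched entities.
        overlap = sum((ref_counts & hyp_counts).values())
        tp += overlap
        fp += sum(hyp_counts.values()) - overlap
        fn += sum(ref_counts.values()) - overlap
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical, not taken from AISHELL-NER).
refs = [[("北京", "LOC"), ("李明", "PER")], [("华为", "ORG")]]
hyps = [[("北京", "LOC")], [("华为", "ORG"), ("上海", "LOC")]]
score = entity_f1(refs, hyps)
```

In this toy case one reference entity is missed and one spurious entity is predicted, so precision and recall are both 2/3. Because an ASR error inside an entity's text makes the exact match fail, a metric of this kind penalizes transcription and labeling mistakes together, which is why a separate error taxonomy is useful alongside it.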

Cite this study

Li et al. (2024) compared decoder-only and encoder-decoder approaches to integrating speech into LLMs for Chinese ASR and NER.

www.synapsesocial.com/papers/68e59d79b6db643587537935 — DOI: https://doi.org/10.21437/interspeech.2024-103

Also consider

Synapse has enriched 5 closely related papers on similar research questions. Consider them for comparative context:

Connecting Speech Encoder and Large Language Model for ASR · 2024 · 39 citations
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets · 2024
Extending Large Language Models for Speech and Audio Captioning · 2024 · 11 citations
Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition · 2024 · 7 citations
Prompting Large Language Models with Speech Recognition Abilities

Authors

Yuang Li

Jiawei Yu

Min Zhang
