September 6, 2015

Audio augmentation for speech recognition

Abstract

Data augmentation is a common strategy adopted to increase the quantity of training data, avoid overfitting and improve robustness of the models. In this paper, we investigate audio-level speech augmentation methods which directly process the raw signal. The method we particularly recommend is to change the speed of the audio signal, producing 3 versions of the original signal with speed factors of 0.9, 1.0 and 1.1. The proposed technique has a low implementation cost, making it easy to adopt. We present results on 4 different LVCSR tasks with training data ranging from 100 hours to 1000 hours, to examine the effectiveness of audio augmentation in a variety of data scenarios. An average relative improvement of 4.3% was observed across the 4 tasks.
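
The speed change described above can be sketched as a resampling step: the waveform is read at a stretched or compressed set of positions and then treated as if it still had the original sample rate, which shifts both tempo and pitch. The abstract does not say how the speed change is implemented, so the following is only a minimal sketch under assumptions: the function names `speed_perturb` and `augment` are hypothetical, and plain linear interpolation stands in for the band-limited resampler a production tool would use.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, factor: float) -> np.ndarray:
    """Resample `signal` so it plays `factor` times faster at the same sample rate.

    Assumption: linear interpolation is used for simplicity; it is not
    band-limited and is not claimed to be the authors' implementation.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    # A faster signal (factor > 1) has fewer samples, a slower one has more.
    n_out = int(round(len(signal) / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    # np.interp clamps positions past the last sample to the final value.
    return np.interp(src_positions, np.arange(len(signal)), signal)


def augment(signal: np.ndarray, factors=(0.9, 1.0, 1.1)) -> dict:
    """Return one perturbed copy of `signal` per speed factor."""
    return {f: speed_perturb(signal, f) for f in factors}


if __name__ == "__main__":
    sample_rate = 16000  # hypothetical sample rate for the demo
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    for factor, copy in augment(tone).items():
        print(f"factor {factor}: {len(copy)} samples")
```

With the three factors from the abstract, each training utterance yields three waveforms, so the training set grows threefold; the transcript is unchanged, but any frame-level alignments would need to be recomputed or rescaled because the durations differ.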

Cite this study

Ko et al. (2015) studied this question.

www.synapsesocial.com/papers/69fb895c6d730ca589dd5ba1 — DOI: https://doi.org/10.21437/interspeech.2015-711

Authors

Tom Ko

Vijayaditya Peddinti

Daniel Povey

Institutions

Johns Hopkins University
