September 6, 2015

Audio Augmentation for Speech Recognition

Abstract

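Data augmentation is a widely adopted strategy for increasing the quantity of training data, avoiding overfitting, and improving model robustness. This paper investigates audio-level speech augmentation methods that process the raw signal directly. The method we particularly recommend changes the speed of the audio signal, producing three versions of the original signal with speed factors of 0.9, 1.0, and 1.1. The proposed technique has a low implementation cost and is easy to adopt. We present results on four different LVCSR tasks with training data ranging from 100 hours to 1000 hours, to examine the effectiveness of audio augmentation across a variety of data scenarios. An average relative improvement of 4.3% was observed across the four tasks.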
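The speed-perturbation idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the resampling is implemented, so everything below is an assumption made for illustration: the function names `speed_perturb` and `augment` are hypothetical, and plain linear interpolation with NumPy stands in for whatever resampler the authors used. A band-limited resampler would normally be preferable in practice, since linear interpolation can introduce aliasing.

```python
import numpy as np


def speed_perturb(signal, factor):
    """Return `signal` resampled so that it plays `factor` times faster
    when interpreted at the original sample rate.

    Assumption: linear interpolation is used as a simple stand-in for a
    proper band-limited resampler.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if factor <= 0:
        raise ValueError("factor must be positive")
    # A faster signal (factor > 1) is shorter; a slower one is longer.
    n_out = int(round(len(signal) / factor))
    # Fractional positions in the original signal read by each output sample.
    positions = np.arange(n_out) * factor
    # np.interp clamps positions past the last sample to the final value.
    return np.interp(positions, np.arange(len(signal)), signal)


def augment(signal, factors=(0.9, 1.0, 1.1)):
    """Build one copy of `signal` per speed factor (hypothetical helper)."""
    return {f: speed_perturb(signal, f) for f in factors}


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate, for illustration only
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    for factor, version in augment(tone).items():
        print(factor, len(version) / sample_rate, "seconds")
```

With the three factors from the abstract, each utterance yields a slowed copy, an unchanged copy, and a sped-up copy, so the training set grows threefold. Because the samples are simply re-spaced in time, both the duration and the pitch of the audio change.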

Authors

Tom Ko

Vijayaditya Peddinti

Daniel Povey

Institutions

Johns Hopkins University
