What does this research mean for the field?

Integrating latent semantic analysis with standard n-gram language models, so as to capture long-term global semantic dependencies, reduces the average word error rate by over 20% relative to the corresponding n-gram baseline in speech recognition experiments on the Wall Street Journal domain. Novelty: methodological. Consensus alignment: neutral.

August 1, 2000

Exploiting latent semantic information in statistical language modeling

Abstract

Statistical language models used in large-vocabulary speech recognition must properly encapsulate the various constraints, both local and global, present in the language. While local constraints are readily captured through n-gram modeling, global constraints, such as long-term semantic dependencies, have been more difficult to handle within a data-driven formalism. This paper focuses on the use of latent semantic analysis, a paradigm that automatically uncovers the salient semantic relationships between words and documents in a given corpus. In this approach, (discrete) words and documents are mapped onto a (continuous) semantic vector space, in which familiar clustering techniques can be applied. This leads to the specification of a powerful framework for automatic semantic classification, as well as the derivation of several language model families with various smoothing properties. Because of their large-span nature, these language models are well suited to complement conventional n-grams. An integrative formulation is proposed for harnessing this synergy, in which the latent semantic information is used to adjust the standard n-gram probability. Such hybrid language modeling compares favorably with the corresponding n-gram baseline: experiments conducted on the Wall Street Journal domain show a reduction in average word error rate of over 20%. This paper concludes with a discussion of intrinsic tradeoffs, such as the influence of training data selection on the resulting performance.
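As a rough illustration of the idea in the abstract, the sketch below maps words and a pseudo-document built from the recent history into a shared latent space using a truncated singular value decomposition, then uses the resulting similarity to rescale an n-gram probability. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy vocabulary and counts, the rank of 2, the `(1 + cos) / 2` mapping from cosine to a positive score, the uniform stand-in for the n-gram distribution, and the multiply-and-renormalize combination rule.

```python
import numpy as np

# Toy word-by-document count matrix (rows: words, columns: documents).
# Vocabulary and counts are invented for illustration only.
vocab = ["stock", "market", "shares", "rain", "weather"]
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
], dtype=float)


def truncated_svd(word_doc, rank):
    """Return the top-`rank` left singular vectors and singular values."""
    u, s, _ = np.linalg.svd(word_doc, full_matrices=False)
    return u[:, :rank], s[:rank]


def history_vector(history_words, u, s):
    """Fold the recent history into the latent space as a pseudo-document."""
    d = np.zeros(len(vocab))
    for w in history_words:
        d[vocab.index(w)] += 1.0
    # Projection of the count vector, scaled by S^(-1/2) (assumed convention).
    return (d @ u) / np.sqrt(s)


def lsa_scores(u, s, doc_vec):
    """Turn the cosine between each word vector and the history into a positive score."""
    word_vecs = u * np.sqrt(s)  # word vectors scaled by S^(1/2)
    norms = np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(doc_vec)
    cos = (word_vecs @ doc_vec) / np.maximum(norms, 1e-12)
    # Hypothetical mapping from cosine to a score; floored to stay positive.
    return np.maximum((1.0 + cos) / 2.0, 1e-6)


def hybrid(p_ngram, scores):
    """Rescale n-gram probabilities by the semantic scores and renormalize."""
    unnormalized = p_ngram * scores
    return unnormalized / unnormalized.sum()


u, s = truncated_svd(counts, rank=2)
doc_vec = history_vector(["stock", "market"], u, s)
scores = lsa_scores(u, s, doc_vec)

# Stand-in for a real n-gram model: a uniform distribution over the vocabulary.
p_ngram = np.full(len(vocab), 1.0 / len(vocab))
p_hybrid = hybrid(p_ngram, scores)

for word, p in zip(vocab, p_hybrid):
    print(f"{word:8s} {p:.3f}")
```

With a finance-flavored history, words that co-occur with "stock" and "market" in the toy corpus receive more probability mass than weather words, even though the stand-in n-gram distribution treats all words equally. A real system would replace the uniform distribution with smoothed n-gram estimates and would need a principled way to convert similarities into probabilities.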

Cite this study

J.R. Bellegarda studied this question.

www.synapsesocial.com/papers/6a0968d3b0d552aa8b45ab1b — DOI: https://doi.org/10.1109/5.880084

Authors

J.R. Bellegarda

Journals

Proceedings of the IEEE

Institutions

Apple (United States)
