December 1, 1992

Class-based n-gram models of natural language

Abstract

We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.

1 Introduction

In a number of natural language processing tasks, we face the problem of recovering a string of English words after it has been garbled by passage through a noisy channel. To tackle this problem successfully, we must be able to estimate the probability with which any particular string of English words will be presented as input to the noisy channel. In this paper, we discuss a method for making such estimates. We also discuss the related topic of assigning words to classes according to statisti...
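As a rough illustration of what an n-gram model based on word classes looks like, the sketch below estimates a class-based bigram model using the standard factorization P(w_i | w_{i-1}) = P(w_i | c_i) · P(c_i | c_{i-1}), where c_i is the class of word w_i. The toy corpus, the hand-assigned classes, and the function name `train_class_bigram` are all assumptions made for this example; the paper's algorithms learn the classes from co-occurrence statistics, which this sketch does not attempt.

```python
from collections import Counter


def train_class_bigram(tokens, word_to_class):
    """Return a function prob(word, prev_word) giving the maximum-likelihood
    class-based bigram estimate P(word | class) * P(class | prev_class)."""
    classes = [word_to_class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    # Only positions that have a successor can serve as a history.
    history_counts = Counter(classes[:-1])
    class_bigrams = Counter(zip(classes, classes[1:]))

    def prob(word, prev_word):
        c = word_to_class[word]
        c_prev = word_to_class[prev_word]
        if class_counts[c] == 0 or history_counts[c_prev] == 0:
            return 0.0  # unseen class or history: no estimate without smoothing
        emit = word_counts[word] / class_counts[c]                   # P(w | c)
        trans = class_bigrams[(c_prev, c)] / history_counts[c_prev]  # P(c | c_prev)
        return emit * trans

    return prob


# Hypothetical toy data: classes are assigned by hand for illustration only.
tokens = "the cat sat on the mat the dog sat on the rug".split()
word_to_class = {
    "the": "DET",
    "cat": "NOUN", "mat": "NOUN", "dog": "NOUN", "rug": "NOUN",
    "sat": "VERB",
    "on": "PREP",
}
prob = train_class_bigram(tokens, word_to_class)
```

With this toy data, `prob("sat", "mat")` is nonzero even though the word pair "mat sat" never occurs in the corpus, because the estimate pools counts over all noun-verb transitions. That pooling is the practical appeal of class-based models when word-level counts are sparse.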

Cite this study

Brown et al. (1992) studied this question.

www.synapsesocial.com/papers/6a07a07b047d6f4f368b37e7 — DOI: https://doi.org/10.5555/176313.176316

Authors

Peter F. Brown

P.V. deSouza

Robert L. Mercer

Journals

Computational Linguistics

Institutions

IBM (United States)
