January 1, 1993 · Open Access

Distributional clustering of English words

Abstract

We describe and experimentally evaluate a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of the contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering. Clusters are represented by average context distributions derived from the given words according to their probabilities of cluster membership. In many cases, the clusters can be thought of as encoding coarse sense distinctions. Deterministic annealing is used to find lowest-distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word co-occurrence, and the models are evaluated with respect to held-out test data.
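The abstract names the ingredients (context distributions, relative entropy, soft cluster membership, averaged centroids) but gives no formulas, so the following is only a minimal sketch of one way they could fit together at a single fixed value of the annealing parameter. The function names, the exponential membership rule, the uniform weighting of words, the toy data, and the value of `beta` are all assumptions made for illustration and are not taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) in nats; eps guards against division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def soft_memberships(word_dists, centroids, beta):
    """Assumed rule: p(cluster | word) proportional to exp(-beta * D(word || centroid))."""
    memberships = []
    for p in word_dists:
        d = [kl_divergence(p, c) for c in centroids]
        d_min = min(d)  # shift by the minimum for numerical stability
        weights = [math.exp(-beta * (di - d_min)) for di in d]
        total = sum(weights)
        memberships.append([w / total for w in weights])
    return memberships

def update_centroids(word_dists, memberships, n_clusters):
    """Each centroid is the membership-weighted average of the word distributions
    (all words weighted equally, which is an assumption of this sketch)."""
    n_contexts = len(word_dists[0])
    centroids = []
    for c in range(n_clusters):
        mass = sum(m[c] for m in memberships)
        centroids.append([
            sum(m[c] * p[j] for m, p in zip(memberships, word_dists)) / mass
            for j in range(n_contexts)
        ])
    return centroids

def cluster(word_dists, centroids, beta, n_iter=50):
    """Alternate membership and centroid updates at a fixed beta."""
    for _ in range(n_iter):
        memberships = soft_memberships(word_dists, centroids, beta)
        centroids = update_centroids(word_dists, memberships, len(centroids))
    return memberships, centroids

# Hypothetical toy data: four "words", each a distribution over three contexts.
word_dists = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]
initial_centroids = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
memberships, centroids = cluster(word_dists, initial_centroids, beta=5.0)
```

In a deterministic annealing setting, `cluster` would presumably be rerun while `beta` is raised gradually, with centroids duplicated and slightly perturbed so that a split can emerge once a cluster becomes unstable; that outer loop is omitted here.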

Cite this study

Pereira, Tishby, and Lee (1993). Distributional clustering of English words.

www.synapsesocial.com/papers/6a08e977afc616802fe4b7ce — DOI: https://doi.org/10.3115/981574.981598

Also consider

Synapse lists 5 closely related papers for comparative context:

Stochastic lexicalized tree-adjoining grammars · 1992 · 134 citations
Elements of Information Theory · 2001 · 37,896 citations
A stochastic parts program and noun phrase parser for unrestricted text · 1988 · 974 citations
Pattern classification and scene analysis · 1973 · 12,658 citations
Pattern Classification and Scene Analysis · 1974 · 4,511 citations

Authors

Fernando Pereira

Naftali Tishby

Lillian Lee

Institutions

Cornell University

Hebrew University of Jerusalem

AT&T (United States)
