What question did this study set out to answer?

The study asks whether anchoring word sense induction to explicit, LLM-generated sense definitions identifies word meanings more accurately than purely distributional clustering.

April 15, 2026 | Open Access

Definition-Anchored Unsupervised Word Sense Induction Using LLM-Generated Glosses

Key Points

Proposed a definition-anchored reclassification framework for WSI.
Leveraged large language models to generate explicit sense descriptions.
Shifted from geometric clustering to definition-based semantic matching.
Applied the method to SemEval-2010 and SemEval-2013 datasets.
Consistently outperformed traditional clustering baselines and existing WSI systems.
Improved instance-level alignment while addressing dominant-sense bias.
Enhanced recovery of minority senses by keeping them as distinct clusters.
Achieved higher scores in structural metrics (NMI, V-measure) and instance-level metrics (F-B3, Fuzzy-F-B3).
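
For readers unfamiliar with the instance-level metric named above, the following is a minimal sketch of B-cubed precision, recall, and F-score for hard cluster assignments. It is not the paper's evaluation code and does not cover the fuzzy variant (Fuzzy-F-B3); the function name `b_cubed` and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def b_cubed(gold, pred):
    """Return (precision, recall, f_score) of B-cubed for hard assignments.

    gold[i] is the reference sense of instance i and pred[i] is its induced
    cluster. Both sequences must be non-empty and of equal length.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")

    gold_sizes = Counter(gold)            # instances per reference sense
    pred_sizes = Counter(pred)            # instances per induced cluster
    pair_sizes = Counter(zip(gold, pred)) # instances sharing sense and cluster

    n = len(gold)
    # Per-instance precision: share of the instance's cluster with its sense.
    precision = sum(pair_sizes[(g, p)] / pred_sizes[p]
                    for g, p in zip(gold, pred)) / n
    # Per-instance recall: share of the instance's sense found in its cluster.
    recall = sum(pair_sizes[(g, p)] / gold_sizes[g]
                 for g, p in zip(gold, pred)) / n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy illustration of dominant-sense bias: merging a minority sense into the
# dominant cluster keeps recall perfect but lowers precision.
gold_senses = ["a", "a", "a", "b"]
merged_clusters = [0, 0, 0, 0]
merged_scores = b_cubed(gold_senses, merged_clusters)
```

Because each instance is scored against its own cluster and its own sense, a system that absorbs rare senses into one large cluster is penalized on precision for every instance in that cluster.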

Abstract

Word sense induction (WSI) aims to automatically discover the different senses of a word from contextual usage without predefined sense inventories. However, existing distributional clustering methods often suffer from dominant-sense bias and struggle to correctly identify minority senses. In this paper, we propose a definition-anchored reclassification framework for WSI that leverages large language models (LLMs) to generate explicit sense descriptions and refine cluster assignments. Unlike purely distributional approaches, our method introduces these LLM-generated definitions into the induction process as semantic anchors. Because it shifts the decision process from geometric clustering to definition-based semantic matching, the method improves instance-level alignment while introducing a trade-off with global structural consistency. Experiments on the SemEval-2010 and SemEval-2013 datasets demonstrate that the proposed method consistently outperforms traditional clustering baselines and existing WSI systems across both structural metrics (NMI and V-measure) and instance-level metrics (F-B3 and Fuzzy-F-B3). In particular, our approach effectively mitigates dominant-sense bias and improves the recovery of minority senses by preserving them as distinct clusters while correctly assigning their instances. These results suggest that explicit semantic representations generated by LLMs provide a promising direction for addressing long-standing challenges in unsupervised word sense induction.
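
The reclassification idea in the abstract, matching each instance against a sense definition instead of a cluster centroid, can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the abstract does not specify the matching function, so a bag-of-words cosine similarity stands in for it, and the hand-written glosses stand in for LLM output. The names `bag_of_words`, `cosine`, and `reassign_by_gloss` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def reassign_by_gloss(contexts, glosses):
    """Assign each context to the cluster whose gloss it matches best.

    glosses maps a cluster id to its definition text. In the paper these
    definitions are generated by an LLM; here they are supplied by hand.
    """
    gloss_vecs = {cid: bag_of_words(text) for cid, text in glosses.items()}
    labels = []
    for context in contexts:
        vec = bag_of_words(context)
        best = max(gloss_vecs, key=lambda cid: cosine(vec, gloss_vecs[cid]))
        labels.append(best)
    return labels


# Hypothetical glosses for two senses of "bank" (assumed, not from the paper).
example_glosses = {
    "finance": "a financial institution that accepts deposits and lends money",
    "river": "the sloping land beside a river or lake",
}
example_contexts = [
    "she deposits money at the bank",
    "they walked along the bank of the river",
]
example_labels = reassign_by_gloss(example_contexts, example_glosses)
```

Since every cluster keeps its own gloss as an anchor, a small cluster can still attract the instances that match its definition, which is one way to read the abstract's claim about preserving minority senses. A practical system would presumably replace the word-count vectors with a stronger semantic matcher.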

Authors

Shota Yoshikawa

Minoru Sasaki

Journals

Applied Sciences

Institutions

Ibaraki University
