What does this research mean for the field?

Analyzing collocational divergence through 'surprise words' reveals differences in discourse framing across corpora that traditional keyword frequency analysis alone does not identify. Novelty: methodological. Consensus alignment: neutral.

June 1, 2026 | Open Access

Surprise Words and Core Words: Collocational Divergence Across Corpora

Abstract

This paper introduces a method for identifying words whose collocational behaviour differs across corpora. Keyword analysis identifies differences in relative frequency, but it does not systematically identify words that are used in different contexts across corpora. To address this gap, this paper proposes the concept of collocationally divergent words, or surprise words (s-words), defined as words whose collocational profiles differ strongly between two corpora. A related term, core words (c-words), refers to words that show high collocational similarity between the two corpora. Divergence scores are calculated for all candidate words by comparing the similarity of each word's top collocates across the corpora. To implement this concept, this paper also describes the development of the S-Word Analysis Tool (SWAT), created using the large language model ChatGPT. The tool automatically identifies and ranks words according to their collocational divergence and provides collocate lists and concordances to support qualitative analysis. A case study comparing representations of obesity in The Guardian and The Sun shows that s-words reveal differences in discourse framing that keyword analysis alone does not identify, particularly in relation to responsibility, cost and solutions. The paper argues that s-word analysis provides a complementary method for corpus comparison and demonstrates the potential of large language models for developing new corpus linguistic tools and methods.
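The abstract does not specify how collocates are selected or how similarity between collocate lists is measured. The following is a minimal Python sketch of the general idea under stated assumptions: collocates are counted by raw co-occurrence frequency in a symmetric window, each word's top-N collocates form a set, and similarity is the Jaccard overlap of the two sets, with divergence = 1 - similarity. These choices, the `min_freq` threshold, and the function names `collocate_counts`, `top_collocates` and `divergence_scores` are hypothetical and are not necessarily the method used in SWAT.

```python
from collections import Counter, defaultdict


def collocate_counts(tokens, window=4):
    """Count how often each word co-occurs with words within +/- `window` tokens."""
    counts = defaultdict(Counter)
    for i, node in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[node][tokens[j]] += 1
    return counts


def top_collocates(counter, n):
    """Return the set of the n most frequent collocates.

    Assumption: ranking by raw co-occurrence frequency (ties broken
    alphabetically); an association measure could be used instead.
    """
    ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:n]}


def divergence_scores(tokens_a, tokens_b, n=10, window=4, min_freq=2):
    """Rank words shared by both corpora by collocational divergence.

    Assumption: divergence = 1 - Jaccard overlap of the top-n collocate sets.
    Scores near 1 suggest s-words; scores near 0 suggest c-words.
    """
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    coll_a = collocate_counts(tokens_a, window)
    coll_b = collocate_counts(tokens_b, window)
    scores = {}
    # Hypothetical candidate filter: word occurs at least min_freq times in both corpora
    for word in freq_a.keys() & freq_b.keys():
        if freq_a[word] < min_freq or freq_b[word] < min_freq:
            continue
        top_a = top_collocates(coll_a[word], n)
        top_b = top_collocates(coll_b[word], n)
        union = top_a | top_b
        similarity = len(top_a & top_b) / len(union) if union else 1.0
        scores[word] = 1.0 - similarity
    # Most divergent (s-word candidates) first, most similar (c-word candidates) last
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    corpus_a = "obesity costs the nhs money obesity costs taxpayers money".split()
    corpus_b = "obesity is a public health crisis obesity is a health crisis".split()
    print(divergence_scores(corpus_a, corpus_b, n=3, window=2))
```

In this toy example "obesity" shares no top collocates across the two corpora, so it receives the maximum divergence score of 1.0, whereas comparing a corpus with itself gives every candidate a score of 0.0. A set-overlap measure like this ignores collocate strength; a weighted comparison (for example, cosine similarity over association scores) would be an equally plausible design choice.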