
March 1, 2026

Large-Scale Hate Speech Analysis with BERT and HDBSCAN: Tracking Political Influence Across 55M Reddit Posts

Key Points

Evaluated the impact of Donald Trump’s 2024–2025 campaign on hate speech dynamics on Reddit.
Analyzed over 55 million Reddit posts from two periods: pre-campaign and mid-campaign.
Employed a fine-tuned BERT model for multi-class classification of posts as hateful, offensive, or neutral.
Used HDBSCAN to cluster the identity groups targeted by hate speech.
Applied a post-processing algorithm that decomposes identity-based slurs spanning overlapping categories into distinct subgroups.
Observed a marked increase in hate speech volume during the campaign.
Found significant shifts in targeted identity clusters, particularly toward ideological, gendered, and racial groups.
Contributed empirical evidence to debates on algorithmic content moderation, political communication, and the computational modeling of social polarization.
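Each post in the classification step ends in a single three-way decision. The sketch below covers only that final stage, under the assumption that the fine-tuned BERT head already emits one logit per class. It applies a softmax, takes the argmax, and then tallies label shares for one period. It uses only the Python standard library. The label order in `LABELS` and the helpers `decode` and `label_shares` are hypothetical and are not the study's code. The toy logits are illustrative and are not study data.

```python
import math
from collections import Counter

# Hypothetical label order: the study's index-to-label mapping is not
# stated in the source.
LABELS = ("hateful", "offensive", "neutral")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, probability) for the highest-probability class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


def label_shares(logit_rows):
    """Fraction of posts in one period assigned to each class."""
    if not logit_rows:
        return {label: 0.0 for label in LABELS}
    counts = Counter(decode(row)[0] for row in logit_rows)
    return {label: counts[label] / len(logit_rows) for label in LABELS}


# Toy logits for four posts (illustrative only, not study data).
toy_period = [[0.1, 0.2, 2.0], [1.5, 0.3, 0.2], [0.0, 0.1, 3.0], [0.2, 1.8, 0.1]]
print(label_shares(toy_period))
# {'hateful': 0.25, 'offensive': 0.25, 'neutral': 0.5}
```

Shares normalize for any difference in post counts between the pre- and mid-campaign samples. Volume itself corresponds to the raw per-class counts, which are the numerators before division.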

Abstract

The interplay between political campaigning and online discourse has emerged as a critical area of computational social science, particularly concerning the propagation of hate speech. This study investigates how Donald Trump’s 2024–2025 U.S. presidential campaign influenced the prevalence and thematic focus of hate speech on Reddit, a platform known for its politically engaged communities. Leveraging a dataset of over 55 million Reddit posts spanning two key periods (pre- and mid-campaign), we develop a large-scale Natural Language Processing (NLP) methodology to detect and analyze the dynamics of hate speech. Hate speech is identified using a fine-tuned BERT model optimized for multi-class classification that distinguishes hateful, offensive, and neutral content. To explore the structure of hate speech targets, we apply Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), a density-based clustering algorithm that accommodates clusters of varying density in high-dimensional embeddings. A post-processing algorithm further refines overlapping categories by decomposing identity-based slurs into distinct subgroups. Our findings reveal a marked increase in hate speech volume during the campaign, along with significant shifts in targeted identity clusters, particularly toward ideological, gendered, and racial groups. These results contribute empirical evidence to debates around algorithmic content moderation, political communication, and the computational modeling of social polarization.
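HDBSCAN's tolerance for varying density comes from a transformed distance. The core distance of a point is the distance to its k-th nearest neighbor. The mutual reachability distance of a pair is max(core(a), core(b), d(a, b)), which pushes points in sparse regions away from everything else. HDBSCAN then builds a minimum spanning tree over that metric. The minimal standard-library sketch below shows those first stages on toy 2-D points with Euclidean distance. The study clusters high-dimensional embeddings, and its metric and parameters are not stated here. The last helper, `flat_clusters`, is a deliberate simplification: it cuts the tree at a single fixed threshold, which is closer to DBSCAN*. Real HDBSCAN condenses the full hierarchy and selects clusters by stability. All function names are hypothetical. In practice one would call a library implementation such as scikit-learn's `sklearn.cluster.HDBSCAN`, which also labels noise as -1.

```python
import math
from collections import Counter


def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbor.

    Convention used here: a point counts as its own first neighbor
    (distance 0.0), so min_samples=2 means "nearest other point".
    """
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        cores.append(dists[min_samples - 1])
    return cores


def mutual_reachability(points, min_samples):
    """Dense matrix of max(core(a), core(b), d(a, b)) for every pair."""
    core = core_distances(points, min_samples)
    n = len(points)
    return [
        [
            0.0 if i == j else max(core[i], core[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense matrix; returns (i, j, weight) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def flat_clusters(n, edges, threshold, min_cluster_size):
    """Simplification: keep MST edges with weight <= threshold (not HDBSCAN's
    stability-based extraction); small components are labeled -1 (noise)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, w in edges:
        if w <= threshold:
            parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    sizes = Counter(roots)
    ids = {}
    labels = []
    for r in roots:
        if sizes[r] < min_cluster_size:
            labels.append(-1)
        else:
            labels.append(ids.setdefault(r, len(ids)))
    return labels


# Two tight toy groups plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
edges = minimum_spanning_tree(mutual_reachability(points, min_samples=2))
print(flat_clusters(len(points), edges, threshold=5.0, min_cluster_size=2))
# [0, 0, 0, 1, 1, 1, -1]
```

In the toy run, the isolated point's core distance is large. All of its mutual reachability edges are therefore heavy, and it ends up as noise instead of being absorbed into the nearest group.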