ABSTRACT As online platforms seek to improve content-moderation strategies, large language models (LLMs) may prove a useful tool. This study examines opportunities and limitations of LLM-powered moderation through a unique lens: student projects for a Stanford University course titled Trust and Safety. In this course, students developed Discord bots using LLMs to moderate specific types of harmful content. Interviews with 16 of the students suggest that these models demonstrate high accuracy, often exceeding students’ expectations. Notably, in cases of disagreement between the student and the model, closer analysis frequently validated the model’s judgments. However, students also observed limitations: LLMs proved unhelpfully sensitive to prompt phrasing and exhibited many contextual interpretation challenges common to human moderators and traditional machine-learning classifiers.
Grossman et al. studied this question.
www.synapsesocial.com/papers/69c37b41b34aaaeb1a67d86c — DOI: https://doi.org/10.1017/s1049096526101929
Shelby Grossman
Anthony Mensah
Alex Stamos
PS Political Science & Politics
Stanford University
Arizona State University