March 25, 2026

Open Access

Building an LLM-Powered Content Moderation Bot in the Classroom

Key Points

The study aims to explore the effectiveness and limitations of large language models for content moderation in a classroom context.
Students in a Stanford course titled Trust and Safety developed Discord bots that used large language models to moderate specific types of harmful content.
The researchers interviewed 16 of these students to assess the bots' performance and the challenges they encountered.
The LLMs moderated content with high accuracy, often exceeding students' expectations.
When a student and the model disagreed, closer analysis frequently validated the model's judgment.
Students also noted limitations, including sensitivity to prompt phrasing and difficulties with contextual interpretation.

Abstract

As online platforms seek to improve content-moderation strategies, large language models (LLMs) may be a potential tool. This study examines opportunities and limitations of LLM-powered moderation through a unique lens: student projects for a Stanford University course titled Trust and Safety. In this course, students developed Discord bots using LLMs to moderate specific types of harmful content. Interviews with 16 of the students suggest that these models demonstrate high accuracy, often exceeding students’ expectations. Notably, in cases of disagreement between the student and the model, closer analysis frequently validated the model’s judgments. However, students also observed limitations: LLMs proved unhelpfully sensitive to prompt phrasing and exhibited many contextual interpretation challenges common to human moderators and traditional machine-learning classifiers.

Cite This Study

Grossman et al. (2026). Building an LLM-Powered Content Moderation Bot in the Classroom.

synapsesocial.com/papers/69c37b41b34aaaeb1a67d86c

https://doi.org/10.1017/s1049096526101929