Large Language Models (LLMs) offer scalable educational support but face barriers regarding accuracy, cost, and learning depth. To interrogate these limitations, we developed the Teaching Assistant for Specialized Knowledge (TAsk), an educator-curated pipeline enabled by retrieval-augmented generation. In this nine-week pilot study (N=33 participants), we deployed TAsk in a graduate-level biological chemistry course. We compared TAsk against human expert teaching assistants (TAs) using a blinded review process and analyzed inquiry depth. We observed three major findings related to potential pedagogical decisions and educational theory. First, TAsk delivered effective feedback that was specific and adaptive, significantly outperforming expert TAs in overall correctness; however, human TAs remained superior in tailoring responses to course nuances. Second, behavioral analysis based on educational scaffolding techniques, such as Bloom’s Taxonomy and the Zone of Proximal Development (ZPD), identified a cognitive bypass risk: frequent users submitted significantly fewer higher-order queries than infrequent users. Third, benchmarking demonstrated that smaller models could approach frontier model performance when optimized, suggesting that the costs TAsk incurred in the pilot study could be reduced significantly in the future. Finally, we validated a confabulation detection algorithm, hypothesizing that this algorithm could help students calibrate trust in model outputs in future iterations of TAsk. Taken together, these contributions establish TAsk as a validated framework for higher education learning while highlighting the critical need for pedagogical scaffolding for LLMs.
• Introduces TAsk, a generalizable open-source pipeline for higher education.
• Integrates LLMs with retrieval-augmented generation for question answering.
• Connects educational theories to the practical application of TAsk.
• Finds that TAsk is on par with or better than human TAs at question answering.
• Explores prompt design and safeguards against AI confabulation.
Ryann M. Perez
Marie Shimogawa
Yanan Chang
Computers and Education Artificial Intelligence
University of Pennsylvania
Perez et al. (Wed,) studied this question.
www.synapsesocial.com/papers/6a07fb96c4a3eaa040fe09bc — DOI: https://doi.org/10.1016/j.caeai.2026.100546