January 14, 2026 · Open Access

Large language models for education: An open-source paradigm for automated Q&A in the graduate classroom


Abstract

Large Language Models (LLMs) offer scalable educational support but face barriers related to accuracy, cost, and depth of learning. To interrogate these limitations, we developed the Teaching Assistant for Specialized Knowledge (TAsk), a retrieval-augmented generation (RAG)-enabled, educator-curated pipeline. In a nine-week pilot study (N = 33 participants), we deployed TAsk in a graduate-level biological chemistry course. We compared TAsk against expert human teaching assistants (TAs) using a blinded review process and analyzed the depth of student inquiry. We observed three major findings with implications for pedagogical decision-making and educational theory. First, TAsk delivered specific, adaptive feedback and significantly outperformed expert TAs in overall correctness, although human TAs remained superior at tailoring responses to course-specific nuances. Second, a behavioral analysis grounded in educational scaffolding frameworks such as Bloom’s Taxonomy and the Zone of Proximal Development (ZPD) identified a cognitive bypass risk: frequent users submitted significantly fewer higher-order queries than infrequent users. Third, benchmarking showed that smaller models could approach frontier-model performance when optimized, suggesting that the cost of running TAsk could be substantially reduced in future deployments. In addition, we validated a confabulation detection algorithm that we hypothesize could help students calibrate their trust in model outputs in future iterations of TAsk. Taken together, these contributions establish TAsk as a validated framework for higher-education learning while highlighting the critical need for pedagogical scaffolding around LLMs.

• Introduces TAsk, a generalizable open-source pipeline for higher education.
• Integrates LLMs with retrieval-augmented generation for question answering.
• Connects educational theories to the practical application of TAsk.
• Finds that TAsk performs on par with or better than human TAs at question answering.
• Explores prompt design and safeguards against AI confabulation.
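The abstract describes TAsk only as a RAG-enabled, educator-curated pipeline, so the following is a minimal sketch of the general retrieve-then-prompt pattern, not the TAsk implementation. Everything in it is an assumption: the course passages, the function names `retrieve` and `build_prompt`, and the term-frequency cosine similarity are hypothetical stand-ins (production RAG systems usually rank with dense embeddings), and no LLM is actually called.

```python
import math
import re
from collections import Counter

# Hypothetical educator-curated course snippets; the real TAsk corpus is not described in the abstract.
COURSE_PASSAGES = [
    "Enzymes lower the activation energy of a reaction without changing its equilibrium constant.",
    "The Michaelis constant Km is the substrate concentration at which the reaction rate is half of Vmax.",
    "Competitive inhibitors raise the apparent Km but leave Vmax unchanged.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question (a simple stand-in for embedding search)."""
    q = Counter(tokenize(question))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Ground the question in retrieved course material before it would be sent to an LLM."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only the course material below. "
        "If the material does not cover the question, say so.\n"
        f"Course material:\n{context}\n\nQuestion: {question}"
    )
```

With these toy passages, `build_prompt("How does a competitive inhibitor affect Km?", COURSE_PASSAGES)` ranks the competitive-inhibition passage first, and the resulting string would then be handed to whichever model the pipeline uses. Restricting the model to curated material is one plausible way such a pipeline narrows off-syllabus answers, though on its own it does not rule out confabulation.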
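The abstract reports that frequent users submitted significantly fewer higher-order queries than infrequent users, but it does not state how queries were coded or which statistical test was applied. As an illustrative assumption only, this sketch treats the top three levels of the revised Bloom’s Taxonomy (analyze, evaluate, create) as higher-order and compares two groups with a pooled two-proportion z-test; the label set, the helper names `higher_order_counts` and `two_proportion_z_test`, and the choice of test are hypothetical and are not presented as the authors' method.

```python
import math

# Revised Bloom's Taxonomy levels; treating the top three as "higher-order" is an assumed coding scheme.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]
HIGHER_ORDER = {"analyze", "evaluate", "create"}


def higher_order_counts(labels: list[str]) -> tuple[int, int]:
    """Return (number of higher-order queries, total queries) for a list of Bloom labels."""
    unknown = set(labels) - set(BLOOM_LEVELS)
    if unknown:
        raise ValueError(f"unrecognized Bloom labels: {sorted(unknown)}")
    return sum(label in HIGHER_ORDER for label in labels), len(labels)


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are all higher-order or all lower-order, so there is no difference to test.
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Made-up labels for illustration only; these are not study data.
frequent = ["remember"] * 30 + ["analyze"] * 10
infrequent = ["understand"] * 10 + ["evaluate"] * 10
z, p = two_proportion_z_test(*higher_order_counts(frequent), *higher_order_counts(infrequent))
```

Because each student contributes many queries, a query-level test like this treats non-independent observations as independent; with a cohort of 33 participants, a per-student comparison (for example, of each student's higher-order proportion) or a mixed-effects model would likely be more defensible.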


Authors

Ryann M. Perez

Marie Shimogawa

Yanan Chang

Journal

Computers and Education: Artificial Intelligence


Institutions

University of Pennsylvania
