Automatically assessing classroom discussion quality is becoming increasingly feasible with the help of recent NLP advances such as large language models (LLMs). In this work, we examine how the assessment performance of two LLMs interacts with three factors that may affect performance: task formulation, context length, and few-shot examples. We also explore the computational efficiency and predictive consistency of the two LLMs. Our results suggest that all three factors affect the performance of the tested LLMs and that consistency and performance are related. We recommend an LLM-based assessment approach that balances predictive performance, computational efficiency, and consistency.
Tran et al. studied this question.
www.synapsesocial.com/papers/68e651bbb6db6435875e191b — DOI: https://doi.org/10.48550/arxiv.2406.08680
Nhat Tran
Benjamin C. Pierce
Diane Litman