Artificial Intelligence (AI) in language assessment has gained attention for its potential to ease teachers' complex task of assessing students' writing. Although previous research has explored the use of technological tools to assess EFL learners' writing, further investigation is needed into how AI, particularly ChatGPT, can be used as an assessment tool in high-stakes writing assessment, and whether the scores it provides are similar to those assigned by human raters. In the context of a high-stakes writing test at Universidad del Valle, this quantitative study investigated how the scores assigned by EFL university teachers differed from those given by ChatGPT when assessing EFL university learners' written productions (a personal opinion essay and a data explanatory essay). Two argumentative writing tasks from two cohorts of ninth-semester EFL learners (N = 208) were used to compare the global and analytic ratings awarded by a pool of 20 human raters with those of ChatGPT, using an analytic scoring rubric. The analytic dimensions included content, coherence and cohesion, sentence structure, grammar, and vocabulary. A total of 7072 scores were analyzed: 416 global scores (208 human and 208 ChatGPT) and 6656 analytic scores across the two writing tasks. Analytic scoring covered seven criteria for Task 1 and nine for Task 2, with human raters and ChatGPT each providing the same number of ratings (3328 each). Data were analyzed in JASP using correlations and paired-samples t-tests. According to the results, ChatGPT showed moderate agreement with human raters in surface-level dimensions such as grammar, vocabulary, and sentence structure.
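The study ran its analysis in JASP. As a minimal sketch of the two statistics the abstract names, the following Python computes a correlation and a paired-samples t statistic for human versus ChatGPT scores on the same essays. Two things here are assumptions. First, Pearson's r is used, although the abstract does not say which correlation coefficient was applied. Second, the score lists `human` and `gpt` are hypothetical toy values, not data from the study. The code omits p-values because the Python standard library has no t distribution.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length, non-constant score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for the differences x - y.

    Each position i must refer to the same essay (or the same essay-criterion pair)
    scored once by a human rater and once by ChatGPT.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))  # mean difference / standard error
    return t, n - 1


# Hypothetical scores for six essays on one rubric criterion (illustration only).
human = [3, 4, 2, 5, 4, 3]
gpt = [3, 3, 2, 4, 4, 2]

r = pearson_r(human, gpt)
t, df = paired_t(human, gpt)
print(f"r = {r:.3f}, t({df}) = {t:.3f}")
```

In this reading, a high r paired with a significant t would mean that ChatGPT ranks essays much as the human raters do while scoring them systematically higher or lower. That is why both statistics are reported together.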
Valentina Zapata Villano