March 3, 2026

AI Meets Academic Writing: ChatGPT's Impact on L2 accuracy and lexical diversity

Key Points

Revisions made with ChatGPT increased the lexical diversity of learners' texts, suggesting that learners readily took up its vocabulary suggestions.
Texts revised with ChatGPT still contained numerous errors in all L2 accuracy categories, indicating that learners did not focus on accuracy when revising.
Errors in 182 learner texts were annotated with the Louvain Error Tagging taxonomy to trace the linguistic changes that followed AI-feedback-driven revision.
The findings highlight the need to develop AI literacy and feedback literacy among ESL learners so that they can use gen-AI feedback to improve their writing.

Abstract

Generative artificial intelligence (gen-AI) is increasingly being integrated into language education, with language teachers seeking to capitalise on its strengths while mitigating the threats it poses (Li et al. 2024; Liu et al. 2024a). Although user attitudes towards ChatGPT are frequently investigated, how the tool empirically affects actual L2 linguistic performance is still largely unknown (exceptions include Mizumoto et al. 2024 and Pfau et al. 2023). To partially fill this gap, the current paper compares the written performance of English as a Second Language (ESL) learners in two contexts: (1) academic papers written in class without the help of ChatGPT and (2) the same papers subsequently revised with the help of ChatGPT. Specifically, we aim to capture which linguistic changes proposed by ChatGPT the learners decided to implement in their writing in two areas of the complexity, accuracy and fluency (CAF) framework: L2 accuracy and lexical diversity. These constructs represent concretely operationalisable features which will enable insights into learners’ revision decisions. Importantly, the study is not intended as an assessment of ChatGPT’s rewriting quality, but rather aims to better understand learner writing revision behaviour with gen-AI so as to pedagogically target learner feedback literacy skills. The participants (N=91) are L1 Dutch-speaking ESL students in the first year of the bachelor's programme in English literature and linguistics at the University of Antwerp, Belgium. This study is mainly based on empirical learner written data: 91 in-class assignments produced without ChatGPT are compared with the corresponding home revisions produced with ChatGPT assistance, yielding a database of 182 learner texts, each c. 500 words long (total tokens in the learner database = c. 91,000). To investigate the extent to which ChatGPT feedback impacts L2 accuracy, each learner text was annotated for errors using the Louvain Error Tagging taxonomy (Granger et al. 2022), which distinguishes more than 50 error types. Lexical diversity (LD) was automatically calculated using the open-source Tool for the Automatic Analysis of Lexical Diversity (TAALED) (Kyle et al. 2021) and was operationalised with the Measure of Textual Lexical Diversity (MTLD) (McCarthy & Jarvis 2010) and the Moving-Average Type-Token Ratio (MATTR) (Covington & McFall 2010), two measures which have been found to be largely independent of text length. Effect sizes and other relevant statistics (descriptive and inferential) shed light on the practical impact of ChatGPT feedback in L2 learner writing. Our results show unexpected trends, especially concerning accuracy, as the texts revised with ChatGPT still include numerous errors in all categories. In other words, students did not seem to be particularly “accuracy-oriented” when carrying out text revisions and did not leverage ChatGPT to markedly tend to that aspect of their writing. More expectedly, lexical diversity was shown to be much higher following ChatGPT feedback, with passages that were highly atypical of learner writing. Concerning the relationship between accuracy and LD, no strong correlation emerged between the two, suggesting that these are two separate constructs: students deal with revisions on lexicon and accuracy in different ways. These findings also shed light on the need to develop not only students’ AI literacy but also their feedback literacy if they are to use gen-AI feedback in a meaningful way to improve their writing.

References

Brezina, V., & Platt, W. (2024). #LancsBox X [software]. Lancaster University. http://lancsbox.lancs.ac.uk
Covington, M. A., & McFall, J. D. (2010). Cutting the Gordian Knot: The Moving-Average Type–Token Ratio (MATTR). Journal of Quantitative Linguistics, 17, 94–100.
Granger, S., Swallow, H., & Thewissen, J. (2022). The Louvain Error Tagging Manual. Version 2.0. CECL Papers 4. Centre for English Corpus Linguistics/Université catholique de Louvain.
Kyle, K., Crossley, S. A., & Jarvis, S. (2021). Assessing the validity of lexical diversity using direct judgements. Language Assessment Quarterly, 18(2), 154–170. https://doi.org/10.1080/15434303.2020.1844205
Li, B., Lowell, V. L., Wang, C., & Li, X. (2024). A systematic review of the first year of publications on ChatGPT and language education: Examining research on ChatGPT’s use in language learning and teaching. Computers and Education: Artificial Intelligence, 100266.
Liu, J., Wang, C., Liu, Z., Gao, M., Xu, Y., Chen, J., & Cheng, Y. (2024a). A bibliometric analysis of generative AI in education: Current status and development. Asia Pacific Journal of Education, 44(1), 156–175.
Liu, Y., Park, J., & McMinn, S. (2024b). Using generative artificial intelligence/ChatGPT for academic communication: Students' perspectives. International Journal of Applied Linguistics, 34, 1437–1461.
Marzuki, Widiati, U., Rusdin, D., Darwin, & Indrawati, I. (2023). The impact of AI writing tools on the content and organization of students’ writing: EFL teachers’ perspective. Cogent Education, 10(2), 2236469.
McCarthy, P. M., & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.
Meyer, J., Jansen, T., Schiller, R., Liebenow, L. W., Steinbach, M., Horbach, A., & Fleckenstein, J. (2024). Using LLMs to bring evidence-based feedback into the classroom: AI-generated feedback increases secondary students’ text revision, motivation, and positive emotions. Computers and Education: Artificial Intelligence, 6, 100199.
Mizumoto, A., Shintani, N., Sasaki, M., & Feng Teng, M. (2024). Testing the viability of ChatGPT as a companion in L2 writing accuracy assessment. Research Methods in Applied Linguistics, 3(2). https://doi.org/10.1016/j.rmal.2024.100116
Pfau, A., Polio, C., & Xu, Y. (2023). Exploring the potential of ChatGPT in assessing L2 writing accuracy for research purposes. Research Methods in Applied Linguistics, 2(3), 100083. https://doi.org/10.1016/j.rmal.2023.100083
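
The abstract names MATTR as one of its two lexical diversity measures. As a rough illustration of the idea behind it (the type-token ratio averaged over every window of a fixed number of consecutive tokens), the sketch below may help. It is not TAALED's implementation: the function name `mattr`, the default window of 50 tokens, the fallback for short texts, and the two toy sentences are all assumptions made for this example and are not taken from the study.

```python
from collections import Counter


def mattr(tokens, window=50):
    """Moving-average type-token ratio (illustrative sketch, not TAALED's code).

    Averages the type-token ratio of every run of `window` consecutive tokens.
    Assumption: texts no longer than the window fall back to a plain TTR.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)

    # Count the types in the first window, then slide one token at a time.
    counts = Counter(tokens[:window])
    total = len(counts) / window
    for i in range(window, len(tokens)):
        outgoing = tokens[i - window]
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]  # the type has left the window entirely
        counts[tokens[i]] += 1
        total += len(counts) / window

    n_windows = len(tokens) - window + 1
    return total / n_windows


# Invented toy sentences (not learner data), tokenised on whitespace.
draft = "the results are good and the results are clear".split()
revised = "the findings are robust and the outcomes appear clear".split()

# A very small window is used only because the toy texts are short.
print(mattr(draft, window=8), mattr(revised, window=8))
```

Because every window has the same length, the score does not shrink simply because a text is longer, which is the property the abstract refers to when it describes the measure as largely independent of text length.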

Authors

Jennifer Thewissen
