Current research on the interplay between corpus linguistics and generative AI typically gravitates toward testing the instrumental value of the technology for the discipline, as LLMs begin to populate toolkits (Anthony, 2023) and mimic established approaches to language analysis (Curry et al., 2024). The present contribution explores a less studied area, showing how corpus methods can inform the investigation of AI-generated language, notably in the context of dialogue-based Computer-Assisted Language Learning. Multiple studies advocate the use of conversational AI in the language classroom (Bibauw et al., 2022), yet direct observation of students’ interactions with these tools is lacking (Han, 2024). To address these gaps, we present a novel learner corpus design intended to explore emerging features of human-machine written interaction data. Our corpus consists of 326 interactions (722,537 tokens) by as many Italian university students aged 19-25, with diverse proficiency levels (mostly low-to-upper-intermediate). The sample includes learners with disabilities and learning disorders, to favour equal access to learning opportunities (CAST, 2018). The interactions were collected following a protocol involving two LLM-based chatbots (ChatGPT and Pi.AI) and two EFL learning scenarios (small talk and roleplay). In addition to introducing the corpus annotation scheme, we present a case study investigating both sides of learner-chatbot interactions. First, we provide quantitative and qualitative evidence of learners’ errors in open-ended and task-oriented conversations, annotated following an adapted version of the Louvain Error Tagging Manual (Granger et al., 2022) that includes new tags tailored to digital communication.
Second, following up on Cervini and Paone’s (2024) classification of intercomprehension strategies, we leverage corpus methods to evaluate LLMs’ responses to those errors, with an eye towards analysing the interaction mechanisms of Generative AI and its potential for language development.

References

Anthony, L. (2023). Corpus AI: Integrating Large Language Models (LLMs) into a Corpus Analysis Toolkit. Presentation given at the 49th Annual Conference of the Japan Association for English Corpus Studies (JAECS), Kansai University, Osaka, Japan. Available at https://osf.io/srtyd/.
Bibauw, S., Van Den Noortgate, W., François, T., & Desmet, P. (2022). Dialogue systems for language learning: A meta-analysis. Language Learning & Technology, 26(1).
CAST. (2018). Universal Design for Learning guidelines. http://udlguidelines.cast.org
Cervini, C., & Paone, E. (2024). Comunicare all’università: quando l’interazione orale si fa plurilingue. Italiano LinguaDue, 16(2), 496-523.
Curry, N., Baker, P., & Brookes, G. (2024). Generative AI for corpus approaches to discourse studies: A critical evaluation of ChatGPT. Applied Corpus Linguistics, 4(1), 100082.
Granger, S., Swallow, H., & Thewissen, J. (2022). The Louvain error tagging manual. Version 2.0.
Han, Z. (2024). ChatGPT in and for second language acquisition: A call for systematic research. Studies in Second Language Acquisition, 46(2), 301-306.
Adriano Ferraresi
Daniele Polizzi
University of Bologna
Ferraresi et al. studied this question.
www.synapsesocial.com/papers/69b25b7196eeacc4fceca3db — DOI: https://doi.org/10.5281/zenodo.18945547