Fluency is a central dimension of L2 oral proficiency, and its assessment matters in many applied contexts, including teaching and testing. Yet measuring fluency through manual annotation is labor-intensive, which limits its scalability. We evaluated two automated tools, an acoustic-based tool (de Jong et al., 2021) and a machine-learning tool (Matsuura et al., 2025), using data from L1-Chinese learners of English. Accuracy was assessed for three metrics, articulation rate (AR), pause ratio (PR), and mean pause duration (MPD), via Pearson correlations with manual annotation. Using Steiger’s test, we compared the two tools and tested whether targeted manual post-processing (TextGrid checks and transcript adjustments) improved metric extraction. Results from our sample indicated that the acoustic-based tool (de Jong et al., 2021) yielded higher accuracy for the silence-based metrics (PR, MPD). However, text-dependent metrics (here, the syllable count used in AR after disfluency words are removed) benefited from corrected TextGrids (for the acoustic tool) or corrected transcripts (for the machine-learning tool). These findings suggest a scalable division of labor: use an acoustic-based tool for silence-driven metrics, and apply corrected transcripts with a machine-learning tool when extracting text-sensitive metrics.
Lu et al. (Fri,) studied this question.
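To make the three metrics and the accuracy measure concrete, the following is a minimal sketch rather than the procedure used by either tool. The abstract does not define the metrics, so the code assumes common definitions: AR as syllables per second of phonation time (total duration minus silent pauses), PR as the proportion of total duration spent in silent pauses, and MPD as total pause time divided by the number of pauses. The function names `fluency_metrics` and `pearson_r` and the input layout are hypothetical.

```python
from math import sqrt


def fluency_metrics(duration, pauses, syllables):
    """Compute AR, PR, and MPD for one speech sample.

    duration  : total sample length in seconds
    pauses    : list of (start, end) silent-pause intervals in seconds
    syllables : syllable count after disfluency words are removed

    The definitions below are assumptions, not taken from the source.
    """
    pause_time = sum(end - start for start, end in pauses)
    phonation_time = duration - pause_time
    return {
        "AR": syllables / phonation_time,   # syllables per second of speech
        "PR": pause_time / duration,        # share of time spent pausing
        "MPD": pause_time / len(pauses) if pauses else 0.0,
    }


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Under these assumptions, accuracy for one metric would be `pearson_r(automated_values, manual_values)`, where each list holds that metric for every speaker as produced by a tool and by manual annotation respectively.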