This study examines cross-word coalescent assimilation of /t, d/ + /j/ (e.g., did you → [dɪdʒə]) in English by combining corpus analysis with experimental evidence from Japanese learners. From ICNALE monologues and dialogues, 240 tokens were extracted and coded through forced alignment and automated classification. Predictors included lexical frequency, collocational association, and phoneme-level surprisal, with speech rate, utterance position, and learner proficiency as additional covariates. Learners' perception was tested with native-speaker recordings, and their production was evaluated through a matched shadowing task. Results indicate strong frequency and association effects in L1 speech, weaker but parallel effects in L2 production, and clear perceptual advantages for frequent collocations. The Discrepancy Index (perception minus production) was positive overall, capturing a consistent hear-over-say asymmetry that narrows as proficiency increases. These findings extend usage-based accounts to categorical alternations and highlight the pedagogical benefits of sequencing perception before production when teaching connected speech.
Ishihara et al. studied this question.
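The two derived measures named in the abstract can be illustrated with a minimal sketch. It assumes that perception and production are each summarized per learner as a proportion in [0, 1]; the scale, the `LearnerScores` record, and all function names are hypothetical and not taken from the study. The Discrepancy Index is computed exactly as stated (perception minus production), so a positive value corresponds to the hear-over-say asymmetry. Phoneme-level surprisal is shown in its standard form, -log2 P(segment | context). How the study estimated those conditional probabilities is not described here, so the probability is simply taken as input.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class LearnerScores:
    """Hypothetical per-learner summary; both scores are assumed to be proportions in [0, 1]."""
    learner_id: str
    perception: float  # assumed: proportion of coalesced tokens correctly identified
    production: float  # assumed: proportion of targets produced with coalescence when shadowing


def discrepancy_index(s: LearnerScores) -> float:
    """Discrepancy Index = perception minus production; positive means hear-over-say."""
    for name, value in (("perception", s.perception), ("production", s.production)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion in [0, 1], got {value}")
    return s.perception - s.production


def mean_discrepancy(scores: list[LearnerScores]) -> float:
    """Average DI across learners; a positive mean indicates an overall hear-over-say asymmetry."""
    if not scores:
        raise ValueError("need at least one learner")
    return mean(discrepancy_index(s) for s in scores)


def surprisal_bits(prob: float) -> float:
    """Surprisal in bits, -log2 P(segment | context); the probability estimate is supplied by the caller."""
    if not 0.0 < prob <= 1.0:
        raise ValueError(f"probability must be in (0, 1], got {prob}")
    return -math.log2(prob)
```

Keeping both scores on the same proportion scale is what makes a simple difference interpretable; if perception and production were measured on different scales, they would need to be standardized before subtracting.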