Forced alignment is widely used in phonetic research to align transcripts with acoustic signals. Yet there is little agreement on conventions for evaluating forced alignment, and our understanding of the reliability of forced aligners rests primarily on results from English. This study aims to fill these gaps by examining a concrete case: the forced alignment of different Mandarin varieties. It evaluates machine-generated alignments from the Montreal Forced Aligner (MFA; McAuliffe, Socolof, Mihuc, Wagner, and Sonderegger, 2017a, Proc. Interspeech 2017, 498-502) against two sets of independent human baselines using a Bayesian hierarchical multivariate regression model. Our findings suggest closer agreement between human aligners than between humans and MFA; large differences in alignment accuracy across different sequence types, with somewhat divergent patterns of errors across humans and MFA; some effects of speech rate and speaker-specific variation; and essentially no variation in robustness across varieties. These results serve (i) to reinforce previous results on the robustness of forced alignment across different varieties of the same language; and (ii) to provide a set of important methodological recommendations for evaluating forced alignment accuracy.
Suyuan Liu
Márton Sóskuthy
Sijia Zhang
The Journal of the Acoustical Society of America
University of British Columbia
www.synapsesocial.com/papers/69db37f94fe01fead37c6072 — DOI: https://doi.org/10.1121/10.0043323