What question did this study set out to answer?

The research aims to assess the accuracy of forced alignment in different Mandarin varieties and establish evaluation standards.

April 12, 2026

Best practices in evaluating forced-alignment accuracy: The case of Mandarin varieties

Key Points

Evaluated machine-generated alignments from Montreal Forced Aligner (MFA).
Compared alignments against two sets of human baselines.
Used a Bayesian hierarchical multivariate regression model for analysis.
Examined differences in alignment accuracy across various sequence types.
Found closer agreement between human aligners than between humans and MFA.
Identified large differences in alignment accuracy across sequence types.
Noticed some effects of speech rate and speaker-specific variation.
Observed essentially no variation in accuracy across Mandarin varieties.

Abstract

Forced alignment is widely used in phonetic research to align transcripts with acoustic signals. Yet there is little agreement on conventions for evaluating forced alignment, and our understanding of the reliability of forced aligners rests primarily on results from English. This study aims to fill these gaps by examining the concrete issue of forced aligning different Mandarin varieties. It evaluates machine-generated alignments from Montreal Forced Aligner [MFA; McAuliffe, Socolof, Mihuc, Wagner, and Sonderegger (2017a). Proc. Interspeech 2017, 498-502] against two sets of independent human baselines using a Bayesian hierarchical multivariate regression model. Our findings suggest closer agreement between human aligners than between humans and MFA; large differences in alignment accuracy across different sequence types, with somewhat divergent patterns of errors across humans and MFA; some effects of speech rate and speaker-specific variation; and essentially no variation in robustness across varieties. These results serve (i) to reinforce previous results on the robustness of forced alignment across different varieties of the same language; and (ii) to provide a set of important methodological recommendations for evaluating forced alignment accuracy.

Authors

Suyuan Liu

Márton Sóskuthy

Sijia Zhang

Journals

The Journal of the Acoustical Society of America

Institutions

University of British Columbia
