This paper reports the results of our experiments on speaker identification in the SCOTUS corpus, which includes oral arguments from the Supreme Court of the United States. Our main findings are as follows: (1) a combination of Gaussian mixture models and monophone HMM models attains near-100% text-independent identification accuracy on utterances that are longer than one second; (2) a sampling rate of 11025 Hz achieves the best performance (higher sampling rates are harmful), and a sampling rate as low as 2000 Hz still achieves more than 90% accuracy; (3) a distance score based on likelihood numbers was used to measure the variability of phones among speakers; we found that the most variable phone is UH (as in "good"), and that the velar nasal NG is more variable than the other two nasal sounds, M and N; (4) our models achieved "perfect" forced alignment on very long speech segments (one hour). These findings and their significance are discussed.
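To make the likelihood-based identification idea concrete, here is a minimal sketch, not the authors' system. It assumes one diagonal-covariance Gaussian per speaker as a stand-in for a full Gaussian mixture model, it omits the monophone HMM component entirely, and the two-dimensional feature vectors and the speaker names are hypothetical toy values rather than SCOTUS data. The helper names (`fit_diag_gaussian`, `avg_log_likelihood`, `identify`) are illustrative only.

```python
import math

def fit_diag_gaussian(frames):
    """Estimate a per-dimension mean and variance from a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so a constant dimension cannot cause division by zero.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under one speaker model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def identify(frames, models):
    """Return the speaker whose model gives the utterance the highest likelihood."""
    return max(models, key=lambda spk: avg_log_likelihood(frames, models[spk]))

# Hypothetical toy training frames (assumption: 2-D features, two speakers).
train = {
    "speaker_a": [[0.0, 1.0], [0.2, 0.8], [-0.1, 1.1], [0.1, 0.9]],
    "speaker_b": [[3.0, -1.0], [3.2, -0.8], [2.9, -1.2], [3.1, -1.1]],
}
models = {spk: fit_diag_gaussian(frames) for spk, frames in train.items()}

test_utterance = [[0.05, 0.95], [0.15, 1.05]]
predicted = identify(test_utterance, models)
print(predicted)
```

The decision is text-independent in the sense that it scores frames without regard to what was said. A real system would use acoustic features such as cepstral coefficients and multi-component mixtures, which this sketch leaves out.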
Jiahong Yuan
Mark Liberman
The Journal of the Acoustical Society of America
University of Pennsylvania
Williams (United States)
Yuan et al. studied this question.
www.synapsesocial.com/papers/6a08ea2d27ceb0c2a2d61a2c — DOI: https://doi.org/10.1121/1.2935783