May 1, 2008

Speaker identification on the SCOTUS corpus

Abstract

This paper reports the results of our experiments on speaker identification in the SCOTUS corpus, which includes oral arguments from the Supreme Court of the United States. Our main findings are as follows: (1) a combination of Gaussian mixture models and monophone HMMs attains near-100% text-independent identification accuracy on utterances longer than one second; (2) a sampling rate of 11025 Hz achieves the best performance (higher sampling rates are harmful), and a sampling rate as low as 2000 Hz still achieves more than 90% accuracy; (3) using a distance score based on likelihood values to measure the variability of phones among speakers, we found that the most variable phone is UH (as in “good”), and that the velar nasal NG is more variable than the other two nasals, M and N; (4) our models achieved “perfect” forced alignment on very long speech segments (one hour). These findings and their significance are discussed.
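The identification task in finding (1) can be illustrated with a minimal sketch of text-independent GMM speaker identification: train one Gaussian mixture per speaker, then assign a test utterance to the speaker whose model gives the highest average frame log-likelihood. This is not the authors' system. It omits the monophone HMM component, uses a small hand-written diagonal-covariance EM in NumPy, and runs on synthetic 13-dimensional vectors standing in for acoustic feature frames. The class `DiagGMM`, the function `identify`, the component count, the variance floor, and the 10 ms frame spacing are all illustrative assumptions.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)


def _log_gauss_diag(X, means, variances):
    """log N(x_t | mu_k, diag(var_k)) for every frame t and component k -> (T, K)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                    # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    log_det = np.sum(np.log(variances), axis=1)                 # (K,)
    return -0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + mahal)


class DiagGMM:
    """Diagonal-covariance Gaussian mixture trained with a fixed number of EM steps."""

    def __init__(self, n_components=4, n_iter=50, var_floor=1e-3, seed=0):
        self.k, self.n_iter, self.var_floor = n_components, n_iter, var_floor
        self.rng = np.random.default_rng(seed)

    def fit(self, X):
        T = X.shape[0]
        # Initialise means from random frames, variances from the global variance.
        self.means = X[self.rng.choice(T, self.k, replace=False)].copy()
        self.vars = np.tile(X.var(axis=0) + self.var_floor, (self.k, 1))
        self.log_w = np.full(self.k, -np.log(self.k))
        for _ in range(self.n_iter):
            # E-step: component posteriors (responsibilities), computed in the log domain.
            lp = self.log_w + _log_gauss_diag(X, self.means, self.vars)
            resp = np.exp(lp - _logsumexp(lp, axis=1)[:, None])     # (T, K)
            # M-step: re-estimate weights, means and floored variances.
            nk = resp.sum(axis=0) + 1e-10
            self.log_w = np.log(nk / T)
            self.means = (resp.T @ X) / nk[:, None]
            second_moment = (resp.T @ X ** 2) / nk[:, None]
            self.vars = np.maximum(second_moment - self.means ** 2, self.var_floor)
        return self

    def avg_log_likelihood(self, X):
        """Mean per-frame log-likelihood of X under the mixture."""
        lp = self.log_w + _log_gauss_diag(X, self.means, self.vars)
        return float(np.mean(_logsumexp(lp, axis=1)))


def identify(models, X):
    """Pick the speaker whose GMM assigns X the highest average frame log-likelihood."""
    return max(models, key=lambda spk: models[spk].avg_log_likelihood(X))


# Usage on synthetic 13-dim vectors standing in for acoustic feature frames (assumption).
rng = np.random.default_rng(1)
train = {
    "speaker_a": rng.normal(0.0, 1.0, size=(500, 13)),
    "speaker_b": rng.normal(2.0, 1.0, size=(500, 13)),
}
models = {spk: DiagGMM(n_components=4).fit(X) for spk, X in train.items()}
test_utt = rng.normal(2.0, 1.0, size=(100, 13))  # ~1 s if frames are 10 ms apart (assumption)
predicted = identify(models, test_utt)
print(predicted)  # speaker_b
```

Averaging the log-likelihood over frames, rather than summing it, keeps scores comparable across utterances of different lengths; the argmax decision itself is unchanged for a single utterance.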

Authors

Jiahong Yuan

Mark Liberman

Journals

The Journal of the Acoustical Society of America

Institutions

University of Pennsylvania

Williams (United States)
