The performance of the proposed automatic speaker recognizer is evaluated on two speech databases. The feature vector consists of 21 mel-frequency cepstral coefficients (MFCCs) and up to three additional features derived from the amplitude spectrum. The additional features are the logarithm of the energy around the appropriate local maximum in the spectrum, the frequency of that maximum, and the logarithm of the energy of the maximum spectral component across all frames of the observed signal. Closed-set speaker identification is tested on the Solo section of the CHAINS database and on a database of emotional speech developed within the S-ADAPT project. The maximum mean recognition accuracy on the CHAINS database is 97.11%, obtained with a feature vector of 21 MFCCs and two additional features. On the S-ADAPT database, the maximum mean recognition accuracy is 98.65% on neutral speech and 98.72% on the entire database, both obtained with a feature vector of 21 MFCCs.
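The three additional spectral features described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `spectral_peak_features`, the Hamming window, the `half_width` parameter that defines the band "around" the peak, and the use of the per-frame global maximum as a stand-in for the "appropriate local maximum" are all assumptions made for illustration only.

```python
import numpy as np

def spectral_peak_features(frames, sample_rate, half_width=2):
    """Hypothetical sketch of three amplitude-spectrum features.

    frames: array of shape (n_frames, frame_len) holding the framed signal.
    Returns per-frame log peak-band energy, per-frame peak frequency in Hz,
    and one scalar: the log energy of the largest spectral component
    over all frames.
    """
    frames = np.asarray(frames, dtype=float)
    n_frames, frame_len = frames.shape
    eps = 1e-12  # guards against log(0) on silent frames

    # Amplitude spectrum of each windowed frame (window choice is an assumption).
    window = np.hamming(frame_len)
    spectrum = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

    # Assumption: the "appropriate local maximum" is taken to be the
    # largest spectral component of each frame.
    peak_bins = np.argmax(spectrum, axis=1)

    # Feature 1: log energy in a small band of bins around the peak.
    log_peak_energy = np.empty(n_frames)
    for i, k in enumerate(peak_bins):
        lo = max(k - half_width, 0)
        hi = min(k + half_width + 1, spectrum.shape[1])
        log_peak_energy[i] = np.log(np.sum(spectrum[i, lo:hi] ** 2) + eps)

    # Feature 2: frequency of that peak.
    peak_freq = freqs[peak_bins]

    # Feature 3: log energy of the maximum component across all frames.
    log_global_max = np.log(np.max(spectrum) ** 2 + eps)

    return log_peak_energy, peak_freq, log_global_max
```

In a full front end, these values would be appended to the 21 MFCCs of each frame, with the third feature repeated per frame or used for normalization; the abstract does not say which, so that step is left out here.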
Jokić et al. (Wed,) studied this question.
synapsesocial.com/papers/698586238f7c464f2300a1c9 — DOI: https://doi.org/10.2298/fuee2504663j
Ivan Jokić
University of Novi Sad
S. Jokić
Universitat Autònoma de Barcelona
Vlado Delić
University of Novi Sad
Facta universitatis - series Electronics and Energetics
University of Novi Sad
University of Nis
Telekom Srbija (Serbia)