Speech Emotion Recognition is essential for human-computer interaction because it enables automated systems to recognize a speaker's emotion from speech. To increase the quantity, diversity, and quality of the speech data that deep learning models use to identify emotions, this paper combines three separate open-source datasets. To improve performance on the resulting unbalanced data, it also proposes a novel loss function, 'BalancedFocalEntropy'. The proposed method uses two distinct datasets: an individual dataset (RAVDESS) and a hybrid dataset that combines the three open-source datasets. The data are augmented by adding noise, by pitch shifting, and by a combination of the two. Four distinct handcrafted feature extraction techniques are then applied to the speech data, and the resulting features are fused into a single feature set. These fused features are used to train two deep learning models on the RAVDESS and hybrid datasets with the proposed 'BalancedFocalEntropy' loss. The models achieve 95.23% test accuracy on RAVDESS and a maximum of 97.80% on the hybrid dataset. Applied in real time, the proposed method could help identify an individual's emotion accurately in a variety of domains.
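The abstract does not define 'BalancedFocalEntropy', name the four handcrafted features, or give the augmentation parameters, so the following is only a minimal NumPy sketch of one plausible reading of the pipeline, not the author's implementation. It assumes that (a) noise augmentation means additive Gaussian noise at a chosen signal-to-noise ratio, (b) "combined to create a single set" means concatenating per-utterance feature vectors, and (c) the loss combines inverse-frequency class weights ("balanced") with focal modulation of cross-entropy. All function names (`add_white_noise`, `fuse_features`, `inverse_frequency_weights`, `balanced_focal_loss`) and the default `gamma=2.0` are hypothetical. Pitch shifting is omitted because it needs a resampling library (for example `librosa.effects.pitch_shift`).

```python
import numpy as np


def add_white_noise(signal, snr_db, rng):
    """Hypothetical augmentation: add Gaussian noise at a target SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def fuse_features(*feature_vectors):
    """Assumed early fusion: flatten each feature set and concatenate them."""
    return np.concatenate([np.ravel(f) for f in feature_vectors])


def inverse_frequency_weights(labels, num_classes):
    """Class weights n_samples / (n_classes * count_c), so rare classes weigh more."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    counts = np.maximum(counts, 1.0)  # guard against classes with no samples
    return counts.sum() / (num_classes * counts)


def balanced_focal_loss(logits, labels, class_weights, gamma=2.0):
    """Hypothetical class-weighted focal cross-entropy, averaged over the batch.

    logits: (N, C) raw scores; labels: (N,) integer class ids;
    class_weights: (C,) per-class weights; gamma: focusing parameter.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Log-probability and probability assigned to the true class.
    log_pt = log_probs[np.arange(len(labels)), labels]
    pt = np.exp(log_pt)
    # Down-weight well-classified samples, up-weight rare classes.
    alpha = class_weights[labels]
    loss = -alpha * (1.0 - pt) ** gamma * log_pt
    return loss.mean()
```

With `gamma=0` and all class weights equal to 1, `balanced_focal_loss` reduces to ordinary cross-entropy, which makes a quick sanity check. The weighting in `inverse_frequency_weights` follows the same heuristic as scikit-learn's `class_weight="balanced"`.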
Shimaa Nagro
Scientific Reports
Saudi Electronic University
www.synapsesocial.com/papers/69e1cdc45cdc762e9d857072 — DOI: https://doi.org/10.1038/s41598-026-48975-5