What question did this study set out to answer?

The research aims to enhance speech emotion recognition accuracy by integrating hybrid datasets and introducing a novel loss function.

April 17, 2026 | Open Access

Optimization of speech emotion recognition using hybrid dataset integration and deep learning-based feature fusion with a novel balanced focal entropy loss

Key Points

The hybrid dataset combines three open-source speech datasets.
Data augmentation adds noise, shifts pitch, and applies both together.
Four handcrafted feature extraction techniques are applied, and their outputs are fused into a single feature set.
Two deep learning models are trained on the fused features with the balanced focal entropy loss.
On the RAVDESS dataset, test accuracy reached 95.23%.
On the hybrid dataset, test accuracy reached a maximum of 97.80%.
The proposed method shows significant performance improvement compared to traditional approaches.
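
The noise-addition step listed above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper name `add_noise`, the white Gaussian noise model, and the 20 dB signal-to-noise ratio are all assumptions, since this summary does not say how the noise was generated or scaled.

```python
import math
import random


def add_noise(signal, snr_db=20.0, seed=None):
    """Return a copy of `signal` with white Gaussian noise at roughly `snr_db` dB SNR.

    Hypothetical helper: the noise model and the default SNR are assumptions.
    """
    rng = random.Random(seed)
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Noise power that yields the requested signal-to-noise ratio.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


# Example input: a 440 Hz tone sampled at 16 kHz for 0.1 s.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
noisy = add_noise(tone, snr_db=20.0, seed=0)
```

Fixing the seed makes the augmented copy reproducible. Pitch shifting is usually delegated to an audio library rather than written by hand, so it is left out of this sketch.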

Abstract

Speech emotion recognition is essential for human-computer interaction because it lets an automated system recognize a speaker's emotion from speech. To improve the quantity, diversity, and quality of the speech data that deep learning algorithms use to identify emotions, this paper combines three separate open-source datasets into a hybrid dataset and proposes a novel loss function, 'BalancedFocalEntropy', to improve performance on unbalanced datasets. The method is evaluated on two datasets: an individual dataset (RAVDESS) and the hybrid dataset. The data are augmented by adding noise, shifting pitch, and combining the two techniques. Four distinct handcrafted feature extraction techniques are then applied to the speech data, and the resulting features are fused into a single set. The fused features are used to train two deep learning models on the RAVDESS and hybrid datasets with the proposed BalancedFocalEntropy loss. Test accuracy reaches 95.23% on the RAVDESS dataset and a maximum of 97.80% on the hybrid dataset. Applied in real time, the method could help identify an individual's emotion accurately in a variety of domains.
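
The abstract names the BalancedFocalEntropy loss but does not give its formula. As a rough illustration of the general idea of a loss aimed at unbalanced classes, the sketch below combines two well-known ingredients: inverse-frequency class weights and the focal modulating factor (1 - p)^gamma. It is an assumption about the family of losses, not the paper's definition; the names `class_weights` and `balanced_focal_loss` and the default `gamma=2.0` are hypothetical.

```python
import math


def class_weights(labels, num_classes):
    """Inverse-frequency weights; a perfectly balanced label set gives 1.0 per class."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    # Classes absent from `labels` get weight 0.0 to avoid division by zero.
    return [total / (num_classes * c) if c else 0.0 for c in counts]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def balanced_focal_loss(logits, label, weights, gamma=2.0):
    """Class-weighted focal loss for one example: -w_y * (1 - p_y)**gamma * log(p_y).

    Illustrative stand-in only; not the BalancedFocalEntropy loss from the paper.
    """
    p = softmax(logits)[label]
    return -weights[label] * (1.0 - p) ** gamma * math.log(p)


# Toy label set: class 0 is three times as frequent as class 1.
labels = [0, 0, 0, 1]
weights = class_weights(labels, num_classes=2)
easy = balanced_focal_loss([4.0, 0.0], 0, weights)  # confident and correct
hard = balanced_focal_loss([0.0, 4.0], 0, weights)  # confident and wrong
```

In this sketch the rare class receives the larger weight, and confidently correct examples contribute far less loss than misclassified ones. Setting `gamma=0` reduces it to ordinary class-weighted cross-entropy.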

Authors

Shimaa Nagro

Journals

Scientific Reports

Institutions

Saudi Electronic University
