Music genre recognition is crucial for organization, content recommendation, and retrieval applications. Existing techniques often have trouble capturing multi-dimensional and dynamic features under varying ambient conditions. To address these limitations and achieve robust music genre recognition, a novel Hybrid Deep Autoencoders and Wavelet Networks (HDAWN) framework has been developed. Autoencoders automatically extract optimal feature combinations, whereas wavelet networks identify periodic frequency characteristics that are adaptable to diverse scenarios. Hierarchical Recurrent Deep Autoencoders (HRDAE) additionally capture temporal patterns in the input spectrogram, enhancing the system’s temporal feature representation. To optimize wavelet selection, symmetric and asymmetric wavelet groups are compared comprehensively on the George Tzanetakis (GTZAN) dataset. Evaluated on the GTZAN dataset, the proposed approach achieves 94.55% classification accuracy, which is 13.88 percentage points higher than conventional DNN-SVM baselines (80.67%) and 1 to 4% higher than several recent deep learning approaches. In addition, the framework has a low computational complexity of only 0.03 GMAC, roughly 6 to 60 times lower than that of deeper CNN-based architectures such as CRNN-9 (0.27 GMAC) and ResNet18 + 3D (1.79 GMAC). These results show that the proposed hybrid architecture simultaneously improves classification accuracy and computational efficiency, offering a good trade-off between performance and resource consumption.
Jie et al. (Thu,) studied this question.
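The comparisons quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the accuracy and GMAC figures stated above, and the variable and function names (`accuracy_gap`, `cost_ratio`) are hypothetical helpers, not part of the HDAWN framework.

```python
# Figures quoted in the abstract (GTZAN results).
hdawn_acc = 94.55      # proposed HDAWN accuracy, in percent
baseline_acc = 80.67   # DNN-SVM baseline accuracy, in percent
hdawn_gmac = 0.03      # proposed HDAWN cost, in GMAC
others_gmac = {"CRNN-9": 0.27, "ResNet18 + 3D": 1.79}


def accuracy_gap(ours, baseline):
    """Absolute accuracy difference in percentage points."""
    return round(ours - baseline, 2)


def cost_ratio(other, ours):
    """How many times more multiply-accumulates the other model needs."""
    return other / ours


gap = accuracy_gap(hdawn_acc, baseline_acc)
ratios = {name: cost_ratio(g, hdawn_gmac) for name, g in others_gmac.items()}

print(f"accuracy gap: {gap} percentage points")
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f}x the cost of HDAWN")
```

The accuracy gap matches the stated 13.88 percentage points. The two named architectures work out to about 9 and 60 times the cost of HDAWN, so the lower end of the quoted 6 to 60 range is not reproduced by these two examples alone.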