March 15, 2026 · Open Access

An English vocabulary pronunciation evaluation model based on multidimensional audio features and machine learning

Key points

The research aims to improve English vocabulary pronunciation evaluation using multidimensional audio features and machine learning.
Designed a multi-dimensional audio feature extraction algorithm based on multi-scale dilated convolution.
Constructed a shallow feature refinement module whose parallel convolutions capture time, frequency, and time-frequency features of Mel-frequency cepstral coefficients (MFCCs).
Implemented a global feature fusion module with multiplicative gating to strengthen cross-scale feature fusion.
Used a support vector machine optimized by a differential evolution algorithm to score the multi-dimensional features.
Achieved an average evaluation accuracy of 94.57%.
Outperformed the comparative models in pronunciation assessment accuracy.
Provided a more objective and accurate evaluation of English vocabulary pronunciation.
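The source gives no code or layer settings. As a rough, pure-Python illustration of two ideas named in the key points, the sketch applies a 1-D dilated convolution at several dilation rates to a single feature track (for example, one MFCC coefficient over time frames) and fuses the scales with a sigmoid multiplicative gate. The function names, the dilation rates (1, 2, 4), and the gate design (each scale gated by its own sigmoid branch, then summed) are assumptions for illustration. The paper's model operates on 2-D MFCC maps and adds Res2Net blocks and channel attention, which this toy omits.

```python
import math
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1-D dilated convolution (cross-correlation, as most deep
    learning libraries implement it) with zero padding at both ends."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one frame
    pad_left = span // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - pad_left + i * dilation  # taps are `dilation` frames apart
            if 0 <= idx < len(x):              # positions outside count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def multi_scale_gated_fusion(
    x: Sequence[float],
    kernel: Sequence[float],
    gate_kernel: Sequence[float],
    dilations: Sequence[int] = (1, 2, 4),
) -> List[float]:
    """Hypothetical fusion: each scale's feature map is multiplied element-wise
    by a sigmoid gate computed at the same scale, and the gated maps are summed."""
    fused = [0.0] * len(x)
    for d in dilations:
        feat = dilated_conv1d(x, kernel, d)
        gate = dilated_conv1d(x, gate_kernel, d)
        for t in range(len(x)):
            fused[t] += feat[t] * sigmoid(gate[t])  # multiplicative gating
    return fused


# Toy input: an impulse in a 7-frame feature track (hypothetical data).
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
fused = multi_scale_gated_fusion(impulse, kernel=[1.0, 1.0, 1.0], gate_kernel=[0.0, 0.0, 0.0])
```

With a kernel of size 3, dilation rate d spans 2d + 1 frames, so rates 1, 2, and 4 see 3, 5, and 9 frames of context using the same three weights. This is what lets a multi-scale dilated design widen the receptive field without adding parameters.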

Abstract

Current English vocabulary pronunciation evaluation models cannot fully extract feature information from the different dimensions of spectrograms. To address this, this paper first designs a multi-dimensional audio feature extraction algorithm based on multi-scale dilated convolution. The algorithm begins with a shallow feature refinement module that uses parallel convolutions to capture shallow features of Mel-frequency cepstral coefficients (MFCCs) along three dimensions: time, frequency, and time-frequency. It then combines the Res2Net structure, dilated convolution, and channel attention to extract finer-grained multi-scale information from these shallow multi-dimensional features, and employs a global feature fusion module with multiplicative gating to enhance cross-scale feature fusion. Finally, a support vector machine optimized by the differential evolution algorithm scores the multi-dimensional features. Experimental results show that the proposed model reaches an average evaluation accuracy of 94.57%, outperforming the comparative models and providing an objective and accurate assessment of English vocabulary pronunciation.
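The abstract does not state how differential evolution is coupled to the SVM. One common arrangement, assumed here, is to let DE search the RBF-kernel hyperparameters log10(C) and log10(gamma) and keep the pair with the lowest cross-validation error. The sketch implements classic DE/rand/1/bin in pure Python. `surrogate_cv_error` is a hypothetical toy bowl standing in for the real cross-validated error, which in practice might be computed by fitting scikit-learn's `SVC` inside `cross_val_score`. The population size, F, CR, bounds, and generation count are illustrative defaults, not values from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 15,
    f: float = 0.5,          # differential weight (mutation scale)
    cr: float = 0.9,         # crossover probability
    generations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Classic DE/rand/1/bin minimizer over a box-bounded search space."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensures at least one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:  # binomial crossover
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                else:
                    v = pop[i][j]
                lo, hi = bounds[j]
                trial.append(min(max(v, lo), hi))  # clip back into the box
            s = objective(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def surrogate_cv_error(p: List[float]) -> float:
    """Hypothetical stand-in for SVM cross-validation error over
    p = [log10(C), log10(gamma)]; minimum placed at C = 10, gamma = 0.01."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best, err = differential_evolution(surrogate_cv_error, bounds=[(-2.0, 3.0), (-5.0, 1.0)])
C, gamma = 10 ** best[0], 10 ** best[1]
```

Searching in log space keeps DE's additive mutations meaningful across several orders of magnitude of C and gamma. The returned vector is converted back with C = 10 ** best[0] and gamma = 10 ** best[1], which for this toy objective lands near C = 10 and gamma = 0.01.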
