This study examines the use of AI in cross-cultural music learning, focusing on a multimodal learning model that combines audio recognition, image analysis, and text understanding. The participants are undergraduate music students from diverse ethnic backgrounds, and the teaching materials span a wide range of cultural music pieces. The system applies sequence recognition and semantic modeling to deliver personalized recommendations and feedback interventions. Note recognition and cultural semantic decomposition are handled by a hybrid RNN-Transformer architecture. A teaching feedback mechanism and a cultural adaptation algorithm are built into the system architecture to analyze and continuously adjust learners’ performance, emotional expression, and cultural knowledge during the learning process. The experimental data indicate that the system’s high recognition accuracy and useful feedback considerably improved learners’ rhythm control, emotional expressiveness, and cultural mastery. The experiment also shows that students’ cultural backgrounds play an important role in their learning, which allows teachers to tailor teaching strategies to cultural differences. As a technically robust model and application paradigm for cross-cultural music education, the system shows strong adaptability in teaching logic, technical implementation, and educational outcomes.
Feng Liu studied this question.