This study investigates how auditory, visual, and lyrical features in music videos shape emotional responses, using EEG data from 26 participants. Five widely viewed music videos were selected based on their global popularity and cross-cultural appeal. A CNN–LSTM model classified emotional states with 97.67% accuracy, and complementary regression results showed strong generalization (best model RMSE = 0.15, MAE = 0.10). Feature selection reduced 27 candidates to a sparse set dominated by auditory and visual cues, and SHAP interpretation revealed a clear modality hierarchy: pitch- and dynamics-related auditory features accounted for the largest share of predictive importance, visual color properties (hue, saturation) provided secondary influence, and lyrical sentiment contributed least. These findings support neuroaesthetic accounts in which low-level sensory structure drives rapid affective appraisal, with visual tone refining emotional meaning. Practically, the results suggest that deliberate control of pitch/dynamics can reliably steer emotional engagement in music-video creation for diverse audiences.
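The modality hierarchy described above can be illustrated by summing per-feature attribution magnitudes within each modality. The following is a minimal sketch, not the authors' pipeline: the feature names, the feature-to-modality mapping, and the attribution values are hypothetical placeholders, and the function `modality_shares` is introduced here only for illustration. It assumes attributions of the kind SHAP produces, one signed value per feature per sample.

```python
from collections import defaultdict


def modality_shares(attributions, feature_modality):
    """Return each modality's share of total mean absolute attribution.

    attributions: list of dicts mapping feature name -> signed attribution
                  for one sample (hypothetical stand-in for SHAP values).
    feature_modality: dict mapping feature name -> modality label.
    """
    n_samples = len(attributions)
    totals = defaultdict(float)
    for row in attributions:
        for feature, value in row.items():
            # Mean absolute attribution, accumulated per modality.
            totals[feature_modality[feature]] += abs(value) / n_samples
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {modality: 0.0 for modality in totals}
    return {modality: total / grand_total for modality, total in totals.items()}


# Hypothetical feature-to-modality mapping (not taken from the paper).
feature_modality = {
    "pitch": "auditory",
    "loudness": "auditory",
    "hue": "visual",
    "saturation": "visual",
    "lyric_sentiment": "lyrical",
}

# Hypothetical attribution values for two samples (illustration only).
attributions = [
    {"pitch": 0.4, "loudness": -0.2, "hue": 0.2,
     "saturation": -0.1, "lyric_sentiment": 0.1},
    {"pitch": -0.4, "loudness": 0.2, "hue": -0.2,
     "saturation": 0.1, "lyric_sentiment": -0.1},
]

shares = modality_shares(attributions, feature_modality)
ranking = sorted(shares, key=shares.get, reverse=True)
print(ranking)
```

Taking absolute values before averaging keeps features whose effect changes sign across samples from cancelling out, which is why importance rankings of this kind are usually built on magnitudes rather than signed means.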
Wang et al. studied this question.
www.synapsesocial.com/papers/69e1cf375cdc762e9d8582ca — DOI: https://doi.org/10.1177/02762374261426246
Kai Wang
Yuqing Liu
Yao Song
Empirical Studies of the Arts
Hong Kong Polytechnic University
Sichuan University
Hong Kong Baptist University