In response to three major challenges in AI music generation (limited chord representation, monotonous emotional expression, and low audio fidelity), this research proposes PEMF, a novel end-to-end framework that integrates PerformanceNet with a multi-emotion music generation model. The core innovations are threefold. First, a structured four-dimensional chord encoding based on root, third, fifth, and crown notes expands harmonic diversity to 60 chord types. Second, a dual-encoding Transformer architecture processes the melody and chord streams independently to improve structural coherence. Third, a fine-grained emotion regulation mechanism maps pitch histograms and rhythm density parameters onto Russell's two-dimensional (valence-arousal) emotion space for continuous control. For audio synthesis, an asymmetric U-Net combined with a multi-band residual learning mechanism and a flooding loss strategy significantly improves spectral fidelity and training stability. Experimental results show that PEMF achieves a chord vocabulary coverage near 1.0, an emotion recognition accuracy of 92.3% (significantly outperforming the 78.6% of a symbolic Transformer), a high-frequency energy retention rate of 89.1%, and a Fréchet audio distance of 0.5. At the system level, PEMF improves emotional consistency by 36.9% and reduces latency by 64.2% compared with staged training, validating its efficacy in practical applications such as music therapy and film scoring.
Li Chai (Thu,) studied this question.