When people speak with emotion, their facial motion combines context-driven lip motion with emotion-driven facial expression. However, existing speech-driven emotional 3D facial generation methods generate emotional talking faces directly from speech without decoupling these two motions. This makes it ambiguous during training whether a given facial motion stems from the context or from the emotion, which hinders effective learning. To decouple them, we introduce a contrastive emotion-face loss that learns the mapping between the emotion in speech and the expressions in talking lip motion. In addition, since individuals have unique styles of emotional expression, we enable personalized emotional expression through person-specific embeddings. By optimizing these embeddings, the proposed method generates emotional talking faces personalized to the target subject. Finally, the proposed method generates head motions that align with the context and emotion of the speech while maintaining diversity. Extensive experiments demonstrate that our method achieves a 7.18% performance improvement over state-of-the-art methods in emotional 3D talking face animation.
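The abstract does not give the form of the contrastive emotion-face loss. As a rough illustration only, the sketch below assumes a symmetric InfoNCE-style objective over paired speech-emotion and facial-expression embeddings, in which the i-th speech embedding should be most similar to the i-th face embedding within a batch. The function names, the temperature value, and the embedding inputs are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Negative log-softmax of logits[target], computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def contrastive_emotion_face_loss(speech_emb, face_emb, temperature=0.07):
    """Assumed symmetric InfoNCE loss; pair i in speech_emb matches pair i in face_emb.

    This is an illustrative stand-in, not the loss defined in the paper.
    """
    if not speech_emb or len(speech_emb) != len(face_emb):
        raise ValueError("inputs must be non-empty and of equal length")
    speech = [l2_normalize(v) for v in speech_emb]
    face = [l2_normalize(v) for v in face_emb]
    n = len(speech)
    # Cosine similarity between every speech/face pair, scaled by temperature.
    sim = [
        [sum(a * b for a, b in zip(speech[i], face[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    # Speech-to-face direction: each row should select its own column.
    loss_s2f = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Face-to-speech direction: each column should select its own row.
    loss_f2s = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_s2f + loss_f2s)


if __name__ == "__main__":
    speech = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(contrastive_emotion_face_loss(speech, matched))
    print(contrastive_emotion_face_loss(speech, swapped))
```

Under this assumed formulation, correctly paired embeddings yield a loss near zero, while swapped pairs are penalized heavily, which is the behavior a loss of this kind relies on to tie expressions to the emotion in speech.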
Seongmin Lee studied this question.
www.synapsesocial.com/papers/69d896166c1944d70ce075fd — DOI: https://doi.org/10.33851/jmis.2026.13.1.1
Seongmin Lee
Journal of Multimedia Information System