Abstract Facial expression recognition (FER) has a variety of applications in fields such as human–computer interaction, cognitive psychology, and intelligent driving. However, FER in the wild faces multiple challenges, including occlusion, pose variation, and subtle expression differences, which current models do not address effectively. To tackle these challenges, we propose FER-EMFormer, an efficient and robust Enhanced Mamba-Transformer architecture for FER in complex scenes. FER-EMFormer consists primarily of two core modules: the Hybrid Enhanced Mamba-Transformer (HEMT) and the Enhanced Cascaded Mamba-Transformer (ECMT). HEMT combines Mamba and Transformer to capture informative global context and spatial dependency features, and integrates frequency enhancement of detail features across multiple views to enable collaborative global–local feature understanding. ECMT uses a cascaded architecture to fuse the optimized global dependencies obtained from the Mamba-Transformer, then employs a Transformer with a multi-dimensional aggregation feedforward network to precisely control the network's information flow, yielding high-density discriminative information and further improving recognition accuracy. Extensive experiments show that FER-EMFormer significantly outperforms current FER models, achieving state-of-the-art accuracy of 96.06% on RAF-DB, 95.78% on FERPlus, 72.91% on AffectNet-7, and 70.43% on AffectNet-8, while also demonstrating excellent robustness and generalization on occlusion and pose-variation expression datasets as well as in cross-dataset evaluation. The code is available at https://github.com/ferlab08/FER-EMFormer
Weijun Gong
Hexi University
Xusheng Du
Japan Advanced Institute of Science and Technology
Jiaxin Wu
Wuhan Business University
Journal of King Saud University - Computer and Information Sciences
Gong et al. (Thu,) studied this question.
synapsesocial.com/papers/69fd7f0dbfa21ec5bbf07733 — DOI: https://doi.org/10.1007/s44443-026-00740-4