In recent years, facial expression recognition (FER) has become an important research topic in the development of intelligent educational systems. However, facial expressions are highly dynamic and exhibit substantial variability, which makes accurate recognition difficult. To address this challenge, we introduce MaCLe, a novel multi-modal Collaborative Learning approach built on the CLIP model for FER in online learning scenarios. Collaborative prompt learning improves the model's generalization and its ability to handle dynamic expressions. The proposed model is validated on three widely used benchmarks, RAF-DB, FER2013 and AffectNet, on which MaCLe achieves recognition accuracies of 95.45%, 80.37% and 77.66%, respectively. Ablation studies show that a prompt depth of J = 7 and a prompt length of 2–4 tokens allow the model to balance accuracy and generalization most effectively. Furthermore, despite the added multi-modal prompt generation, MaCLe increases FLOPs by only about 0.1% compared with CoOp and Co-CoOp. It also converges faster, reaching its results in 5 epochs rather than 10, and adds only 2.85% more parameters than CLIP. These results indicate that MaCLe substantially improves recognition accuracy as well as training efficiency and scalability on e-learning platforms.
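The abstract does not spell out how the collaborative multi-modal prompts are built. The sketch below is a minimal illustration under an assumption: a MaPLe-style design in which learnable text prompts are mapped to vision prompts by one linear coupling function per layer, and fresh prompts are inserted into the first J layers of the image encoder. Only J = 7 and a 2-token prompt length come from the abstract. The widths, layer count, initialization, the toy transformer block and the names `vision_prompts`, `encode_image` and `added_params` are hypothetical. The text-encoder side, where the text prompts would be inserted in the same way, is omitted for brevity.

```python
import numpy as np

# Hypothetical sizes: CLIP ViT-B/16-like widths; only J and B come from the abstract.
D_TEXT, D_VIS = 512, 768   # text / vision token widths (assumed)
N_LAYERS = 12              # image-encoder depth (assumed)
J = 7                      # prompt depth: number of prompted layers
B = 2                      # prompt length in tokens

rng = np.random.default_rng(0)

# Learnable text-side prompts, one (B, D_TEXT) set per prompted layer (hypothetical init).
text_prompts = [rng.normal(0.0, 0.02, (B, D_TEXT)) for _ in range(J)]
# Per-layer linear coupling: text prompt space -> vision prompt space (assumed form).
coupling = [(rng.normal(0.0, 0.02, (D_TEXT, D_VIS)), np.zeros(D_VIS)) for _ in range(J)]


def vision_prompts(layer):
    """Derive the vision prompts for `layer` from its text prompts; shape (B, D_VIS)."""
    W, b = coupling[layer]
    return text_prompts[layer] @ W + b


def toy_block(x):
    """Stand-in for a frozen transformer block: any shape-preserving map."""
    return np.tanh(x)


def encode_image(patch_tokens, cls_token):
    """Run the toy encoder with coupled prompts in the first J layers; return the CLS feature."""
    x = np.vstack([cls_token[None, :], patch_tokens])      # (1 + P, D_VIS)
    for layer in range(N_LAYERS):
        if layer < J:
            p = vision_prompts(layer)
            if layer == 0:
                x = np.vstack([x[:1], p, x[1:]])            # insert B prompt tokens after CLS
            else:
                x = np.vstack([x[:1], p, x[1 + B:]])        # replace previous prompts with fresh ones
        x = toy_block(x)                                    # layers >= J just propagate them
    return x[0]


def added_params():
    """Trainable parameters added by this sketch (prompts + coupling layers)."""
    prompts = sum(p.size for p in text_prompts)
    maps = sum(W.size + b.size for W, b in coupling)
    return prompts + maps


feat = encode_image(rng.normal(size=(196, D_VIS)), np.zeros(D_VIS))
print(feat.shape, added_params())  # (768,) 2765056
```

In this toy configuration almost all added parameters sit in the J coupling matrices, so raising J grows the overhead roughly linearly while a longer prompt adds comparatively little. The count printed here describes only this sketch and is not the 2.85% figure reported for MaCLe.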
Jiezhang Min
Haihong Li
Renyuan Cui
Scientific Reports
University of Malaya
City University of Macau
Hengshui University
Min et al. studied this question.
www.synapsesocial.com/papers/69d894526c1944d70ce05499 — DOI: https://doi.org/10.1038/s41598-026-47189-z