What question did this study set out to answer?

The research aims to improve facial expression recognition (FER) in e-learning platforms using a novel collaborative learning approach.

April 10, 2026 · Open Access

Multi-modal collaborative learning for facial expression recognition in e-learning platforms

Key Points

Aimed to improve facial expression recognition (FER) in e-learning platforms using a novel collaborative learning approach.
Introduced a multi-modal Collaborative Learning (MaCLe) approach based on the CLIP model.
Implemented collaborative prompt learning to enhance generalization and dynamics.
Validated effectiveness on benchmarks such as RAF-DB, FER2013, and AffectNet.
Achieved recognition accuracies of 95.45%, 80.37%, and 77.66% on the respective datasets.
Demonstrated fast convergence, achieving results in 5 epochs compared to 10.
Kept parameter overhead low, with only a 2.85% increase compared to CLIP.

Abstract

In recent years, facial expression recognition (FER) has become an important research topic in the development of intelligent educational systems. However, facial expressions are highly dynamic and exhibit substantial variability, which makes accurate recognition difficult. To address this challenge, we introduce a novel multi-modal Collaborative Learning (MaCLe) approach based on the CLIP model for FER applications in online learning scenarios. Collaborative prompt learning enhances generalization and dynamics. The effectiveness of the proposed model is validated on three widely used benchmarks: RAF-DB, FER2013, and AffectNet. MaCLe achieves recognition accuracies of 95.45%, 80.37%, and 77.66% on these datasets, respectively. Ablation studies demonstrate that a depth of J = 7 and a prompt length of 2–4 tokens enable the model to balance accuracy and generalization effectively. Furthermore, even with the integration of multi-modal prompt generation, MaCLe increases FLOPs by only approximately 0.1% compared with CoOp and Co-CoOp. It also converges faster, achieving results in 5 epochs compared to 10, and incurs a low parameter overhead, with only a 2.85% increase compared to CLIP. The results indicate that MaCLe significantly enhances recognition accuracy and improves training efficiency and scalability within e-learning platforms.
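For readers unfamiliar with CLIP-style classification, on which the approach above is built, the following minimal sketch shows the general idea: an image embedding is compared with one text embedding per expression label, and a softmax over the scaled cosine similarities gives class probabilities. This is not the MaCLe implementation. The three-dimensional vectors, the label set, the function names, and the fixed `logit_scale` are illustrative assumptions; in a real system the embeddings would come from CLIP's image and text encoders, and in prompt learning the text side would be driven by learned prompt tokens.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def clip_style_probs(image_feat, text_feats, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each label embedding.

    logit_scale is a fixed temperature assumed for illustration only.
    """
    img = l2_normalize(image_feat)
    logits = []
    for feat in text_feats:
        txt = l2_normalize(feat)
        logits.append(logit_scale * sum(a * b for a, b in zip(img, txt)))
    # Subtract the maximum logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-ins for encoder outputs (hypothetical values, not real CLIP features).
labels = ["happy", "sad", "surprised"]
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_feat = [0.9, 0.1, 0.2]

probs = clip_style_probs(image_feat, text_feats)
predicted = labels[probs.index(max(probs))]
print(predicted)
```

With these toy vectors the image embedding lies closest to the first label embedding, so that label receives the highest probability. Prompt-learning methods keep this similarity-based classifier and change how the label embeddings are produced.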

Authors

Jiezhang Min

Haihong Li

Renyuan Cui

Journals

Scientific Reports

Institutions

University of Malaya

City University of Macau

Hengshui University
