March 3, 2026 · Open Access

A Comparative Study of Emotion Recognition Systems: From Classical Approaches to Multimodal Large Language Models

Key Points

Emotion recognition systems have transitioned from classical approaches to advanced multimodal large language models.
The analysis highlights the strengths of transformer-based models, which notably improve emotion recognition under various conditions.
The assessment reviewed dataset characteristics and evaluation protocols, focusing on their implications for real-world applications.
The study underscores the need for robust and efficient systems, particularly for human–AI interaction and assistive technologies.

Abstract

Emotion recognition in video (ERV) aims to infer human affect from visual, audio, and contextual signals and is increasingly important for interactive and intelligent systems. Over the past decade, ERV has evolved from handcrafted features and task-specific deep learning models toward transformer-based vision–language models and multimodal large language models (MLLMs). This review surveys this evolution, with an emphasis on engineering considerations relevant to real-world deployment. We analyze multimodal fusion strategies, dataset characteristics, and evaluation protocols, highlighting limitations in robustness, bias, and annotation quality under unconstrained conditions. Emerging MLLM-based approaches are examined in terms of performance, reasoning capability, computational cost, and interaction potential. By comparing task-specific models with foundation model approaches, we clarify their respective strengths for resource-constrained versus context-aware applications. Finally, we outline practical research directions toward building robust, efficient, and deployable ERV systems for applied scenarios such as assistive technologies and human–AI interaction.

Cite this study

Grosu (Marinescu) et al. (2026) studied this question.

www.synapsesocial.com/papers/69a75b76c6e9836116a22cbb — DOI: https://doi.org/10.3390/app16031289

Authors

Mirela-Magdalena Grosu (Marinescu)

Octaviana Datcu

Ruxandra Tapu

Journal

Applied Sciences

Institutions

Universitatea Națională de Știință și Tehnologie Politehnica București

Institut Polytechnique de Paris
