Facial emotion recognition (FER) plays a crucial role in areas such as healthcare, security, and human–computer interaction. In this work, we introduce a deep learning model that uses Inception-V3 for feature extraction and a bidirectional LSTM (BiLSTM) for sequence learning. The aim is to capture both the spatial patterns in faces and the temporal relationships between expressions. We evaluated the model on the FER-2013 dataset under a subject-independent protocol to assess generalization. For comparison, we used common CNN models as baselines: ResNet-50, MobileNet-V2, and a plain Inception-V3. The proposed hybrid system reached about 94.5% accuracy, indicating that adding the BiLSTM improves recognition compared to CNNs alone. Our results suggest that combining convolutional networks with sequence models can improve performance in emotion recognition tasks.
Purakkadavath et al. studied this question.
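The subject-independent evaluation mentioned above means that no person appears in both the training and the test data. The abstract does not say how subjects were identified or how the split was made, so the following is only a minimal sketch under stated assumptions: each sample is assumed to carry a subject identifier, and the function name `subject_independent_split`, the `test_fraction` parameter, and the `(subject_id, item)` sample layout are all hypothetical, chosen for illustration.

```python
import random
from collections import defaultdict


def subject_independent_split(samples, test_fraction=0.2, seed=0):
    """Split (subject_id, item) pairs so that no subject lands in both sets.

    Hypothetical helper: the sample layout and parameters are assumptions,
    not the protocol used in the paper.
    """
    # Group all items by the subject they belong to.
    by_subject = defaultdict(list)
    for subject_id, item in samples:
        by_subject[subject_id].append(item)

    if len(by_subject) < 2:
        raise ValueError("need at least two subjects to form a split")

    # Shuffle subjects (not individual samples) with a fixed seed.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    # Hold out whole subjects; keep at least one on each side.
    n_test = round(len(subjects) * test_fraction)
    n_test = min(max(1, n_test), len(subjects) - 1)
    test_subjects = set(subjects[:n_test])

    train = [(s, x) for s in subjects[n_test:] for x in by_subject[s]]
    test = [(s, x) for s in subjects[:n_test] for x in by_subject[s]]
    return train, test, test_subjects


if __name__ == "__main__":
    # Toy data: 5 made-up subjects with 4 samples each.
    toy = [(f"subject_{i % 5}", i) for i in range(20)]
    train, test, held_out = subject_independent_split(toy)
    print(len(train), len(test), sorted(held_out))
```

Splitting by subject rather than by image is the design choice that matters here: a random per-image split could place different images of the same person on both sides, which tends to overstate how well a model generalizes to unseen people.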