June 1, 2026 | Open Access

Time Series Analysis of Teaching Effect Differences Based on Multimodal Transformer Model

Key Points

This study aims to analyze teaching effect differences through a multimodal transformer approach, emphasizing time series dynamics.
The authors constructed a dataset of 500 hours of classroom videos, audio, and text records.
The model uses ResNet-50 for gesture features, openSMILE for voice emotion parameters, and BERT for dialogue semantics.
Cross-modal attention fuses the features, and a temporal convolution layer captures sequential dependencies.
The model achieved an accuracy of 89.7% and an F1 score of 0.87.
Students' cognitive task accuracy dropped to 44.6% by the 15th minute of the course and to 14.1% by the 33rd minute.
The model quantifies teaching dynamics and can guide real-time adjustments to teaching strategies.

Abstract

Current teaching effect evaluation suffers from insufficient multimodal data fusion and weak analysis of how teaching effect changes over time. To address these problems, this study proposes a time series analysis model based on a multimodal Transformer, which aims to reveal how teaching effect evolves by integrating visual, auditory, and text features. First, a dataset containing 500 hours of classroom teaching videos, audio, and text records is constructed. ResNet-50 extracts the teacher’s gesture expression features, openSMILE extracts voice emotion parameters, and BERT extracts semantic vectors of teacher-student dialogue. A cross-modal attention mechanism aligns and fuses the features, and a temporal convolution layer captures the sequential dependencies of teaching behavior. Finally, a fully connected layer outputs a teaching effect score (0-100 points) every 10 minutes. In experiments, the model reaches an accuracy of 89.7% and an F1 score of 0.87. The time series analysis finds that the accuracy of students’ cognitive tasks drops to 44.6% in the 15th minute of the course and to 14.1% in the 33rd minute. The results indicate that the model can effectively quantify teaching dynamics, provide a data basis for real-time adjustment of teaching methods, and help shift personalized education decision-making from static evaluation to dynamic optimization.
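The fusion pipeline described in the abstract (cross-modal attention, then temporal convolution, then a fully connected scoring layer) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation. It assumes that the ResNet-50, openSMILE, and BERT features have already been projected to a common dimension, that text steps act as queries attending over visual and audio frames, and that the score is squashed into 0-100 with a sigmoid. All function names, the random feature values, the kernel, and the head weights are hypothetical placeholders.

```python
import math
import random

Frames = list[list[float]]  # one feature vector per time step


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries: Frames, context: Frames) -> Frames:
    """Scaled dot-product attention: each query step attends over all context steps."""
    dim = len(queries[0])
    scale = math.sqrt(dim)
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in context])
        attended = [sum(w * k[d] for w, k in zip(weights, context)) for d in range(dim)]
        # Residual connection keeps the query modality's own information.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


def temporal_conv(sequence: Frames, kernel: list[float]) -> Frames:
    """Causal 1D convolution along time; kernel[lag] weights the step `lag` steps back."""
    dim = len(sequence[0])
    out = []
    for t in range(len(sequence)):
        step = [0.0] * dim
        for lag, w in enumerate(kernel):
            if t - lag >= 0:  # steps before the start count as zeros
                for d in range(dim):
                    step[d] += w * sequence[t - lag][d]
        out.append(step)
    return out


def score_head(sequence: Frames, weights: list[float], bias: float) -> float:
    """Mean-pool over time, apply one linear layer, and map to a 0-100 score."""
    pooled = [sum(step[d] for step in sequence) / len(sequence) for d in range(len(weights))]
    logit = dot(pooled, weights) + bias
    return 100.0 / (1.0 + math.exp(-logit))


# Hypothetical stand-ins for already-projected features within one 10-minute window.
random.seed(0)
DIM = 4
text = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(6)]
visual = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(8)]
audio = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(8)]

fused = cross_modal_attention(text, visual + audio)
temporal = temporal_conv(fused, kernel=[0.5, 0.3, 0.2])
score = score_head(temporal, weights=[0.2, -0.1, 0.3, 0.05], bias=0.0)
print(f"teaching effect score for this window: {score:.1f}")
```

In a trained model the attention projections, convolution kernel, and head weights would be learned, and the scoring step would be repeated for each 10-minute window to produce the time series the abstract describes.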
