What question did this study set out to answer?

This research aims to improve human action recognition by addressing the sparsity of discriminative information in videos.

May 8, 2026 | Open Access

A visual tempo-based attention network for human action recognition

Key Points

This research aims to improve human action recognition by addressing the sparsity of discriminative information in videos.
The authors propose a visual tempo-based spatial-temporal attention mechanism to enhance feature representation.
The attention module integrates into recurrent networks in a plug-and-play manner.
Experiments were conducted on the UCF101, HMDB51, and Kinetics-400 datasets.
The model achieves superior performance among RCNN-based architectures.
It remains competitive with recent state-of-the-art methods.
It balances high accuracy with computational efficiency.

Abstract

Human action recognition is one of the most challenging tasks in machine intelligence. Learning an action representation requires extracting discriminative spatial-temporal features. However, the discriminative information in videos is usually sparse and mixed with a large amount of redundant and interfering information, which leads to poor performance and recognition failures. Spatial-temporal attention modules enable a network to learn discriminative feature representations of different human actions. One critical issue that is often overlooked in the design of these modules is the visual tempo of actions. Since a video is formed by a set of spatial changes over time, this paper proposes a visual tempo-based spatial-temporal attention mechanism that helps the model focus on the most meaningful changes in space and time. The proposed attention module can be flexibly integrated into recurrent networks in a plug-and-play manner. Experimental results on UCF101, HMDB51, and Kinetics-400 demonstrate that the proposed model achieves superior performance among RCNN-based architectures and remains highly competitive with recent state-of-the-art methods, effectively balancing high accuracy with computational efficiency.
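
The idea of weighting time steps by how much the scene changes can be illustrated with a minimal sketch. This is not the authors' module, whose details the abstract does not give. The function name `tempo_attention`, the `temperature` parameter, and the use of frame-to-frame feature-difference norms as a stand-in for visual tempo are all assumptions made for illustration, and the sketch covers only the temporal axis.

```python
import numpy as np

def tempo_attention(features, temperature=1.0):
    """Hypothetical temporal attention driven by frame-to-frame change.

    features: array of shape (T, D), one D-dimensional feature vector per frame.
    Returns (weights, pooled): weights has shape (T,), pooled has shape (D,).
    """
    features = np.asarray(features, dtype=float)

    # Assumed tempo proxy: L2 norm of the difference between consecutive frames.
    deltas = np.linalg.norm(np.diff(features, axis=0), axis=1)
    # The first frame has no predecessor, so its change score is set to 0.
    change = np.concatenate([[0.0], deltas])

    # Numerically stable softmax over time turns change scores into weights.
    logits = change / temperature
    logits -= logits.max()
    weights = np.exp(logits)
    weights /= weights.sum()

    # Attention-weighted average of the per-frame features.
    pooled = weights @ features
    return weights, pooled
```

In this sketch, frames that differ more from their predecessor receive larger weights, so a downstream recurrent layer or pooling step would emphasize moments of rapid change. A lower `temperature` makes the weighting sharper.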

Authors

Maryam Koohzadi

Nasrollah Moghadam Charkari

Foad Ghaderi

Journals

Discover Artificial Intelligence

Institutions

K.N.Toosi University of Technology
