Human Activity Recognition (HAR) has become increasingly important in domains such as healthcare, fitness tracking, and human-computer interaction. Unlike vision-based methods, sensor-based HAR relies on inertial data captured by wearable devices, which offers greater privacy and robustness across diverse environments. In this work, we present a Transformer-based model for sensor-based activity recognition that uses self-attention to capture both local and global temporal dependencies in multivariate sensor sequences. The proposed model achieves strong recognition performance while maintaining a lightweight architecture: evaluated on the UCI HAR dataset, it reaches a classification accuracy of 93.69%. We further discuss the advantages of attention-based architectures for sensor-based HAR and highlight potential directions for optimizing model efficiency and real-time deployment.
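To illustrate how self-attention lets each time step draw on every other step in a sensor window, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is not the model described above: the learned query, key, and value projections are replaced by the identity for brevity, and the function names and the toy window values are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention.

    seq: list of T time steps, each a list of d channel readings.
    Assumption: identity projections stand in for learned Q/K/V weights.
    Returns (output, weights), where weights[i][j] is how strongly
    time step i attends to time step j.
    """
    d = len(seq[0])
    scale = math.sqrt(d)
    weights = []
    for query in seq:
        # Similarity of this step to every step in the window (global context).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in seq]
        weights.append(softmax(scores))
    # Each output step is an attention-weighted average of all steps.
    output = [
        [sum(w * step[j] for w, step in zip(w_row, seq)) for j in range(d)]
        for w_row in weights
    ]
    return output, weights


# Hypothetical toy window: 4 time steps, 3 inertial channels.
window = [
    [0.1, 0.0, 0.9],
    [0.2, 0.1, 0.8],
    [0.9, 0.7, 0.1],
    [0.8, 0.9, 0.0],
]
attended, attn = self_attention(window)
```

Each row of `attn` sums to 1, and in this toy window the first time step places more weight on the similar second step than on the dissimilar third one. A full Transformer encoder would add learned projections, multiple heads, positional information, and feed-forward layers on top of this operation.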
Kirmani et al. (Thu,) studied this question.