Automated detection of suspicious human activities in complex and crowded environments remains a critical challenge in modern surveillance systems because of high false-positive rates, poor contrast, and weak generalization across diverse scenes. We propose GMCNN3D, a model for classifying suspicious activity built on a Deep Fused Feature Block (DFFB) framework that integrates handcrafted spatial descriptors (PCA-HOG and Motion-HOG) with deep spatiotemporal features extracted by a 3D convolutional neural network (3D-CNN). Motion regions are first localized using a Gaussian Mixture Model (GMM), after which the handcrafted and deep features are concatenated in a dimensionality-normalized fusion stage, followed by a fully connected layer and softmax classification. The system is evaluated on five diverse, publicly available datasets: Violent Crowd, Hockey Fight, Kaggle Fight, Movies Fight, and Custom Annotated YouTube Clips. It achieves up to 99.12% accuracy, a 98.7% F1-score, and a ROC-AUC of 0.992, outperforming state-of-the-art CNN, LSTM, and SlowFast models. All datasets include real-world scenarios with varying lighting, crowd density, and camera viewpoints, with annotations created manually where they were unavailable. The proposed method demonstrates robust cross-scene performance, enabling automated alarms and fewer false positives in real-time security operations.
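The fusion and classification stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how features are "dimensionality-normalized", so the sketch assumes each feature vector is L2-normalized before concatenation. The function names, feature sizes (64 and 128), two-class output, and random weights are all hypothetical placeholders.

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    # Assumption: scale each feature vector to unit length so that
    # neither the handcrafted nor the deep features dominate the fusion.
    return v / (np.linalg.norm(v) + eps)

def fuse_features(handcrafted, deep):
    # Concatenate the normalized handcrafted and deep feature vectors.
    return np.concatenate([l2_normalize(handcrafted), l2_normalize(deep)])

def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classify(fused, weights, bias):
    # A single fully connected layer followed by softmax.
    return softmax(weights @ fused + bias)

# Placeholder inputs standing in for HOG-style descriptors and 3D-CNN features.
rng = np.random.default_rng(0)
handcrafted = rng.normal(size=64)    # hypothetical handcrafted feature size
deep = rng.normal(size=128)          # hypothetical deep feature size

fused = fuse_features(handcrafted, deep)
weights = rng.normal(scale=0.1, size=(2, fused.size))  # untrained, illustrative only
bias = np.zeros(2)
probs = classify(fused, weights, bias)  # probabilities over two hypothetical classes
```

In a trained system the weights would be learned from labeled clips, and the inputs would come from the GMM-localized motion regions rather than random vectors.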
Mughal et al. studied this question.
www.synapsesocial.com/papers/69df2b49e4eeef8a2a6b034b — DOI: https://doi.org/10.3390/digital6020030
Bushra Mughal
Fernando B. Duarte
Tiago Cunha Reis
Digital
Instituto Politécnico de Lisboa
Universidade Lusófona