Abstract Photographing sensitive facilities or sensitive information in no-photography areas can leak that information if the act is not discovered in time. Employing action recognition models to detect instances of photography can effectively prevent such leakage. However, current action recognition models perform unsatisfactorily at detecting photo-taking actions in surveillance videos, and their reliance on GPU devices hinders their practicality. This paper presents a novel approach to detecting photo-taking actions. The method uses object detection to filter out background data and human pose estimation to extract human skeleton data. Combining these techniques enables accurate recognition of photo-taking actions. We introduce a novel technique called self-annotation that enables the model to focus on the crucial elements associated with photo-taking actions. Additionally, we introduce a new alarm mechanism that recognizes actions by integrating labels over a period of time, yielding a 69% reduction in false positives while maintaining the same level of recall. Compared with traditional action recognition approaches, our method is more flexible and lightweight in practical engineering applications. Moreover, our model can run on CPU-only devices. Experimental results show that our model achieves a precision of 91% on our dataset.
Liu et al. (Fri,) studied this question.
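The alarm mechanism described in the abstract integrates labels over a period rather than reacting to a single prediction. The abstract does not give the integration rule, so the following is only a minimal sketch of one plausible form, a sliding-window vote. The class name `WindowedAlarm`, the window length, and the threshold `min_positive` are all hypothetical and are not taken from the paper.

```python
from collections import deque


class WindowedAlarm:
    """Raise an alarm only when enough recent labels are positive.

    Assumption: labels arrive one per frame (or per clip) as booleans.
    The window size and threshold are illustrative, not the paper's values.
    """

    def __init__(self, window: int = 30, min_positive: int = 20) -> None:
        if not 0 < min_positive <= window:
            raise ValueError("require 0 < min_positive <= window")
        # deque with maxlen drops the oldest label automatically
        self.labels = deque(maxlen=window)
        self.min_positive = min_positive

    def update(self, is_photo_taking: bool) -> bool:
        """Record one label and report whether the alarm should fire."""
        self.labels.append(bool(is_photo_taking))
        return sum(self.labels) >= self.min_positive


# Usage: an isolated positive label does not fire the alarm,
# while a sustained run of positives does.
alarm = WindowedAlarm(window=5, min_positive=3)
isolated = [alarm.update(x) for x in [False, True, False, False, False]]

alarm = WindowedAlarm(window=5, min_positive=3)
sustained = [alarm.update(x) for x in [True, True, True, False, False]]
```

Under this kind of rule, a single spurious frame-level detection is suppressed, which is the general way temporal integration can lower false positives. Whether recall is preserved depends on how the window and threshold are chosen.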