As artificial intelligence has advanced, computer-generated fake content has become increasingly prevalent. Deepfakes are synthetic media created with deep learning-based techniques, and deepfake images, videos, and voices have spread rapidly. Because real and fake content are very hard to tell apart with the naked eye, reliable detection is essential. Existing deepfake detection methods have achieved success but still struggle to keep pace with the rapid evolution of deepfake generation techniques. In particular, CNN-based approaches may require a large number of parameters and may not fully capture spatial hierarchies. In this research, we investigate whether Capsule Networks can provide an effective and parameter-efficient alternative for deepfake detection. We propose four Capsule Network architectures that differ in size, complexity, and configuration. We compare them against state-of-the-art Capsule and CNN models across several datasets, using AUC (%) and the number of parameters as evaluation criteria. For transparency, we note that some baseline CNN and Capsule models follow the training protocols and datasets reported in their original studies, which may differ across implementations. Our experimental results show that the proposed Capsule Network models achieve an AUC above 98% on the evaluated datasets while using fewer parameters than several CNN-based models. These findings suggest that Capsule Networks can be more effective than traditional CNN-based methods for deepfake detection and represent a promising direction for future research.
Weerawardana et al. studied this question.
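The abstract evaluates models on two criteria: AUC (%) and parameter count. The evaluation code itself is not given here, so the following is only a minimal, generic sketch of how those two quantities are commonly computed, assuming label 1 marks a fake sample and a higher score means "more likely fake". The function names `auc_percent` and `conv2d_params` are hypothetical illustrations, not part of the authors' implementation.

```python
from itertools import product


def auc_percent(labels, scores):
    """AUC (%) computed as the probability that a randomly chosen fake
    (label 1, an assumption) scores higher than a randomly chosen real
    sample (label 0). Ties count as half a win. This pairwise form is
    O(n_fake * n_real) and is meant for small illustrations only."""
    fake = [s for y, s in zip(labels, scores) if y == 1]
    real = [s for y, s in zip(labels, scores) if y == 0]
    if not fake or not real:
        raise ValueError("need at least one real and one fake sample")
    wins = 0.0
    for f, r in product(fake, real):
        if f > r:
            wins += 1.0
        elif f == r:
            wins += 0.5
    return 100.0 * wins / (len(fake) * len(real))


def conv2d_params(kernel, c_in, c_out, bias=True):
    """Trainable parameters of a square 2-D convolution layer:
    kernel * kernel * c_in weights per output channel, plus one bias each."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


if __name__ == "__main__":
    # Hypothetical detector scores for two real and two fake samples.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auc_percent(labels, scores))  # 75.0
    print(conv2d_params(3, 3, 64))      # 1792
```

Summing per-layer counts such as `conv2d_params` over a whole network gives the model size used in parameter-efficiency comparisons; in practice a framework utility would usually be used instead of hand-written formulas.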