The rapid spread of deepfake content across social networks, video conferencing platforms, and voice communication systems calls for detection methods that are both fast and accurate. This paper presents a lightweight multimodal deepfake detection framework designed for real-time deployment in resource-constrained environments. The system uses a hybrid architecture that combines a ResNet18-based convolutional neural network for spatial feature extraction, EfficientNet for frame-level video analysis, and Wav2Vec2 for audio representation learning. Each model extracts features from its own modality, and the per-modality outputs are then combined to produce a final decision. We evaluated the system on datasets including FaceForensics++, Celeb-DF, and ASVspoof 2019. Experimental results show an accuracy of 88.25% for image-based detection, 70.56% for video frame analysis, and 81.50% for audio classification under CPU-only deployment. The system achieves real-time performance with low latency and reduced computational overhead, making it suitable for practical applications. Our approach offers an effective trade-off between detection accuracy and computational efficiency, enabling deployment in real-world scenarios such as social media content moderation, secure video conferencing, and voice phishing prevention.
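The abstract says the per-modality outputs are combined into a single decision but does not say how. As a minimal sketch of one common option, assume each detector emits a probability that its input is fake and that these are merged by a weighted average (late fusion). The function name `fuse_scores`, the equal weights, and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, Optional, Tuple

# Hypothetical equal weights: the abstract does not state how modalities are weighted.
DEFAULT_WEIGHTS: Dict[str, float] = {"image": 1.0, "video": 1.0, "audio": 1.0}


def fuse_scores(
    scores: Dict[str, Optional[float]],
    weights: Optional[Dict[str, float]] = None,
    threshold: float = 0.5,  # assumed decision threshold
) -> Tuple[float, bool]:
    """Weighted-average late fusion of per-modality fake probabilities.

    `scores` maps a modality name to a probability in [0, 1], or to None
    when that modality is absent (for example, a silent video clip).
    Returns (fused_probability, is_fake).
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    weighted_total = 0.0
    weight_sum = 0.0
    for modality, prob in scores.items():
        if prob is None:
            continue  # skip modalities that were not available
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"{modality} score {prob} is not a probability")
        w = weights.get(modality, 0.0)  # unknown modalities get zero weight
        weighted_total += w * prob
        weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("no usable modality scores")
    fused = weighted_total / weight_sum
    return fused, fused >= threshold


if __name__ == "__main__":
    # Image and video detectors ran; no audio track was present.
    print(fuse_scores({"image": 0.9, "video": 0.6, "audio": None}))
```

Renormalizing by the sum of the weights actually used keeps the fused score in [0, 1] when a modality is missing, which matters for inputs such as still images that have neither video frames nor audio.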
Sanjana Shetty
Ketaki Sakhadeo
Tejashree Deore
Cureus Journal of Computer Science.
MIT Art, Design and Technology University
Shetty et al. studied this question.
www.synapsesocial.com/papers/69d893eb6c1944d70ce04df1 — DOI: https://doi.org/10.7759/s44389-026-00052-8