Detecting deepfake-based fraud in financial Video Know Your Customer (V-KYC) systems is increasingly critical as AI-generated manipulations grow more sophisticated and privacy regulations become more stringent. This study proposes FMM-MMF (Federated Micro-Expression Mining and Multi-Modal Metadata Fusion), a privacy-preserving framework that integrates facial micro-expression analysis, audio cues, and behavioral metadata within a federated learning (FL) architecture. The framework trains models in a decentralized manner on edge devices, keeping personally identifiable information (PII) on the device while still supporting real-time inference. Evaluation was conducted on multi-modal datasets relevant to V-KYC, including FaceForensics++ (FF++) and synthetic deepfake datasets tailored to KYC scenarios. The proposed model achieves 96.74% accuracy and an F1-score of 0.987, and it remains robust under non-independent and identically distributed (non-IID) data, compression artifacts, and adversarial perturbations. Compared with baseline models such as XceptionNet, EfficientNet-B4, and FedAvg-based Long Short-Term Memory (LSTM) networks, FMM-MMF improves minority-class precision by 11.6% while keeping communication overhead and resource usage low. These results highlight the framework's scalability, cross-modal robustness, and practical applicability in real-world banking environments, making it suitable for secure, privacy-conscious, automated V-KYC verification.
Rawat et al. studied this question.
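The abstract states that training is decentralized across edge devices and names FedAvg as the aggregation scheme of one baseline. As a minimal sketch of what a FedAvg-style server step looks like, assuming each client reports only its locally trained parameters and its local sample count, the function below computes the sample-weighted average of client parameters. This is the standard FedAvg aggregation rule, not the FMM-MMF authors' implementation; the flat `dict`-of-lists parameter format and the name `fedavg` are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter format: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_updates: List[Tuple[Params, int]]) -> Params:
    """Sample-weighted average of client parameters (FedAvg aggregation).

    Each entry is (parameters, number_of_local_samples). Only these
    parameters reach the server; the raw video, audio, and metadata used
    to train them stay on the client device.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros shaped like the first client's parameters.
    first, _ = client_updates[0]
    aggregated: Params = {name: [0.0] * len(vals) for name, vals in first.items()}

    for params, n in client_updates:
        weight = n / total  # clients with more data contribute more
        for name, vals in params.items():
            acc = aggregated[name]
            for i, v in enumerate(vals):
                acc[i] += weight * v
    return aggregated


if __name__ == "__main__":
    # Two hypothetical branch devices with unequal data volumes (non-IID setting).
    merged = fedavg([({"w": [1.0, 2.0]}, 100), ({"w": [3.0, 4.0]}, 300)])
    print(merged)  # {'w': [2.5, 3.5]}
```

Weighting by local sample count is what distinguishes FedAvg from a plain mean: under non-IID conditions, a client holding three times as much data pulls the global model three times as hard, which is one reason robustness to non-IID splits is evaluated separately.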