Deepfake videos become considerably harder to detect under heavy compression because compression removes critical high-frequency cues. Conventional super-resolution methods can partially recover these lost details, but they often introduce artifacts that closely resemble forgery traces and can therefore interfere with reliable detection. To address this problem, we propose a method that constructs spatial-temporal dual dictionaries for super-resolution reconstruction and uses this reconstruction as the backbone for detecting forged facial videos under compression. Specifically, expression key sequences serve as inputs so that the model focuses on expression-intensive segments, which guides more reliable dictionary construction. A joint DCT-DWT attention mechanism emphasizes both local texture and global structural cues, while cross-scale and cross-domain feature fusion integrates complementary spatial and temporal information. Finally, anomaly features are constructed by combining reconstruction errors with deviations in sparse coding distributions, which strengthens their discriminative power for forgery detection. Experiments on the FaceForensics++ and Celeb-DF datasets show that our dual dictionary-based restoration method achieves better detection performance on low-quality forged facial videos than state-of-the-art methods. Moreover, it operates directly on compressed videos and does not require paired compressed-uncompressed data, which supports the practicality of the proposed framework for real-world deployment.
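As a rough illustration of the anomaly-feature step, the sketch below combines a reconstruction error with a deviation in sparse coding distributions. It rests on several assumptions that are not taken from the method itself: a single learned dictionary `D` with unit-norm atoms (columns), orthogonal matching pursuit (OMP) as the sparse coder, and the KL divergence between the atom-usage histogram of a test clip and a reference histogram `q_ref` (estimated from pristine videos) as the distribution deviation. The function names `omp` and `anomaly_features` are hypothetical. In a dual-dictionary setting, the same two features would presumably be computed once with the spatial dictionary and once with the temporal dictionary, then concatenated.

```python
import numpy as np


def omp(D, x, k):
    """Orthogonal Matching Pursuit: approximate x using at most k atoms (columns) of D.

    Assumes the columns of D have unit norm and k >= 1.
    """
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        sub = D[:, support]
        sol, *_ = np.linalg.lstsq(sub, x, rcond=None)
        residual = x - sub @ sol
    coef[support] = sol
    return coef


def anomaly_features(D, X, q_ref, k=3, eps=1e-12):
    """Hypothetical anomaly features for patches X (one patch per column).

    Returns [mean relative reconstruction error, KL(p_test || q_ref)], where
    p_test is the normalized atom-usage histogram of the sparse codes.
    """
    codes = np.stack([omp(D, X[:, j], k) for j in range(X.shape[1])], axis=1)
    recon = D @ codes
    # Relative reconstruction error per patch, averaged over the clip.
    errs = np.linalg.norm(X - recon, axis=0) / (np.linalg.norm(X, axis=0) + eps)
    # Atom-usage distribution: total absolute coefficient mass per atom.
    usage = np.abs(codes).sum(axis=1) + eps
    p = usage / usage.sum()
    q = (np.asarray(q_ref, dtype=float) + eps)
    q = q / q.sum()
    kl = float(np.sum(p * np.log(p / q)))
    return np.array([errs.mean(), kl])
```

For example, with `D = np.eye(4)` and patches that each use a different atom, both features are close to zero against a uniform `q_ref`, whereas patches that all load on one atom yield a KL divergence near log 4. The KL divergence is only one plausible way to measure "deviation in sparse coding distributions"; a learned classifier on top of these features would be needed to turn them into a forgery decision.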
Tu et al. studied this question.