What question did this study set out to answer?

To develop an effective method for authenticating short video copyrights through spatial-temporal feature fusion.

May 21, 2026 | Open Access

Short video copyright authentication based on spatial–temporal feature fusion

Key Points

Extracted spatial features from key frames using a deep residual learning model.
Obtained dynamic temporal features by analyzing changes between consecutive frames.
Created a unique fingerprint with a similarity matrix and trained the model using adversarial samples.
The proposed model achieved an mAP of 0.946, indicating high accuracy in detecting plagiarized content.
It outperformed the other video detection models it was compared against, which supports its effectiveness and feasibility.
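
The feature and fingerprint steps listed above can be illustrated with a small sketch. The summary does not state how the similarity matrix is built or how it is reduced to a single score, so the cosine similarity and the "best match per frame, then average" aggregation below are assumptions, and the function names are hypothetical. The short lists of numbers stand in for the per-frame feature vectors that a deep residual network would produce.

```python
import math


def temporal_differences(frame_features):
    """Change between each pair of consecutive frame feature vectors."""
    return [
        [b - a for a, b in zip(prev, curr)]
        for prev, curr in zip(frame_features, frame_features[1:])
    ]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(query, reference):
    """Entry [i][j] compares query frame i with reference frame j."""
    return [[cosine(q, r) for r in reference] for q in query]


def video_similarity(query, reference):
    """Assumed aggregation: mean of each query frame's best match."""
    matrix = similarity_matrix(query, reference)
    return sum(max(row) for row in matrix) / len(matrix)


# Toy per-frame features (assumed values, not data from the study).
original = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
unrelated = [[-1.0, 0.0], [0.0, -1.0]]

motion = temporal_differences(original)
same_score = video_similarity(original, original)
different_score = video_similarity(original, unrelated)
```

In this toy case a video compared with itself scores close to 1, while the unrelated clip scores much lower. A real system would apply the same comparison to learned spatial and temporal features rather than hand-written vectors.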

Abstract

Short videos are quick to produce and spread rapidly, which has made copyright infringement an increasingly significant problem. Existing techniques struggle with the diversity and complexity of short video content, and they are further limited in sensitivity, in meeting real-time requirements, and in how easily they can be circumvented. We propose a short video copyright authentication method based on spatial–temporal feature fusion to improve detection efficiency and accuracy. Spatial features are extracted from the key frames of a short video using a deep residual learning model, and dynamic temporal features are obtained from the changes between consecutive frames. A unique fingerprint is then created using a similarity matrix. In addition, the model is trained on adversarial samples so that plagiarized content is still identified accurately under perturbation. Finally, the "teacher" model is distilled into a lighter "student" model through knowledge distillation. The experimental results show that the proposed model generalizes well and reaches an mAP of 0.946, outperforming the other video detection models in the comparison and supporting the method's effectiveness and feasibility.
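
The abstract does not give the loss used for knowledge distillation. As an illustration only, the sketch below uses the widely used soft-target formulation: a temperature-softened KL divergence between the teacher and student outputs, blended with the ordinary cross-entropy on the true label. The function names, the `temperature` and `alpha` values, and the example logits are all assumptions and are not taken from the study.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Blend of soft-target KL divergence and hard-label cross-entropy."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(soft_teacher, soft_student)
        if t > 0
    )
    # Cross-entropy of the student's unsoftened prediction on the true label.
    hard = -math.log(softmax(student_logits)[label])
    # The temperature**2 factor keeps the soft term's gradient scale comparable.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


# Assumed example logits for a two-class "original vs. copy" decision.
teacher = [2.0, 0.5]
agreeing_student = [2.0, 0.5]
disagreeing_student = [0.5, 2.0]

low = distillation_loss(agreeing_student, teacher, label=0)
high = distillation_loss(disagreeing_student, teacher, label=0)
```

A student that reproduces the teacher's outputs incurs only the hard-label term, while a student that disagrees is penalized by both terms, which is what pushes the lighter model toward the teacher's behavior.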
