With the rapid growth of live-streaming e-commerce and digital marketing, abnormal marketing behaviors have become increasingly concealed, coordinated, and intertwined across heterogeneous data modalities, posing substantial challenges to data-driven platform governance and early risk identification. Existing approaches often fail to jointly model cross-modal temporal semantics, the gradual evolution of weak abnormal signals, and organized group-level manipulation. To address these challenges, we propose MM-FGDNet, a data-driven multimodal framework for detecting abnormal behavior in large-scale live-streaming environments. The framework models abnormal behaviors from two complementary perspectives: temporal evolution and cooperative group structure. A cross-modal temporal alignment module first maps video, text, audio, and user behavioral signals into a unified temporal semantic space, alleviating temporal misalignment and semantic inconsistency across modalities. Building on this representation, a temporal fraud pattern modeling module captures the progressive transition of abnormal behaviors from early incipient stages to abrupt outbreaks, while a cooperative manipulation detection module explicitly identifies coordinated interactions formed by organized user groups and automated accounts. Extensive experiments on real-world multi-platform live-streaming e-commerce datasets show that MM-FGDNet consistently outperforms representative baseline methods, achieving an AUC of 0.927 and an F1 score of 0.847, with precision and recall of 0.861 and 0.834, respectively, while substantially reducing false alarm rates. The framework also attains an Early Detection Score of 0.689. This metric quantifies the system's capacity to shift platform governance from passive remediation to proactive prevention, and is therefore a key indicator of operational viability. The result indicates that MM-FGDNet reliably identifies the “weak-signal” stage, defined as the incipient phase in which subtle, synchronized deviations in interaction rhythms appear before traffic-inflation outbreaks, and thereby provides a time window for preemptive intervention against coordinated manipulation. Ablation studies confirm that each core module contributes independently, and cross-domain generalization experiments show stable performance on new streamers, new product categories, and new platforms. Overall, MM-FGDNet offers an effective and scalable data-driven artificial intelligence solution for the early detection of coordinated abnormal behaviors in live-streaming systems.
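As a quick consistency check on the reported figures, the F1 score should equal the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the stated precision (0.861) and recall (0.834). It assumes all three numbers are computed on the same evaluation set with the same averaging; if F1 were averaged per fold or per class, it would not need to match the harmonic mean of the averaged precision and recall exactly. The function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (returns 0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.861
reported_recall = 0.834
reported_f1 = 0.847

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}")  # → computed F1 = 0.8473
print("matches reported F1 to 3 decimals:", round(computed_f1, 3) == reported_f1)
```

The recomputed value, about 0.8473, rounds to the reported 0.847, so the three metrics are mutually consistent.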
Luo et al. studied this question.