Spam, the posting of unsolicited and irrelevant messages, presents significant challenges in technical forums. Two particular challenges are the dynamic nature of spamming tactics and the inadequacy of adaptable spam databases for automated classifiers. Our work addresses the need for a robust spam classification solution that can be seamlessly integrated with database, SQL, and APEX applications. To ensure accurate classification, we built a labeled spam database by asking experts to categorize 1916 posts as spam or regular posts, and then created an SVM-based spam classification model that achieves an average validation accuracy of 90%. Our research enhances the current understanding of spam in technical forums and presents a solution for embedding spam classifiers into widely used platforms with an accuracy of 98.1%. Furthermore, we explore integrating generative topic modeling techniques, such as latent Dirichlet allocation, into our approach. The spam classifier is dynamically updated to account for emerging spam patterns and topics; this generative approach improves the robustness of the classifier against new spamming tactics and enables nuanced, context-aware filtering of messages. In addition, our experiments highlight the potential of text SVM classifiers for real-time applications through the fine-tuning of text features.
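As a rough illustration of the SVM-based text classification described in the abstract, the following is a minimal sketch, not the authors' implementation. It trains a linear SVM by subgradient descent on the L2-regularized hinge loss over bag-of-words features, using only the Python standard library. The toy posts, the function names (`featurize`, `train_linear_svm`, `classify`), and the hyperparameters (`lam`, `eta`, `epochs`) are all assumptions made for this example; the bias term is omitted for brevity, and the paper's actual features, kernel, and training procedure are not specified here.

```python
import re


def featurize(text):
    """Turn a post into an L2-normalized bag-of-words vector (token -> weight)."""
    counts = {}
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        counts[token] = counts.get(token, 0.0) + 1.0
    norm = sum(v * v for v in counts.values()) ** 0.5 or 1.0
    return {token: v / norm for token, v in counts.items()}


def score(weights, features):
    """Linear decision value w . x (no bias term in this sketch)."""
    return sum(weights.get(t, 0.0) * v for t, v in features.items())


def train_linear_svm(posts, labels, lam=0.01, eta=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss.

    labels use +1 for spam and -1 for regular posts.
    lam, eta and epochs are illustrative values, not tuned settings.
    """
    weights = {}
    vectors = [featurize(p) for p in posts]
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            violated = y * score(weights, x) < 1.0
            # Regularization step: shrink every weight slightly.
            for t in weights:
                weights[t] *= 1.0 - eta * lam
            # Hinge-loss step: only samples inside the margin push the weights.
            if violated:
                for t, v in x.items():
                    weights[t] = weights.get(t, 0.0) + eta * y * v
    return weights


def classify(weights, text):
    """Label a post as 'spam' when its decision value is positive."""
    return "spam" if score(weights, featurize(text)) > 0.0 else "regular"


# Hypothetical toy data, invented for this sketch (not from the paper's database).
TOY_POSTS = [
    "buy cheap watches now",
    "cheap pills buy online now",
    "win free money now click here",
    "how to create an index on a sql table",
    "apex page item is not refreshing after submit",
    "sql query with join returns duplicate rows",
]
TOY_LABELS = [1, 1, 1, -1, -1, -1]

model = train_linear_svm(TOY_POSTS, TOY_LABELS)
print(classify(model, "buy cheap watches"))
print(classify(model, "sql join returns duplicate rows"))
```

In practice, a library implementation with TF-IDF features and cross-validated hyperparameters would be the more likely choice. The dynamic updating mentioned in the abstract could then correspond to retraining on newly labeled posts, though that mapping is an assumption rather than something the abstract states.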
Jose Antonio Rivera-Hernandez
Liliana Ibeth Barbosa-Santillan
Juan Jaime Sánchez-Escobar
Data
Tecnológico de Monterrey
Rivera-Hernandez et al. studied this question.
www.synapsesocial.com/papers/69d896566c1944d70ce07ac2 — DOI: https://doi.org/10.3390/data11040078