What question did this study set out to answer?

The aim is to create an advanced spam detection system that integrates multiple classification methods and enhances explainability.

April 16, 2026 · Open Access

SpamShield Pro: An Industry-Grade Modular Spam Detection System with sklearn Pipelines, GridSearchCV Optimisation, Deep Neural Networks, LIME Explainability, and Gmail API Automation

Key Points

Utilized Multinomial Naive Bayes, Support Vector Machine, and Deep Neural Network classifiers within a modular sklearn Pipeline.
Employed GridSearchCV for hyperparameter optimization of the SVM model.
Implemented LIME for post-hoc explainability of model predictions.
Evaluated performance on the UCI SMS Spam Collection dataset.
Achieved 99.2% accuracy with the DNN and 98.5% with the SVM on the 1,115-message test set.
The DNN attained an F1 score of 98.9% with only 5 false positives.
The SVM attained an F1 score of 98.3% with 22 false positives.
Achieved 94.0% accuracy in classifying messages into five spam sub-categories (Financial, Promotional, Scam, Adult, Phishing).
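
As a rough illustration of the Pipeline and GridSearchCV points above, the following minimal sketch wraps vectorisation and classification in a single sklearn Pipeline and searches a nine-combination grid for the SVM. The toy messages, the TF-IDF vectoriser, the choice of `C` and `kernel` as the searched parameters, the fold count, and the default accuracy scoring are all assumptions made for illustration; the study's actual features, grid, and settings are not given on this page.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy data standing in for the SMS corpus; 1 = spam, 0 = ham.
texts = [
    "Win a free prize now, click here",
    "Congratulations, you won cash, claim today",
    "Urgent: your account is suspended, verify now",
    "Free entry in our weekly prize draw",
    "Cheap loans approved instantly, apply now",
    "Claim your free ringtone, reply YES",
    "Are we still meeting for lunch today?",
    "I will call you after the lecture",
    "Can you pick up milk on the way home?",
    "Happy birthday, see you tonight",
    "The report is attached, let me know",
    "Running late, start without me",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Vectoriser and classifier live in one Pipeline, so the same preprocessing
# is applied at fit time and at predict time.
nb_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])
nb_pipeline.fit(texts, labels)

svm_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", SVC()),
])

# Assumed grid: 3 values of C x 3 kernels = 9 combinations.
param_grid = {
    "clf__C": [0.1, 1, 10],
    "clf__kernel": ["linear", "rbf", "sigmoid"],
}
search = GridSearchCV(svm_pipeline, param_grid, cv=3)  # default scoring: accuracy
search.fit(texts, labels)

n_candidates = len(search.cv_results_["params"])
prediction = search.predict(["Claim your free prize now"])[0]
print(n_candidates, search.best_params_, prediction)
```

Because the vectoriser is refit inside each cross-validation fold, the search never sees vocabulary statistics from its own validation split, which is the usual argument for putting preprocessing inside the Pipeline.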

Abstract

This paper presents SpamShield Pro, a production-grade, modular spam detection system that unifies three classifier paradigms with Local Interpretable Model-agnostic Explanations (LIME) post-hoc explainability: Multinomial Naive Bayes (MNB) within a fully encapsulated sklearn Pipeline; a Support Vector Machine (SVM) optimised via GridSearchCV across a nine-combination hyperparameter grid; and a five-layer Deep Neural Network (DNN) with Batch Normalisation, progressive Dropout regularisation, and class-weighted training. Evaluated on the UCI SMS Spam Collection (5,571 messages, 13.4% spam), the system achieves DNN accuracy of 99.2%, F1=98.9%, and AUC=0.999 with only 5 false positives on 1,115 test messages. The GridSearchCV-optimised SVM achieves 98.5% accuracy and F1=98.3% with 22 false positives, providing a practical high-performance alternative. A novel multi-class sklearn Pipeline replaces hardcoded regex category classification, achieving 94.0% accuracy across five spam sub-categories (Financial, Promotional, Scam, Adult, Phishing). The system features: sklearn Pipeline encapsulation eliminating preprocessing skew; rotating file logging; a three-tier Flask Blueprint architecture (Routes/Services/Utils); real-time Gmail API integration with token-bucket rate limiting and automated spam folder routing; LIME explanations with vocabulary learned from NB log-probability differences; and a fully dynamic frontend with all metrics fetched from REST APIs. This work demonstrates that architectural discipline (modular code, full Pipeline encapsulation, systematic hyperparameter search, and explainability integration) transforms a prototype classifier into a portfolio-grade production system.
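
The token-bucket rate limiting mentioned in the abstract can be sketched generically as follows. This is a minimal, standard-library illustration of the technique, not the system's actual implementation: the class name `TokenBucket`, the `try_acquire` method, the injectable `clock` parameter, and the rate and capacity values are all hypothetical, and no Gmail API call is shown.

```python
import time


class TokenBucket:
    """Generic token bucket: tokens refill at `rate` per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self._clock = clock              # injectable so the bucket can be tested
        self._tokens = self.capacity     # start with a full bucket
        self._last = clock()

    def try_acquire(self, cost=1.0):
        """Spend `cost` tokens if available; return False instead of blocking."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Hypothetical usage: allow a burst of 3 requests, refilling 2 tokens per second.
bucket = TokenBucket(rate=2.0, capacity=3)
if bucket.try_acquire():
    pass  # an outbound API request would be issued here
```

A caller that receives `False` can either skip the poll or sleep briefly and retry; the bucket itself only tracks the budget, so the back-off policy remains a separate design decision.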
