This paper presents SpamShield Pro, a production-grade, modular spam detection system. It unifies three classifier paradigms with Local Interpretable Model-agnostic Explanations (LIME) for post-hoc explainability. The three classifiers are Multinomial Naive Bayes (MNB) inside a fully encapsulated sklearn Pipeline; a Support Vector Machine (SVM) optimised via GridSearchCV over a nine-combination hyperparameter grid; and a five-layer Deep Neural Network (DNN) with Batch Normalisation, progressive Dropout regularisation, and class-weighted training. Evaluated on the UCI SMS Spam Collection (5,571 messages, 13.4% spam), the DNN achieves 99.2% accuracy, F1 = 98.9%, and AUC = 0.999, with only 5 false positives on 1,115 test messages. The GridSearchCV-optimised SVM achieves 98.5% accuracy and F1 = 98.3% with 22 false positives, providing a practical high-performance alternative. A novel multi-class sklearn Pipeline replaces hardcoded regex rules for spam category classification, achieving 94.0% accuracy across five spam sub-categories (Financial, Promotional, Scam, Adult, Phishing). The system features sklearn Pipeline encapsulation, which eliminates preprocessing skew between training and inference; rotating file logging; a three-tier Flask Blueprint architecture (Routes/Services/Utils); real-time Gmail API integration with token-bucket rate limiting and automated spam-folder routing; LIME explanations with a vocabulary learned from NB log-probability differences; and a fully dynamic frontend that fetches all metrics from REST APIs. This work demonstrates that architectural discipline (modular code, full Pipeline encapsulation, systematic hyperparameter search, and integrated explainability) transforms a prototype classifier into a portfolio-grade production system.
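The abstract does not spell out how the LIME vocabulary is derived from "NB log-probability differences". One plausible reading is a per-token score of log P(w | spam) minus log P(w | ham) under a multinomial Naive Bayes model. Positive scores mark spam-indicative words, and negative scores mark ham-indicative ones. The following is a minimal pure-Python sketch under that assumption. It uses additive (Laplace) smoothing with alpha = 1, matching the default of sklearn's `MultinomialNB`. The helper names `tokenize`, `nb_log_prob_differences`, and `top_indicators` are hypothetical, as are the regex tokeniser and the four toy messages. None of them come from the paper. In an sklearn pipeline, the same quantities could be read from a fitted `MultinomialNB.feature_log_prob_` array by subtracting the ham row from the spam row.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokeniser: lowercase, keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def nb_log_prob_differences(messages, labels, alpha=1.0):
    """Return {token: log P(token|spam) - log P(token|ham)} under a
    multinomial Naive Bayes model with additive (Laplace) smoothing."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in zip(messages, labels):
        counts[label].update(tokenize(text))
    vocab = set(counts["spam"]) | set(counts["ham"])
    v = len(vocab)
    totals = {c: sum(counts[c].values()) for c in counts}
    scores = {}
    for tok in vocab:
        # Smoothed class-conditional token probabilities.
        p_spam = (counts["spam"][tok] + alpha) / (totals["spam"] + alpha * v)
        p_ham = (counts["ham"][tok] + alpha) / (totals["ham"] + alpha * v)
        scores[tok] = math.log(p_spam) - math.log(p_ham)
    return scores


def top_indicators(scores, k=3):
    # Most spam-indicative tokens first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy usage (illustrative data, not from the UCI corpus).
messages = ["win free prize now", "free cash win",
            "see you at lunch", "lunch at noon tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
scores = nb_log_prob_differences(messages, labels)
print(top_indicators(scores))  # → ['free', 'win', 'cash']
```

Ranking tokens this way gives a fixed, model-derived vocabulary of discriminative words. An explainer can then perturb or highlight those words. The smoothing keeps words seen in only one class from producing infinite scores.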
Sinarkar et al. studied this question.