March 3, 2026 | Open Access

A Hybrid CNN-XGBoost Framework for Phishing Email Detection Using Statistical and Semantic Features

Key Points

The model achieves an F1-score of 0.9587 on real-world phishing email datasets.
The detection model is a hybrid that combines a convolutional neural network (CNN) module with an XGBoost classifier.
Statistical features are extracted from the three entities involved in email communication: the sender, the recipient, and the email content.
Semantic features extracted with the Qwen large language model capture implicit cues in the content, such as emotional manipulation and social engineering tactics.
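
To make the idea of fusing statistical and semantic features and then capturing local correlations more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names `fuse_features` and `conv1d_relu`, the feature values, and the kernel weights are all hypothetical, and the XGBoost stage that would consume the resulting feature map is omitted because this page does not describe how the two stages are connected.

```python
from typing import List


def fuse_features(statistical: List[float], semantic: List[float]) -> List[float]:
    """Concatenate statistical and semantic features into one vector."""
    return list(statistical) + list(semantic)


def conv1d_relu(vector: List[float], kernel: List[float]) -> List[float]:
    """Slide a kernel over the vector (no padding, stride 1), then apply ReLU.

    Each output value mixes a small window of neighbouring features,
    which is what lets a convolution pick up local correlations.
    """
    k = len(kernel)
    if k == 0 or k > len(vector):
        raise ValueError("kernel must be non-empty and no longer than the input")
    out = []
    for i in range(len(vector) - k + 1):
        window_sum = sum(vector[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, window_sum))
    return out


# Hypothetical feature values, made up purely for illustration.
statistical = [1.0, 0.0, 3.0]  # e.g. counts derived from sender, recipient, content
semantic = [0.8, 0.2]          # e.g. scores an LLM might assign to the email text

fused = fuse_features(statistical, semantic)
feature_map = conv1d_relu(fused, kernel=[0.5, 0.5])
print(len(fused), len(feature_map))
```

In a full pipeline, a trained CNN would learn many such kernels, and its output (rather than this hand-set averaging kernel) would presumably be passed to a gradient-boosted classifier such as XGBoost.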

Abstract

Phishing email detection is a critical research challenge in cybersecurity. This paper proposes a novel Double-S (statistical-semantic) feature model built on the three core entities involved in email communication: the sender, the recipient, and the email content. We use strategic game theory to analyze the offensive strategies of phishing attackers and the defensive strategies of defenders, and extract statistical features from these entities. We also use the Qwen large language model to extract implicit semantic features (e.g., emotional manipulation and social engineering tactics) from email content. Integrating the statistical and semantic features yields a robust representation of phishing emails. For detection, we introduce a hybrid model that combines a convolutional neural network (CNN) module with the XGBoost (Extreme Gradient Boosting) classifier to capture local correlations in high-dimensional features. Experimental results on real-world phishing email datasets show an F1-score of 0.9587, precision of 0.9591, and recall of 0.9583, an improvement of 1.3% to 10.6% over state-of-the-art methods.
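
As a quick consistency check on the reported metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below assumes the three figures describe the same class and averaging scheme, which this page does not state; with macro or weighted averaging, the reported F1 would not have to equal the harmonic mean of the reported precision and recall. The function name `f1_score` is chosen here for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.9591
reported_recall = 0.9583

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # prints 0.9587
```

Under that assumption, the recomputed value matches the reported F1-score of 0.9587 to four decimal places.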

Authors

Lin-Hui Liu

Huazhong University of Science and Technology

Dong-Jie Liu

Yin-Yan Zhang

Journals

Computers, Materials & Continua
