What question did this study set out to answer?

To develop a machine learning method for detecting malware in PDF files while ensuring model explainability.

April 15, 2026 | Open Access

PDF Malware Detection Toward Machine Learning With Explainability Analysis

Key Points

The study aims to develop a machine learning method for detecting malware in PDF files while keeping the model explainable.
PDF files are preprocessed using tokenization, vectorization, and normalization.
Supervised models such as Random Forests and Support Vector Machines (SVM) perform the classification.
Unsupervised anomaly detection is used alongside the supervised models.
Explainability techniques such as SHAP and LIME are applied to interpret model predictions.
The authors report improved accuracy in detecting PDF malware.
The explanations show which features drive each classification decision.
These insights help cybersecurity analysts understand model behavior.
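
The preprocessing step listed above (tokenization, vectorization, normalization) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' pipeline: the study does not publish its feature set, so the vocabulary `VOCAB` and the helper names `tokenize`, `vectorize`, `normalize`, and `preprocess` are hypothetical, and the choice of L2 normalization is an assumption.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical vocabulary of PDF name objects that are often inspected when
# triaging suspicious files. The study does not list its features; this list
# is an assumption made for illustration only.
VOCAB = ["/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch",
         "/EmbeddedFile", "/ObjStm"]

# Simplified pattern for a PDF name token: a slash followed by alphanumerics.
# It does not handle hex-escaped names such as /J#53.
NAME_RE = re.compile(rb"/[A-Za-z0-9]+")


def tokenize(raw: bytes) -> List[str]:
    """Extract PDF name tokens (for example /JS) from raw file bytes."""
    return [match.decode("ascii") for match in NAME_RE.findall(raw)]


def vectorize(tokens: List[str]) -> List[float]:
    """Count how often each vocabulary entry occurs among the tokens."""
    counts = Counter(tokens)
    return [float(counts[name]) for name in VOCAB]


def normalize(vector: List[float]) -> List[float]:
    """Scale the vector to unit L2 norm; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)


def preprocess(raw: bytes) -> List[float]:
    """Run tokenization, vectorization, and normalization in sequence."""
    return normalize(vectorize(tokenize(raw)))


# Usage on a tiny hand-written fragment (not a complete PDF file).
sample = b"<< /OpenAction 5 0 R >> << /S /JavaScript /JS (app.alert(1)) >>"
print(dict(zip(VOCAB, preprocess(sample))))
```

The resulting fixed-length vectors are the kind of input that a Random Forest or SVM classifier expects. Normalizing to unit length keeps large files with many objects from dominating distance-based models such as SVMs.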

Abstract

The growing use of PDF files across many domains has made them a prime target for malware attacks, necessitating robust and efficient detection systems. This study presents a machine-learning-based approach to PDF malware detection that integrates explainability analysis to provide a transparent and interpretable model. Leveraging content-based, structural, and metadata features, we preprocess PDF files through tokenization, vectorization, and normalization to prepare the data for classification algorithms. We employ supervised models such as Random Forests and Support Vector Machines (SVM), along with unsupervised anomaly detection, for robust classification. To enhance model transparency, we use explainability techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations), which give detailed insight into feature importance and the reasoning behind individual predictions. This approach not only enhances malware detection accuracy but also gives cybersecurity analysts actionable insights, marking a step forward in the development of explainable and resilient malware detection systems.
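
To make the explainability idea in the abstract concrete, the sketch below computes exact Shapley values, the quantity that SHAP approximates, for a toy scoring function. Everything here is assumed for illustration: `toy_score`, its weights, and the three binary features are hypothetical and are not the study's model, and the brute-force enumeration is not the API of the `shap` library. It is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this is only usable for small n.
    """
    n = len(x)

    def value(coalition: set) -> float:
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_score(v: List[float]) -> float:
    """Hypothetical maliciousness score over three binary features:
    v[0] = contains JavaScript, v[1] = has an embedded file,
    v[2] = has an auto-run action. The weights are invented."""
    return 0.6 * v[0] + 0.3 * v[1] + 0.4 * v[0] * v[2]


features = [1.0, 1.0, 1.0]   # the file being explained
reference = [0.0, 0.0, 0.0]  # baseline: none of the features present
contributions = shapley_values(toy_score, features, reference)
print(contributions)
```

The contributions sum to the difference between the score for the file and the score for the baseline, which is the property that lets an analyst read each value as that feature's share of the prediction. The interaction term between JavaScript and the auto-run action is split evenly between those two features.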

Authors

Keerthana Kethavath

Dama Avinash

Mechineni Arjun
