The growing use of PDF files across many domains has made them a prime target for malware attacks, necessitating robust and efficient detection systems. This study presents a machine learning-based approach to PDF malware detection that integrates explainability analysis to provide a transparent and interpretable model. Leveraging content-based, structural, and metadata features, we preprocess PDF files through tokenization, vectorization, and normalization to prepare the data for classification. We employ supervised models such as Random Forests and Support Vector Machines (SVM), along with unsupervised anomaly detection to flag samples that deviate from known patterns. To enhance model transparency, we use explainability techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations), which provide detailed insights into feature importance and the reasoning behind individual predictions. This approach not only enhances malware detection accuracy but also gives cybersecurity analysts actionable insights, marking a step forward in the development of explainable and resilient malware detection systems.
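As a rough illustration of the explainability step, the sketch below counts a few structural keywords in raw PDF bytes and computes exact Shapley values for a single prediction, which is the quantity SHAP estimates. Everything named here is an assumption made for illustration: the keyword list, the hand-written `toy_score` function (standing in for a trained classifier), the all-zeros baseline, and the sample bytes are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import factorial

# Hypothetical structural keywords; the paper does not list its feature set.
KEYWORDS = [b"/JavaScript", b"/OpenAction", b"/Launch", b"/EmbeddedFile"]


def extract_features(pdf_bytes):
    """Count occurrences of each keyword in the raw PDF bytes."""
    return [pdf_bytes.count(keyword) for keyword in KEYWORDS]


def toy_score(x):
    """Illustrative hand-written scorer, NOT the authors' trained model."""
    js, open_action, launch, embedded = x
    # Interaction term: JavaScript combined with an auto-open action scores higher.
    interaction = 1.5 * min(js, 1) * min(open_action, 1)
    return 1.0 * js + 0.5 * open_action + 2.0 * launch + 0.5 * embedded + interaction


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


# Hypothetical sample bytes, not a real malicious document.
sample = (
    b"%PDF-1.7 1 0 obj << /OpenAction 2 0 R >> "
    b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> %%EOF"
)
x = extract_features(sample)
baseline = [0] * len(KEYWORDS)  # assumed reference point: no keywords present
phi = shapley_values(toy_score, x, baseline)

for keyword, contribution in zip(KEYWORDS, phi):
    print(keyword.decode(), round(contribution, 3))
```

The attributions sum to the difference between the score for the sample and the score for the baseline, which is what lets an analyst read each value as that feature's share of the prediction. In practice a library such as SHAP approximates these values for trained models, since exact enumeration does not scale beyond a handful of features.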
Keerthana Kethavath
Dama Avinash
Mechineni Arjun
Kethavath et al. (Thu,) studied this question.
www.synapsesocial.com/papers/69df2c9ee4eeef8a2a6b1c65 — DOI: https://doi.org/10.56975/ijsdr.v11i4.307433