What question did this study set out to answer?

This research aims to assess the effectiveness of a Retrieval Augmented Generation approach in improving threat detection for AWS CloudTrail events.

March 2, 2026 · Open Access

Retrieval-Augmented Large Language Model for AWS Cloud Threat Detection and Modelling: Cloudtrail Mitre ATT&CK Mapping

Key Points

Evaluated a two-step Retrieval Augmented Generation (RAG) approach with Gemini 2.5 Pro.
Constructed a dataset of 200 unique CloudTrail events (122 malicious, 78 benign).
Used expert annotation for ground truth labels, achieving 90% inter-annotator agreement.
Conducted cost-latency analysis comparing RAG model performance to commercial SIEM solutions.
Achieved 78% accuracy, 85% precision, and 79% F1-score with the RAG model.
Realized a 70.5% accuracy improvement and a 76.4% F1-score improvement over baseline Gemini 2.5 Pro (46% accuracy, 45% F1-score).
Processing time was 4.1 seconds per event, costing $0.00376, demonstrating cost-effectiveness.
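
The improvement and cost figures above can be sanity-checked with simple arithmetic. The sketch below assumes the reported improvements are relative gains over the baseline, which the summary does not state explicitly. With the rounded scores it gives about 69.6% and 75.6%, close to the reported 70.5% and 76.4%, so the published values were presumably computed from unrounded scores. The function names and the one-million-event workload are illustrative only.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Rounded scores from the summary: baseline Gemini 2.5 Pro vs. RAG model.
acc_gain = relative_improvement(0.46, 0.78)
f1_gain = relative_improvement(0.45, 0.79)

# Per-event figures reported in the summary.
COST_PER_EVENT_USD = 0.00376
SECONDS_PER_EVENT = 4.1


def projected_cost_usd(n_events: int) -> float:
    """Total cost if every event costs the reported per-event amount."""
    return n_events * COST_PER_EVENT_USD


def sequential_hours(n_events: int) -> float:
    """Wall-clock hours if events are processed strictly one at a time."""
    return n_events * SECONDS_PER_EVENT / 3600


print(f"accuracy gain: {acc_gain:.1%}, F1 gain: {f1_gain:.1%}")
print(f"cost for 1,000,000 events: ${projected_cost_usd(1_000_000):,.2f}")
print(f"sequential time for 1,724 events: {sequential_hours(1724):.2f} h")
```

The projection is linear and ignores batching, parallel requests, and pricing changes, so it is a rough bound rather than a forecast.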

Abstract

The Amazon Web Services (AWS) CloudTrail auditing service provides detailed records of operational and security events, enabling cloud administrators to monitor user activity and manage compliance. Although signature-based threat detection methods have been enhanced with machine learning and Large Language Models (LLMs), these approaches remain limited in addressing emerging threats. This study evaluates a two-step Retrieval Augmented Generation (RAG) approach using Gemini 2.5 Pro to enhance threat detection accuracy and contextual relevance. The RAG system integrates external cybersecurity knowledge sources, including the MITRE ATT&CK framework, the AWS Threat Technique Catalogue, and threat reports, to overcome the limitations of static pre-trained LLMs. We constructed an evaluation dataset of 200 unique CloudTrail events (122 malicious, 78 benign) using the Stratus Red Team adversary emulation framework, covering 9 MITRE ATT&CK techniques across 8 tactics. Events were sampled from 1,724 total events using stratified sampling. Ground truth labels were created through systematic expert annotation with 90% inter-annotator agreement. The RAG-enabled model achieved an estimated 78% accuracy, 85% precision, and 79% F1-score, representing a 70.5% accuracy improvement and a 76.4% F1-score improvement over baseline Gemini 2.5 Pro (46% accuracy, 45% F1-score). Performance figures are based on evaluation results on the 200-event dataset. Cost-latency analysis revealed a processing time of 4.1 s and a cost of $0.00376 per event, comparable to commercial SIEM solutions while providing superior MITRE ATT&CK attribution. The findings demonstrate that RAG substantially enhances context-aware threat detection, providing actionable insights for cloud security operations.
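
The abstract says the 200 evaluation events were drawn from 1,724 by stratified sampling but does not say which strata were used or how quotas were set. The following is a minimal sketch of one common variant, proportional allocation with largest-remainder rounding. It is not the authors' procedure. The synthetic events, the `stratum` field, and the nine evenly sized strata are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(events, stratum_of, n, seed=0):
    """Draw n events so each stratum is represented in proportion to its size."""
    rng = random.Random(seed)

    # Group events by stratum.
    strata = defaultdict(list)
    for event in events:
        strata[stratum_of(event)].append(event)

    # Proportional quota per stratum, rounded down first.
    total = len(events)
    exact = {key: n * len(members) / total for key, members in strata.items()}
    quota = {key: int(value) for key, value in exact.items()}

    # Give the leftover slots to the strata with the largest fractional parts.
    leftover = n - sum(quota.values())
    by_remainder = sorted(exact, key=lambda key: exact[key] - quota[key], reverse=True)
    for key in by_remainder[:leftover]:
        quota[key] += 1

    # Sample without replacement inside each stratum.
    sample = []
    for key, members in strata.items():
        sample.extend(rng.sample(members, quota[key]))
    return sample


# Hypothetical pool: 1,724 synthetic events spread over 9 made-up strata.
events = [{"id": i, "stratum": f"s{i % 9}"} for i in range(1724)]
sample = stratified_sample(events, lambda e: e["stratum"], 200)
print(len(sample))
```

A fixed seed keeps the draw reproducible, which matters when the sample is later annotated by experts and reused as ground truth.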

Cite This Study

Adediran et al. studied this question.

synapsesocial.com/papers/69a52de5f1e85e5c73bf1174 https://doi.org/10.32604/cmc.2026.077606