May 6, 2026 · Open Access

Co-Adaptive Attacker–Defender Learning over Attack Graphs: A Stochastic Game Approach to Dynamic Network Defense

Key Points

This research aims to develop an adaptive cyber defense framework using reinforcement learning and attack graph modeling.
Formulated the attacker–defender interaction as a repeated zero-sum stochastic game over a partially observable, attack-graph-guided environment.
Compared tabular Q-learning and Deep Q-Networks (DQN) in a unified setting where attacker and defender adapt to each other.
Ran experiments across multiple training scenarios with different training budgets.
Defender performance improved substantially as the training budget increased.
Q-learning offered a stable, computationally efficient baseline but a lower win rate; with extended training, DQN achieved the highest defender win rate at a much higher computational cost.
Multi-run statistical comparisons show a trade-off between defensive effectiveness and runtime efficiency, which makes Q-learning the better choice under resource constraints.
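For reference, the standard textbook forms behind these key points are written out below. The summary does not give the paper's own notation, reward design or hyperparameters, so the symbols are assumptions. Treating each agent as an independent learner that maximizes its own reward is one common reading of a value-based attacker–defender setup, not necessarily the authors' exact formulation. Under partial observability, each agent's observation would take the place of the state s.

```latex
% Two-player stochastic game (assumed notation); A = attacker, D = defender
G = \bigl(\mathcal{S}, \mathcal{A}^{A}, \mathcal{A}^{D}, P, r, \gamma\bigr), \qquad
s' \sim P(\cdot \mid s, a^{A}, a^{D}), \qquad
r^{A}(s, a^{A}, a^{D}) = -\, r^{D}(s, a^{A}, a^{D})

% Tabular Q-learning update for agent i \in \{A, D\}
Q_i(s, a^{i}) \leftarrow Q_i(s, a^{i}) + \alpha \Bigl[ r^{i} + \gamma \max_{a'} Q_i(s', a') - Q_i(s, a^{i}) \Bigr]

% DQN loss for agent i: replay buffer \mathcal{D}, target-network parameters \theta_i^{-}
\mathcal{L}(\theta_i) = \mathbb{E}_{(s, a^{i}, r^{i}, s') \sim \mathcal{D}}
\Bigl[ \bigl( r^{i} + \gamma \max_{a'} Q_i(s', a'; \theta_i^{-}) - Q_i(s, a^{i}; \theta_i) \bigr)^{2} \Bigr]
```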

Abstract

The evolving landscape of cybersecurity threats, characterized by increasingly sophisticated and adaptive attackers, poses major challenges to traditional static network defense mechanisms. To address these limitations, this paper proposes an adaptive cyber defense framework that integrates Reinforcement Learning (RL) with Attack Graph (AG) modeling. The interaction between attacker and defender is formulated as a repeated zero-sum stochastic game over a partially observable Attack Graph-guided environment, allowing both agents to adapt their strategies through repeated interaction. Two value-based learning approaches are investigated, namely tabular Q-learning and Deep Q-Networks (DQN), under a unified attacker–defender setting. Experimental results across multiple training scenarios show that defender performance improves substantially as the training budget increases. Under limited training, Q-learning provides a computationally efficient and stable baseline, while DQN requires more training and careful tuning to achieve strong performance. However, with extended training, the DQN-based defender attains the highest win rate, albeit at a significantly greater computational cost. In addition, multi-run statistical comparisons highlight a clear trade-off between defensive effectiveness and runtime efficiency: Q-learning remains far more lightweight, whereas DQN offers stronger asymptotic performance when sufficient resources are available. These findings demonstrate the promise of learning-based adaptive defense over attack graphs while also emphasizing the importance of training budget, computational constraints, and model selection in practical cyber defense deployment.
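The co-adaptive loop described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's environment, and every name and parameter in it is hypothetical. The assumptions are as follows:

* The attack graph has five nodes.
* At each step the defender monitors one exploit edge. If the attacker uses that edge, the attacker is caught.
* The defender sees the attacker's node only when an alert fires, which stands in for partial observability. The attacker never sees which edge is monitored.
* Rewards are terminal, +1/−1, and zero-sum.
* Each side is an independent epsilon-greedy tabular Q-learner.

The sketch shows how both agents adapt at once and how a defender win rate can be tracked against the training budget. It covers the tabular case only, not DQN.

```python
import random
from collections import defaultdict

# Hypothetical toy attack graph (not the paper's): node 0 is the attacker's
# foothold, node 4 the target asset; each edge (src, dst) is one exploit.
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
START, TARGET = 0, 4
P_EXPLOIT = 0.8  # assumed success probability of an unmonitored exploit
P_ALERT = 0.7    # assumed probability the defender observes the attacker's node
MAX_STEPS = 6    # assumed horizon; the defender wins if the attacker times out


class QAgent:
    """Independent tabular Q-learner with epsilon-greedy exploration."""

    def __init__(self, rng, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)  # (observation, action) -> estimated value
        self.rng, self.alpha, self.gamma, self.epsilon = rng, alpha, gamma, epsilon

    def act(self, obs, actions):
        if self.rng.random() < self.epsilon:  # explore
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(obs, a)])  # exploit

    def update(self, obs, action, reward, next_obs, next_actions, done):
        # One-step Q-learning target; no bootstrapping past a terminal step.
        best_next = 0.0 if done else max(self.q[(next_obs, a)] for a in next_actions)
        td_error = reward + self.gamma * best_next - self.q[(obs, action)]
        self.q[(obs, action)] += self.alpha * td_error


def outgoing(node):
    """Exploits available to the attacker from `node`."""
    return [e for e in EDGES if e[0] == node]


def play_episode(attacker, defender, rng):
    """One simultaneous-move episode; returns True if the defender wins."""
    node, d_obs = START, START  # assume the initial foothold is known
    for step in range(MAX_STEPS):
        a_act = attacker.act(node, outgoing(node))
        d_act = defender.act(d_obs, EDGES)  # defender monitors one exploit edge
        caught = a_act == d_act
        if caught:
            next_node = node
        else:
            next_node = a_act[1] if rng.random() < P_EXPLOIT else node
        reached = next_node == TARGET
        done = caught or reached or step == MAX_STEPS - 1
        # Zero-sum terminal reward: +1 to the winner, -1 to the loser.
        r_att = (1.0 if reached else -1.0) if done else 0.0
        # Partial observability: the defender sees the attacker's node only on an alert.
        next_d_obs = next_node if rng.random() < P_ALERT else None
        attacker.update(node, a_act, r_att, next_node, outgoing(next_node), done)
        defender.update(d_obs, d_act, -r_att, next_d_obs, EDGES, done)
        if done:
            return not reached
        node, d_obs = next_node, next_d_obs
    return True  # never executed: the final step always sets done


def train(episodes, seed=0, window=500):
    """Co-adaptive training; returns the defender win rate over the last `window` episodes."""
    rng = random.Random(seed)
    attacker, defender = QAgent(rng), QAgent(rng)
    wins = [play_episode(attacker, defender, rng) for _ in range(episodes)]
    recent = wins[-window:]
    return sum(recent) / len(recent)


if __name__ == "__main__":
    for budget in (500, 2000, 8000):
        print(f"budget={budget:5d}  defender win rate={train(budget):.2f}")
```

Running the script prints the defender's recent win rate for three training budgets. Those numbers depend on the seed and on the assumed parameters, and they say nothing about the paper's reported results. A DQN-style variant would replace the `q` table with a neural network trained from a replay buffer against a periodically updated target network.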

Authors

Mohammed A. Makarem

Muneef A. Razaz

Zead Saleh

Journals

Future Internet

Institutions

Queen's University

University of Business and Technology