While Deep Neural Networks (DNNs) have driven major breakthroughs in artificial intelligence, their internal complexity often makes their behavior hard to explain, resulting in the well-known “black box” dilemma. This thesis addresses the challenge of interpretability in DNNs and deep reinforcement learning (DRL) through two main contributions. In Part I, we revisit and extend the use of Deep RAM Networks (DRNs) within the Arcade Learning Environment (ALE), showing that, with modern architectures and careful hyperparameter tuning, RAM-based agents can achieve performance competitive with established pixel-based baselines on Atari 2600 games while offering additional advantages for research and analysis. We also train and evaluate a hybrid agent that integrates both RAM and pixel observations, demonstrating that in most games it outperforms agents relying on either modality alone. Because RAM observations are compact and Markovian, DRNs not only act as competitive agents but also enable new forms of analysis, making them particularly well suited to interpretability studies. In Part II, we dig deeper into agent behavior and DNN internals by introducing two general analysis and interpretability techniques. Trajectory Tracking provides a model-agnostic framework for examining long-term behavioral patterns. It applies to any reinforcement learning agent and works by querying basic trajectory attributes, or attributes augmented by the user, within and across episodes. Neural Pathway Decomposition (NP-Decomp) offers a systematic approach for decomposing compact fully connected DNNs into their constituent neural pathways, tracing from input features and biases to final outputs. It yields exact, context-specific attributions of how the neural pathways collectively influence an agent's decisions. While computationally intensive, NP-Decomp reveals structural insights that are unlikely to be obtained through other analysis methods.
Applying these methods to DRN agents trained on games such as Breakout, we uncover insights invisible to score-based evaluation alone, such as the agent's lapses in learned behaviors, the behavioral consequences of epsilon-greedy exploration, and the influence of particular RAM addresses on action selection. Together, Trajectory Tracking and NP-Decomp enable a step from merely assessing what score an agent achieves toward understanding how and why it behaves as it does. By combining compact RAM-based architectures with deep analytical tools, this thesis lays the foundation for more transparent and interpretable reinforcement learning systems. Trajectory Tracking lets us survey and explore the landscape of agent behavior over time, while NP-Decomp allows for the extraction of analytical core samples: cross-sections through the network's internal structure that reveal not just what decisions an agent makes, but how those decisions emerge from the collective influence of many neural pathways.
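To make the idea of tracing pathways from input features and biases to an output more concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a small fully connected network with ReLU hidden units (the abstract does not name the activation function) and uses NumPy. The function names `relu_mlp_forward` and `pathway_contributions` are hypothetical. For a fixed input, each ReLU acts as a 0/1 gate, so the selected output equals the sum, over every pathway, of the product of the source value (an input feature or a bias), the weights along the pathway, and the gates it passes through. The number of pathways grows multiplicatively with layer widths, which is consistent with the computational cost noted above.

```python
import numpy as np


def relu_mlp_forward(x, weights, biases):
    """Forward pass of a ReLU MLP; returns the output and the 0/1 gates per hidden layer."""
    a = np.asarray(x, dtype=float)
    gates = []
    for W, b in zip(weights[:-1], biases[:-1]):
        z = W @ a + b
        g = (z > 0).astype(float)  # gate is 1 where the ReLU is active for this input
        gates.append(g)
        a = z * g
    out = weights[-1] @ a + biases[-1]  # linear output layer (assumption)
    return out, gates


def pathway_contributions(x, weights, biases, output_index):
    """Enumerate every pathway ending at one output unit and its exact contribution.

    Keys are tuples: ("x", i, units...) for pathways starting at input feature i,
    and ("b", layer, j, units...) for pathways starting at the bias of unit j.
    """
    _, gates = relu_mlp_forward(x, weights, biases)
    n_layers = len(weights)

    def tails(layer, unit):
        # Yield (downstream units, product of weights and gates) from `unit` to the output.
        if layer == n_layers:
            yield (), 1.0
            return
        W = weights[layer]
        for nxt in range(W.shape[0]):
            last = layer + 1 == n_layers
            if last and nxt != output_index:
                continue
            gate = 1.0 if last else gates[layer][nxt]
            step = W[nxt, unit] * gate
            for rest, mult in tails(layer + 1, nxt):
                yield (nxt,) + rest, step * mult

    paths = {}
    for i in range(len(x)):  # pathways that start at an input feature
        for rest, mult in tails(0, i):
            paths[("x", i) + rest] = x[i] * mult
    for l, b in enumerate(biases):  # pathways that start at a bias term
        layer = l + 1
        for j in range(len(b)):
            if layer == n_layers and j != output_index:
                continue
            gate = 1.0 if layer == n_layers else gates[l][j]
            for rest, mult in tails(layer, j):
                paths[("b", layer, j) + rest] = b[j] * gate * mult
    return paths
```

Summing the returned contributions reproduces the network's output for that input, which is the sense in which such an attribution is exact and context-specific; grouping the contributions by their first element gives a per-feature (for a RAM agent, per-address) attribution.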
Andrew J. Wagner