What question did this study set out to answer?

To develop an interpretable framework that improves decision-making in reinforcement learning by using recurring behavior patterns.

January 18, 2026 | Open Access

Scenario-Guided Temporal Prototypes in Reinforcement Learning

Key Points

Introduced a framework for case-based decision making in reinforcement learning.
Grouped decision-making trajectories into recurring behavior patterns as prototypes.
Developed a local policy that links short-term patterns to actions using a similarity score.
The method provides pre hoc explanations for actions taken while maintaining high performance.
Clear explanations were generated for actions based on pattern recognition in simulations of CarRacing and voltage control.

Abstract

Deep reinforcement learning policies are hard to deploy in safety-critical settings because they do not explain why a sequence of actions is taken. We introduce an intrinsically interpretable framework that learns compact summaries of recurring behavior and uses them for case-based decision making. Our method (i) discovers global regimes by grouping trajectories into a small set of recurrent patterns and (ii) learns a prototype-conditioned local policy that maps the current short-horizon pattern to an action (“this matches prototype X → take action Y”). Each action is accompanied by similarity scores to the relevant prototypes, which serve as its explanation. We evaluate our approach on two domains: (1) CarRacing (pixel-based continuous control) and (2) a real voltage-control problem in low-voltage distribution networks. Our results indicate that the method provides clear pre hoc explanations while keeping task performance close to the reference policy.
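
The "this matches prototype X → take action Y" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Prototype` class, the exponential similarity function, the nearest-prototype action rule, and the example prototype names and actions are all assumptions chosen for illustration. The abstract does not say how prototypes are learned or how similarity is computed.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Prototype:
    """A recurring short-horizon behavior pattern (hypothetical structure)."""
    name: str                 # human-readable label, e.g. "left_turn"
    pattern: Sequence[float]  # flattened window of recent features
    action: str               # action associated with this pattern


def similarity(window: Sequence[float], pattern: Sequence[float]) -> float:
    """Map squared Euclidean distance to a score in (0, 1].

    Assumption: exp(-distance^2) is used only as a simple example of a
    similarity measure; 1.0 means the window equals the pattern.
    """
    if len(window) != len(pattern):
        raise ValueError("window and pattern must have the same length")
    sq_dist = sum((w - p) ** 2 for w, p in zip(window, pattern))
    return math.exp(-sq_dist)


def act_with_explanation(
    window: Sequence[float], prototypes: List[Prototype]
) -> Tuple[str, List[Tuple[str, float]]]:
    """Return the action of the most similar prototype plus all scores.

    The sorted (name, score) list is the explanation: it shows which
    prototype the current window matched and how strongly.
    """
    scored = [(p, similarity(window, p.pattern)) for p in prototypes]
    scored.sort(key=lambda item: item[1], reverse=True)
    best_prototype = scored[0][0]
    explanation = [(p.name, score) for p, score in scored]
    return best_prototype.action, explanation


# Hypothetical prototypes for a driving-style task.
prototypes = [
    Prototype("straight", [0.0, 0.0], "accelerate"),
    Prototype("left_turn", [1.0, 1.0], "steer_left"),
]
action, explanation = act_with_explanation([0.9, 1.0], prototypes)
print(action, explanation)
```

Because the action is read off the best-matching prototype, the explanation is available before the action is executed, which is one way to interpret the pre hoc property the abstract describes.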
