Deep reinforcement learning policies are hard to deploy in safety-critical settings, because they fail to explain why a sequence of actions is taken. We introduce an intrinsically interpretable framework that learns compact summaries of recurring behavior and uses them for case-based decision making. Our method (i) discovers global regimes by grouping trajectories into a small set of recurrent patterns and (ii) learns a prototype-conditioned local policy that maps the current short-horizon pattern to an action (“this matches prototype X → take action Y”). Each action is accompanied by a similarity score to relevant prototypes, which provide the explanations. We evaluate our approach on two domains: (1) CarRacing (pixel-based continuous control) and (2) a real voltage-control problem in low-voltage distribution networks. Our results indicate that the method provides clear pre hoc explanations while keeping task performance close to the reference policy.
Dobravec et al. (Fri,) studied this question.