Deploying reinforcement learning (RL) in critical infrastructure such as telecommunications is hindered by the lack of interpretability and transparency in modern methods. This thesis presents an approach to explainable RL (XRL) that combines distributional RL with prior work on temporal policy decomposition. Extending temporal decomposition to the distributional setting makes it possible to predict the full outcome distribution at each future time step rather than only its expected value. This yields more fine-grained explanations and provides natural estimates of predictive uncertainty. The method is evaluated both for policy prediction and as a control algorithm in two environments: the classic CartPole benchmark and a realistic antenna tilt optimization task in a simulated radio access network (RAN) environment from Ericsson. The results show that distributional temporal decomposition can be used post hoc for policy evaluation or online as a control algorithm, achieving performance comparable to standard RL methods while providing richer information about agent behavior.
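To make the contrast between expected values and full per-step outcome distributions concrete, the following is a minimal sketch and not the method of the thesis. It assumes a hypothetical toy process (a noisy random walk standing in for an environment under a fixed policy) and estimates per-step quantiles from Monte Carlo rollouts, whereas the thesis obtains such distributions through distributional RL. The names `rollout_rewards` and `per_step_summary` are illustrative only.

```python
import numpy as np


def rollout_rewards(rng, horizon):
    """Hypothetical stand-in for an environment under a fixed policy:
    a noisy random walk whose reward is the negative distance from 0."""
    steps = rng.normal(loc=0.0, scale=1.0, size=horizon)
    positions = np.cumsum(steps)
    return -np.abs(positions)


def per_step_summary(n_rollouts=2000, horizon=5,
                     quantiles=(0.1, 0.5, 0.9), seed=0):
    """Summarize the reward at each future time step in two ways."""
    rng = np.random.default_rng(seed)
    # rewards has shape (n_rollouts, horizon): one row per rollout.
    rewards = np.stack(
        [rollout_rewards(rng, horizon) for _ in range(n_rollouts)]
    )
    # Expected-value view: a single number per future time step.
    expected = rewards.mean(axis=0)
    # Distributional view: several quantiles per future time step,
    # shape (len(quantiles), horizon).
    dist = np.quantile(rewards, quantiles, axis=0)
    return expected, dist


expected, dist = per_step_summary()
for t in range(expected.shape[0]):
    low, median, high = dist[:, t]
    print(f"t={t + 1}: mean={expected[t]:.2f}  "
          f"q10={low:.2f}  q50={median:.2f}  q90={high:.2f}")
```

In this toy setting the gap between the 10% and 90% quantiles widens with the time step, which is the kind of predictive-uncertainty information that a single expected value per step cannot convey.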
Knut Salomonsson