March 3, 2026 · Open Access

Explainable Reinforcement Learning through Distributional Temporal Policy Decomposition: Integrating Uncertainty Awareness into Predictions of Future Outcomes

Key Points

Distributional temporal decomposition predicts a full outcome distribution at each future time step, which yields estimates of predictive uncertainty and more detailed explanations.
The method was evaluated in two environments: the CartPole benchmark and an antenna tilt optimization task.
Performance was comparable to standard reinforcement learning methods while offering richer insights into agent behavior.
This approach highlights the need for interpretable solutions in critical infrastructure applications.

Abstract

The lack of interpretability and transparency in modern reinforcement learning (RL) methods is a major obstacle to deploying them in critical infrastructure such as telecommunications. This thesis presents an approach to explainable RL (XRL) that combines distributional RL with prior work on temporal policy decomposition. Extending temporal decomposition to the distributional setting makes it possible to predict full outcome distributions at each future time step rather than only expected values. This allows more fine-grained explanations and provides natural estimates of predictive uncertainty. The method is evaluated both for policy prediction and as a control algorithm in two environments: the classic CartPole benchmark and a realistic antenna tilt optimization task in a simulated radio access network (RAN) environment from Ericsson. The results show that distributional temporal decomposition can be used post hoc for policy evaluation or online as a control algorithm, achieving performance comparable to standard RL methods while providing richer information about agent behavior.
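
The difference between predicting an expected value and predicting a full distribution at each future time step can be illustrated with a small sketch. This is not the method from the thesis, which learns its predictions with distributional RL. The sketch instead estimates per-step outcome distributions empirically from Monte Carlo rollouts of a fixed policy. The toy environment (an agent that tries to move one step right and slips with some probability), the choice of the position as the per-step outcome, and all function and parameter names are assumptions made for illustration only.

```python
import random
from collections import Counter


def rollout(horizon, slip_prob, rng):
    """Run one episode of the toy environment (hypothetical, for illustration).

    The fixed policy always tries to move one step to the right; the move
    fails with probability slip_prob. The outcome recorded at each time step
    is the position reached after that step.
    """
    position = 0
    outcomes = []
    for _ in range(horizon):
        if rng.random() >= slip_prob:  # the move succeeds
            position += 1
        outcomes.append(position)
    return outcomes


def per_step_distributions(n_rollouts=5000, horizon=6, slip_prob=0.3, seed=0):
    """Estimate the outcome distribution at every future time step."""
    rng = random.Random(seed)
    counts = [Counter() for _ in range(horizon)]
    for _ in range(n_rollouts):
        for t, outcome in enumerate(rollout(horizon, slip_prob, rng)):
            counts[t][outcome] += 1
    return [
        {value: n / n_rollouts for value, n in sorted(step_counts.items())}
        for step_counts in counts
    ]


def mean_and_std(distribution):
    """Collapse one per-step distribution to its mean and standard deviation."""
    mean = sum(value * p for value, p in distribution.items())
    variance = sum(p * (value - mean) ** 2 for value, p in distribution.items())
    return mean, variance ** 0.5


if __name__ == "__main__":
    for t, dist in enumerate(per_step_distributions()):
        mean, std = mean_and_std(dist)
        print(f"t={t}: mean={mean:.2f} std={std:.2f} distribution={dist}")
```

An expected-value decomposition would report only the mean column. The per-step distribution additionally shows how spread out the possible outcomes are, and in this toy setting the spread grows with the prediction horizon, which is the kind of predictive uncertainty the abstract refers to.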

Authors

Knut Salomonsson
