February 6, 2026 · Open Access

Composing reinforcement learning policies, with formal guarantees

Key Points

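This research develops a framework for controller design in environments with a two-level structure, combining reactive synthesis at the high level with reinforcement learning at the low level.
Created a two-level control framework with a high-level graph and low-level policies.
Applied reactive synthesis to high-level tasks specified as logical formulas over the high-level graph.
Developed a reinforcement learning procedure that trains low-level policies on latent structures, avoiding the model distillation step required by previous approaches.
Established probably approximately correct (PAC) guarantees on the performance of low-level policies and on the quality of the abstraction.
Highlighted scalability (rooms are large and their dynamics are unknown) and reusability of low-level policies as further advantages.
Demonstrated feasibility in case studies where an agent navigates environments with moving obstacles and visual inputs.
Lifted the low-level guarantees to the high-level task.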
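
The paper pairs each low-level policy with PAC guarantees; its exact bounds are not reproduced here. As a generic illustration only, a standard Hoeffding-style construction turns N Monte Carlo episodes in a room into a lower confidence bound on a policy's success probability. The function names and the 0/1 encoding of episode outcomes are our assumptions, not the authors' procedure.

```python
import math
import random


def hoeffding_epsilon(num_episodes: int, delta: float) -> float:
    """Two-sided Hoeffding radius: with probability >= 1 - delta,
    |empirical mean - true mean| <= epsilon for i.i.d. outcomes in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * num_episodes))


def pac_success_lower_bound(outcomes: list[int], delta: float) -> float:
    """Lower confidence bound on a policy's success probability from i.i.d.
    episode outcomes (assumed encoding: 1 = reached the intended exit, 0 = failed)."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one episode")
    p_hat = sum(outcomes) / n
    return max(0.0, p_hat - hoeffding_epsilon(n, delta))


def episodes_needed(epsilon: float, delta: float) -> int:
    """Smallest number of episodes for which the Hoeffding radius is <= epsilon."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    rng = random.Random(0)
    n = episodes_needed(epsilon=0.02, delta=0.01)
    # Hypothetical room in which the low-level policy succeeds with probability 0.9.
    outcomes = [1 if rng.random() < 0.9 else 0 for _ in range(n)]
    print(n, round(pac_success_lower_bound(outcomes, delta=0.01), 3))
```

The trade-off is explicit: halving the accuracy radius epsilon quadruples the number of episodes, while tightening the confidence delta only costs a logarithmic factor.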

Abstract

We propose a novel framework for controller design in environments with a two-level structure: a known high-level graph ("map") in which each vertex is populated by a Markov decision process, called a "room". The framework "separates concerns" by using different design techniques for low- and high-level tasks. We apply reactive synthesis for high-level tasks: given a specification as a logical formula over the high-level graph and a collection of low-level policies obtained together with "concise" latent structures, we construct a "planner" that selects which low-level policy to apply in each room. We develop a reinforcement learning procedure to train low-level policies on latent structures, which, unlike previous approaches, circumvents a model distillation step. We pair the policy with probably approximately correct (PAC) guarantees on its performance and on the abstraction quality, and lift these guarantees to the high-level task. These formal guarantees are the main advantage of the framework. Other advantages include scalability (rooms are large and their dynamics are unknown) and reusability of low-level policies. We demonstrate feasibility in challenging case studies where an agent navigates environments with moving obstacles and visual inputs.
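
In the paper the planner is obtained by reactive synthesis from a logical specification. As a much simpler stand-in, the sketch below handles only a reachability objective ("reach room T"), assumes each low-level policy either reaches one intended neighbouring room or fails irrecoverably, and uses PAC lower bounds as its success probabilities. Value iteration over the high-level graph then selects one policy per room. `LowLevelPolicy` and `plan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LowLevelPolicy:
    """Hypothetical summary of a trained low-level policy for one room."""
    name: str
    exit_to: str        # neighbouring room the policy tries to reach
    success_lb: float   # PAC lower bound on the probability of reaching it


def plan(rooms: dict[str, list[LowLevelPolicy]], target: str,
         tol: float = 1e-9, max_iters: int = 10_000):
    """Value iteration for reachability on the high-level graph.

    Assumption: a policy reaches its intended exit with probability at least
    success_lb and otherwise fails for good. Returns a lower bound on the
    probability of reaching `target` from each room and the chosen policy."""
    value = {room: 0.0 for room in rooms}
    value[target] = 1.0
    choice: dict[str, str] = {}
    for _ in range(max_iters):
        change = 0.0
        for room, policies in rooms.items():
            if room == target or not policies:
                continue
            # Pick the policy maximising success_lb * value of the room it leads to.
            best = max(policies,
                       key=lambda pi: pi.success_lb * value.get(pi.exit_to, 0.0))
            new_v = best.success_lb * value.get(best.exit_to, 0.0)
            change = max(change, abs(new_v - value[room]))
            value[room], choice[room] = new_v, best.name
        if change < tol:  # a full sweep changed nothing: fixed point reached
            break
    return value, choice


if __name__ == "__main__":
    rooms = {
        "A": [LowLevelPolicy("A->B", "B", 0.9), LowLevelPolicy("A->C", "C", 0.95)],
        "B": [LowLevelPolicy("B->T", "T", 0.9)],
        "C": [LowLevelPolicy("C->T", "T", 0.5)],
        "T": [],
    }
    value, choice = plan(rooms, target="T")
    print(choice["A"], round(value["A"], 3))
```

Under this simplified model, the probability of reaching the target is monotone in each room's success probability, so the returned values are lower bounds whenever every per-room bound holds; by a union bound, that happens with probability at least 1 minus the sum of the per-room confidence parameters. This only mirrors the idea of lifting low-level guarantees to the high-level task; the paper's actual construction may differ.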

Authors

Florent Delgrange

Guy Avni

Anna Lukina
