March 3, 2026

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

Key Points

GLIDER targets long-horizon decision-making in large language models (LLMs), where deficient exploration and long-term credit assignment hurt performance, especially under sparse rewards.
It introduces a parameter-efficient hierarchical reinforcement learning framework that decomposes complex tasks into a series of coherent chain-of-thought reasoning sub-tasks.
The low-level controller is supervised with abstract, step-by-step plans that the high-level policy learns and issues.
GLIDER achieves consistent performance gains on the ScienceWorld and ALFWorld benchmarks, with improved generalization and fast online adaptation to non-stationary environments.

Abstract

While showing sophisticated reasoning abilities, large language models (LLMs) still struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment, especially in sparse-reward scenarios. Inspired by the divide-and-conquer principle, we propose an innovative framework GLIDER (Grounding Language Models as EffIcient Decision-Making Agents via Offline HiErarchical Reinforcement Learning) that introduces a parameter-efficient and generally applicable hierarchy to LLM policies. We develop a scheme where the low-level controller is supervised with abstract, step-by-step plans that are learned and instructed by the high-level policy. This design decomposes complicated problems into a series of coherent chain-of-thought reasoning sub-tasks, providing flexible temporal abstraction to significantly enhance exploration and learning for long-horizon tasks. Furthermore, GLIDER facilitates fast online adaptation to non-stationary environments owing to the strong transferability of its task-agnostic low-level skills. Experiments on ScienceWorld and ALFWorld benchmarks show that GLIDER achieves consistent performance gains, along with enhanced generalization capabilities.
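
The two-level control flow described in the abstract can be illustrated with a minimal sketch. This is not GLIDER's implementation: the number-line environment, the function names (`run_hierarchical_episode`, `toy_high_level`, `toy_low_level`), and the `stride` parameter are all hypothetical, and hand-written rules stand in for the LLM policies. It only shows the general pattern of a high-level policy issuing sub-goals that a low-level controller pursues with primitive actions, with a sparse reward paid at the end.

```python
from typing import Callable, List, Tuple

# Hypothetical toy setting: the state is an integer position on a number line,
# and the only reward is a sparse 1.0 for ending the episode exactly at the goal.
HighLevel = Callable[[int, int], int]  # (state, goal) -> sub-goal
LowLevel = Callable[[int, int], int]   # (state, sub-goal) -> primitive action (+1 or -1)


def toy_high_level(state: int, goal: int, stride: int = 3) -> int:
    """Propose a waypoint at most `stride` positions closer to the goal."""
    delta = max(-stride, min(stride, goal - state))
    return state + delta


def toy_low_level(state: int, subgoal: int) -> int:
    """Take one unit step toward the current sub-goal."""
    return 1 if subgoal > state else -1


def run_hierarchical_episode(
    start: int,
    goal: int,
    high_level: HighLevel,
    low_level: LowLevel,
    max_steps: int = 50,
) -> Tuple[int, float, List[Tuple[int, int]]]:
    """Run one episode; return (final state, sparse reward, sub-goal trace)."""
    state, steps = start, 0
    trace: List[Tuple[int, int]] = []
    while state != goal and steps < max_steps:
        # The high-level policy decides what to achieve next.
        subgoal = high_level(state, goal)
        # The low-level controller acts until the sub-goal is reached
        # or the step budget runs out.
        while state != subgoal and steps < max_steps:
            state += low_level(state, subgoal)
            steps += 1
        trace.append((subgoal, state))
    reward = 1.0 if state == goal else 0.0
    return state, reward, trace


if __name__ == "__main__":
    print(run_hierarchical_episode(0, 7, toy_high_level, toy_low_level))
```

In this toy version the low-level rule never sees the final goal, only the current sub-goal, which loosely mirrors the abstract's point that task-agnostic low-level skills can be reused when the high-level task changes.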

Authors

Wei Liu

Chunlin Chen

Zhi Wei Wang
