The capability of Large Language Models (LLMs) to plan remains a topic of debate. Some critics argue that strategies to boost LLMs' reasoning skills are ineffective in planning tasks, while others report strong outcomes merely from training models on a planning corpus. This paper revisits these claims by developing an end-to-end LLM-based planner and evaluating a range of reasoning-enhancement strategies (including fine-tuning, Chain-of-Thought (CoT) prompting, and reinforcement learning (RL)) across multiple dimensions of plan quality: validity, executability, goal satisfiability, and more. Our findings reveal that fine-tuning alone is insufficient, especially on out-of-distribution tasks. Strategies like CoT prompting primarily enhance local coherence, yielding higher executability rates (a necessary prerequisite for validity), but they provide only incremental gains and struggle to ensure global plan validity. Notably, RL guided by a novel Longest Contiguous Common Subsequence reward significantly enhances both executability and validity, particularly on longer-horizon problems. Overall, our research addresses key misconceptions in the LLM-planning literature and underscores reward-driven RL optimization as a promising direction for advancing robust LLM-based planning by jointly improving executability and validity.
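The abstract names a Longest Contiguous Common Subsequence (LCCS) reward but does not define it here. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes plans are lists of ground-action strings, computes the length of the longest run of consecutive actions shared by a generated plan and a reference plan, and normalises it by the longer of the two plan lengths. The function names `longest_contiguous_common_subsequence` and `lccs_reward`, and the normalisation choice, are assumptions made for illustration.

```python
from typing import Sequence


def longest_contiguous_common_subsequence(pred: Sequence[str], ref: Sequence[str]) -> int:
    """Length of the longest run of consecutive actions appearing in both plans.

    Standard dynamic programming: prev[j] is the length of the common run
    ending at pred[i - 1] and ref[j - 1] for the previous row i.
    """
    best = 0
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(pred) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            if pred[i - 1] == ref[j - 1]:
                # Extend the run that ended one action earlier in both plans.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def lccs_reward(pred: Sequence[str], ref: Sequence[str]) -> float:
    """Hypothetical reward in [0, 1]: LCCS length over the longer plan length.

    Dividing by the longer length (an assumption) penalises both truncated
    and padded plans; an empty plan receives zero reward.
    """
    if not pred or not ref:
        return 0.0
    return longest_contiguous_common_subsequence(pred, ref) / max(len(pred), len(ref))


if __name__ == "__main__":
    reference = ["pick a", "stack a b", "pick c", "stack c a"]
    generated = ["pick a", "stack a b", "pick d", "stack c a"]
    # The shared prefix "pick a", "stack a b" is the longest contiguous match (length 2).
    print(lccs_reward(generated, reference))
```

Rewarding contiguous agreement, rather than any scattered overlap, is one plausible way to favour locally coherent action chains, which the abstract links to executability; how the paper actually combines this signal with validity checks is not specified in this section.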
Huang et al. studied this question.
www.synapsesocial.com/papers/68d4566c31b076d99fa5bae7 — DOI: https://doi.org/10.1609/icaps.v35i1.36119
Sukai Huang
Trevor Cohn
Nir Lipovetzky
Proceedings of the International Conference on Automated Planning and Scheduling