What type of study is this?

This is an experimental study.

October 20, 2025 · Open Access

SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts

Key Points

SPEC-RL reduces rollout time by up to 3x while maintaining policy quality, improving overall training efficiency.
By reusing trajectory segments that overlap with rollouts from the previous training epoch, SPEC-RL avoids regenerating them and removes redundant computation from the rollout stage.
As a purely rollout-stage enhancement, SPEC-RL integrates with mainstream RL algorithms such as PPO, GRPO, and DAPO used to train large language models.
The framework is effective across diverse math reasoning and generalization benchmarks, including GSM8K, MATH-500, OlympiadBench, and MMLU-STEM, indicating broad applicability.

Abstract

Large Language Models (LLMs) increasingly rely on reinforcement learning with verifiable rewards (RLVR) to elicit reliable chain-of-thought reasoning. However, training remains bottlenecked by the computationally expensive rollout stage. Existing acceleration methods (parallelization, objective- and data-driven modifications, and replay buffers) either yield diminishing returns, introduce bias, or overlook redundancy across iterations. We observe that rollouts from consecutive training epochs frequently share large overlapping segments, so much of the generation work is repeated. To address this, we propose SPEC-RL, a novel framework that integrates SPECulative decoding with the RL rollout process. SPEC-RL reuses prior trajectory segments as speculative prefixes and extends them via a draft-and-verify mechanism, avoiding redundant generation while ensuring policy consistency. Experiments on diverse math reasoning and generalization benchmarks, including GSM8K, MATH-500, OlympiadBench, MMLU-STEM, and others, show that SPEC-RL reduces rollout time by 2-3x without compromising policy quality. As a purely rollout-stage enhancement, SPEC-RL integrates seamlessly with mainstream algorithms (e.g., PPO, GRPO, DAPO), offering a general and practical path to scaling RLVR for large reasoning models. Our code is available at https://github.com/ShopeeLLM/Spec-RL
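The draft-and-verify idea in the abstract can be illustrated with a small sketch. It assumes the standard speculative-sampling rule: accept each reused token with probability min(1, p_new/p_old), and at the first rejection draw a replacement from the normalized residual max(0, p_new - p_old), then resume ordinary sampling from the current policy. SPEC-RL's actual acceptance rule and implementation may differ. The names `speculative_rollout`, `generate`, `sample`, and the `Policy` type (a callable returning a full next-token distribution over a toy vocabulary) are hypothetical, chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical "policy": maps a token-id context to a full next-token
# distribution over a small toy vocabulary (a list of probabilities).
Policy = Callable[[Sequence[int]], List[float]]


def sample(dist: Sequence[float], rng: random.Random) -> int:
    """Draw one token id from a categorical distribution."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def generate(policy: Policy, context: List[int], eos: int, max_len: int,
             rng: random.Random) -> List[int]:
    """Plain autoregressive sampling until EOS or max_len new tokens."""
    out: List[int] = []
    while len(out) < max_len:
        tok = sample(policy(context + out), rng)
        out.append(tok)
        if tok == eos:
            break
    return out


def speculative_rollout(prompt: List[int], prev_response: List[int],
                        old_policy: Policy, new_policy: Policy,
                        eos: int, max_len: int,
                        rng: random.Random) -> Tuple[List[int], int]:
    """Reuse last epoch's response (sampled from old_policy) as a draft.

    Returns (new_response, n_reused), where n_reused counts draft tokens kept.
    """
    accepted: List[int] = []
    for tok in prev_response:
        ctx = prompt + accepted
        q = old_policy(ctx)  # distribution the draft token was sampled from
        p = new_policy(ctx)  # current policy's distribution at this position
        # Standard speculative-sampling test: keep the token w.p. min(1, p/q).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
            continue
        # First rejection: sample a replacement from the residual max(0, p - q).
        # A rejection implies p != q here, so the residual has positive mass.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        n_reused = len(accepted)
        accepted.append(sample(residual, rng))
        if accepted[-1] == eos or len(accepted) >= max_len:
            return accepted, n_reused
        # Resume ordinary generation from the current policy.
        tail = generate(new_policy, prompt + accepted, eos,
                        max_len - len(accepted), rng)
        return accepted + tail, n_reused
    # Whole draft accepted; a complete rollout already ended at EOS or max_len.
    return accepted, len(accepted)
```

In an RL loop, `prev_response` would be the rollout produced for the same prompt in the previous epoch, and `n_reused` counts the tokens that did not need to be regenerated. The sketch scores draft positions one at a time for clarity; in speculative decoding the current policy typically scores the entire reused prefix in a single forward pass, which is where the time savings come from.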

Authors

Bingshuai Liu

Ante Wang

Ziying Min
