November 8, 2025 | Open Access

Think When You Need: Self-Adaptive Chain-of-Thought Learning

Key Points

Adaptive learning makes language-model reasoning more efficient by producing more concise responses.
Evaluation across multiple reasoning benchmarks shows shorter responses with accuracy maintained.
Rewards are constructed from length and quality comparisons rather than from a direct penalty on reasoning length.
The method also applies to fuzzy tasks where ground truth is unavailable.

Abstract

Chain of Thought (CoT) reasoning enhances language models' performance but often leads to inefficient "overthinking" on simple problems. We identify that existing approaches directly penalizing reasoning length fail to account for varying problem complexity. Our approach constructs rewards through length and quality comparisons, guided by theoretical assumptions that jointly enhance solution correctness with conciseness. We further demonstrate our method on fuzzy tasks where ground truth is unavailable. Experiments across multiple reasoning benchmarks demonstrate that our method maintains accuracy while generating significantly more concise explanations, effectively teaching models to "think when needed."
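To make the idea of a comparison-based reward concrete, the following is a minimal illustrative sketch, not the paper's actual reward formulation. It assumes a group of sampled responses to one prompt, each with a correctness flag and a token length. The `Sample` type, the `pairwise_rewards` function, and the `quality_bonus` and `brevity_bonus` weights are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Sample:
    correct: bool  # whether the response reached the right answer
    length: int    # response length in tokens


def pairwise_rewards(samples, quality_bonus=1.0, brevity_bonus=0.5):
    """Score each sample by comparing it against every other sample.

    Hypothetical rules (assumptions, not taken from the paper):
    - a correct sample earns quality_bonus over each incorrect one;
    - among two correct samples, the shorter earns brevity_bonus;
    - incorrect samples earn nothing, however short they are.
    """
    rewards = [0.0] * len(samples)
    for i, j in permutations(range(len(samples)), 2):
        a, b = samples[i], samples[j]
        if a.correct and not b.correct:
            rewards[i] += quality_bonus
        elif a.correct and b.correct and a.length < b.length:
            rewards[i] += brevity_bonus
    return rewards


samples = [Sample(True, 120), Sample(True, 480), Sample(False, 60)]
rewards = pairwise_rewards(samples)
print(rewards)  # [1.5, 1.0, 0.0]
```

In this sketch the shortest response is the incorrect one, yet it receives no reward, so brevity only pays off among correct answers. Because length is compared only within the group sampled for the same prompt, a hard problem whose correct answers are all long is not penalized in absolute terms, which is one way to read the abstract's point about varying problem complexity.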

Authors

Junjie Yang

Ke Lin

Yu Xing
