Alignment Makes Models More Decisive Without Making Them More Truthful | Synapse

April 12, 2026 | Open Access

Key Points

The study investigates how post-training adjustments affect the decisiveness and accuracy of large language models.
Analyzed 3 model architectures
Utilized 4 reinforcement learning methods
Assessed changes in the commitment layer and representation geometry
Decisiveness improved without a corresponding increase in accuracy
The position of the commitment layer does not shift under reinforcement learning
Representations compress monotonically at that fixed commitment layer

Abstract

Post-training makes LLMs more decisive without making them more accurate. Across 3 architectures and 4 RL methods, we find the commitment layer—where the model locks in its prediction—doesn't move under reinforcement learning. What changes is the geometry: representations compress monotonically at that fixed point. The earlier layers, where the model selects what to say, remain unchanged. The lock gets tighter. The chooser stays the same.
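The abstract does not say how the commitment layer or the compression is measured, so the following is only an illustrative sketch under stated assumptions, not the author's method. It assumes the commitment layer can be read as the earliest layer after which the model's top-1 prediction no longer changes, and that compression can be summarized as the mean squared distance of representations from their centroid. The function names `commitment_layer` and `spread` and all toy numbers are hypothetical.

```python
def commitment_layer(top1_by_layer):
    """Earliest layer index from which the top-1 token never changes again.

    Assumption: `top1_by_layer[i]` is the token id the model would predict
    if decoding stopped at layer i (for example, via a logit-lens readout).
    """
    final = top1_by_layer[-1]
    layer = len(top1_by_layer) - 1
    # Walk backwards while the earlier layer already agrees with the final one.
    while layer > 0 and top1_by_layer[layer - 1] == final:
        layer -= 1
    return layer


def spread(vectors):
    """Mean squared distance of the vectors from their centroid.

    Assumption: a smaller value is taken as a proxy for a more
    compressed set of representations.
    """
    n = len(vectors)
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / n for i in range(dim)]
    sq_dists = [sum((v[i] - centroid[i]) ** 2 for i in range(dim)) for v in vectors]
    return sum(sq_dists) / n


# Toy, made-up data: per-layer top-1 token ids before and after post-training.
base_top1 = [7, 3, 3, 9, 9, 9]
tuned_top1 = [7, 3, 5, 9, 9, 9]

# Toy, made-up 2-D representations taken at the commitment layer.
base_reps = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
tuned_reps = [[0.5, 0.5], [1.5, 0.5], [0.5, 1.5], [1.5, 1.5]]

same_layer = commitment_layer(base_top1) == commitment_layer(tuned_top1)
compressed = spread(tuned_reps) < spread(base_reps)
print(same_layer, compressed)
```

In this toy setup the commitment layer index is the same for both models while the spread at that layer shrinks, which is the pattern the abstract describes. Real measurements would need per-layer readouts from actual checkpoints.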

Cite this study

Angel Pena studied this question.

www.synapsesocial.com/papers/69db375f4fe01fead37c55d2 — DOI: https://doi.org/10.5281/zenodo.19490948

Authors

Angel Pena
