Post-training makes LLMs more decisive without making them more accurate. Across three architectures and four RL methods, we find that the commitment layer, where the model locks in its prediction, does not move under reinforcement learning. What changes is the geometry: representations compress monotonically at that fixed point. The earlier layers, where the model selects what to say, remain unchanged. The lock gets tighter. The chooser stays the same.
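The abstract does not say how the commitment layer or the compression is measured, so the following is only an illustrative sketch under stated assumptions, not the author's method. It assumes a per-layer top-1 readout (for example from a logit-lens-style probe) and uses the participation ratio of the covariance spectrum as one possible proxy for how compressed a layer's representations are. The names `commitment_layer` and `participation_ratio` are hypothetical.

```python
import numpy as np


def commitment_layer(layer_top1, final_top1):
    """Earliest layer from which the top-1 readout equals the final
    prediction and never changes again (assumed definition).

    Returns len(layer_top1) if even the last layer disagrees.
    """
    layer = len(layer_top1)  # sentinel: no commitment found
    # Walk backwards from the last layer while the readout still matches.
    for i in range(len(layer_top1) - 1, -1, -1):
        if layer_top1[i] != final_top1:
            break
        layer = i
    return layer


def participation_ratio(hidden):
    """Effective dimensionality of a (samples, features) activation matrix:
    (sum of eigenvalues)^2 / (sum of squared eigenvalues) of its covariance.

    Lower values mean the activations occupy fewer directions, which is one
    assumed proxy for "compression".
    """
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (hidden.shape[0] - 1)
    # eigvalsh is for symmetric matrices; clip tiny negative round-off.
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eig.sum() ** 2 / (eig ** 2).sum())
```

Under these assumptions, the abstract's claim would correspond to `commitment_layer` returning the same index before and after RL, while `participation_ratio` evaluated at that layer decreases with further post-training.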
Angel Pena (Sun,) studied this question.
www.synapsesocial.com/papers/69db375f4fe01fead37c55d2 — DOI: https://doi.org/10.5281/zenodo.19490948