What question did this study set out to answer?

This study asks how to overcome the capacity ceiling of weight-edited language models, where reliable recall collapses after only a small number of direct weight edits.

February 28, 2026 | Open Access

Per-Fact Graduated Consolidation Resolves the Capacity Ceiling in Weight-Edited Language Models

Key Points

Developed per-fact graduated consolidation to independently track consolidation stages of each fact.
Implemented a graduated dissolution schedule to progressively reduce the influence of MEMIT.
Ran a capacity sweep on Llama 3.1 8B (4-bit, 2xH100) with 5, 10, 15, and 20 facts across 3 sleep cycles.
Achieved a 100% advancement rate and 1.00 chat recall across all tested conditions.
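Resolved the two earlier failures: the alignment tax (37% recall degradation on 8B) and the 0% advancement of per-edit gating.
Showed that MEMIT edits dissolve as designed, which makes the edit buffer renewable and the effective lifetime capacity unbounded.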

Abstract

Language models that learn from conversation via direct weight editing (MEMIT) face a hard capacity ceiling: the 8B Llama model sustains reliable recall for only ~13 unconstrained edits before cascading interference collapses performance. Prior attempts to offload knowledge into LoRA adapters failed: the alignment tax (37% recall degradation on 8B) blocks the transfer pathway, and per-edit gating produced 0% advancement. We resolve both failures with per-fact graduated consolidation: each fact independently tracks its consolidation stage, a graduated dissolution schedule (1.0 -> 0.5 -> 0.1 -> 0.0) progressively reduces MEMIT influence, and cumulative fusing -- training each cycle on an already-fused model -- overcomes the alignment tax through incremental prior erosion. In a capacity sweep on Llama 3.1 8B (4-bit, 2xH100) with 5, 10, 15, 20 facts across 3 sleep cycles, every condition achieves 100% advancement rate and 1.00 chat recall. MEMIT edits dissolve as designed, making the buffer renewable: effective lifetime capacity becomes unbounded. This is Paper 6 in the Sleeping LLM series, superseding the MEMIT-only architecture of Paper 5.
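The per-fact bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It only models the stage tracking and the dissolution schedule (1.0 -> 0.5 -> 0.1 -> 0.0); it does not perform MEMIT edits, LoRA training, or fusing. The names `Fact`, `sleep_cycle`, `active_edits`, and the `passes_check` predicate are hypothetical, and the rule that a fact advances one stage per cycle when a check passes is an assumption, since the abstract does not state the advancement criterion.

```python
from dataclasses import dataclass
from typing import Callable, List

# Dissolution schedule quoted in the abstract: MEMIT scale at each stage.
SCHEDULE = (1.0, 0.5, 0.1, 0.0)


@dataclass
class Fact:
    """One edited fact with its own consolidation stage (hypothetical type)."""
    text: str
    stage: int = 0  # index into SCHEDULE

    @property
    def memit_scale(self) -> float:
        # How strongly the MEMIT edit for this fact is still applied.
        return SCHEDULE[self.stage]

    @property
    def consolidated(self) -> bool:
        # True once the MEMIT edit has fully dissolved (scale 0.0).
        return self.stage == len(SCHEDULE) - 1


def sleep_cycle(facts: List[Fact], passes_check: Callable[[Fact], bool]) -> int:
    """Advance each fact independently; return how many advanced.

    `passes_check` stands in for whatever recall test gates advancement
    (an assumption; the abstract does not specify it).
    """
    advanced = 0
    for fact in facts:
        if not fact.consolidated and passes_check(fact):
            fact.stage += 1
            advanced += 1
    return advanced


def active_edits(facts: List[Fact]) -> List[Fact]:
    """Facts that still occupy the MEMIT buffer (scale above zero)."""
    return [f for f in facts if f.memit_scale > 0.0]


if __name__ == "__main__":
    facts = [Fact(f"fact-{i}") for i in range(5)]
    for cycle in range(3):
        n = sleep_cycle(facts, lambda f: True)
        print(cycle + 1, n, [f.memit_scale for f in facts])
    print("edits left in buffer:", len(active_edits(facts)))
```

Under these assumptions, three cycles with every check passing move each fact through all three transitions, so no edits remain in the buffer. A fact whose check fails simply stays at its current stage without holding back the others, which is the sense in which tracking is per fact rather than per edit batch.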

Authors

Vladimir Baranov
