Language models that learn from conversation via direct weight editing (MEMIT) face a hard capacity ceiling: the 8B Llama model sustains reliable recall for only ~13 unconstrained edits before cascading interference collapses performance. Prior attempts to offload knowledge into LoRA adapters failed: the alignment tax (37% recall degradation on 8B) blocks the transfer pathway, and per-edit gating produced 0% advancement. We resolve both failures with per-fact graduated consolidation: each fact independently tracks its consolidation stage, a graduated dissolution schedule (1.0 -> 0.5 -> 0.1 -> 0.0) progressively reduces MEMIT influence, and cumulative fusing (training each cycle on an already-fused model) overcomes the alignment tax through incremental prior erosion. In a capacity sweep on Llama 3.1 8B (4-bit, 2xH100) with 5, 10, 15, and 20 facts across 3 sleep cycles, every condition achieves a 100% advancement rate and 1.00 chat recall. MEMIT edits dissolve as designed, making the buffer renewable: effective lifetime capacity becomes unbounded. This is Paper 6 in the Sleeping LLM series, superseding the MEMIT-only architecture of Paper 5.
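The per-fact bookkeeping described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `FactState` class, the `sleep_cycle` function, and the `recalled` callback are hypothetical names introduced here, and the rule that a fact advances one stage per cycle when a recall check passes is an assumption. Only the schedule values (1.0, 0.5, 0.1, 0.0) come from the abstract. No model, MEMIT edit, or LoRA training is performed; the code only tracks stages.

```python
from dataclasses import dataclass
from typing import Callable, List

# Dissolution schedule from the abstract: MEMIT scale at each consolidation stage.
DISSOLUTION_SCHEDULE = (1.0, 0.5, 0.1, 0.0)


@dataclass
class FactState:
    """Per-fact bookkeeping (hypothetical structure, not the paper's code)."""
    fact_id: str
    stage: int = 0  # index into DISSOLUTION_SCHEDULE

    @property
    def memit_scale(self) -> float:
        # Fraction of the original MEMIT edit still applied for this fact.
        return DISSOLUTION_SCHEDULE[self.stage]

    @property
    def consolidated(self) -> bool:
        # True once the MEMIT edit has fully dissolved (scale 0.0).
        return self.stage == len(DISSOLUTION_SCHEDULE) - 1


def sleep_cycle(facts: List[FactState],
                recalled: Callable[[FactState], bool]) -> int:
    """Advance each fact independently; return how many advanced this cycle.

    `recalled` stands in for a recall check after the cycle's training step
    (an assumption); a fact that fails the check keeps its current stage.
    """
    advanced = 0
    for fact in facts:
        if fact.consolidated:
            continue
        if recalled(fact):
            fact.stage += 1
            advanced += 1
    return advanced


def run(n_facts: int, n_cycles: int) -> List[FactState]:
    """Simulate n_cycles sleep cycles in which every recall check passes."""
    facts = [FactState(f"fact-{i}") for i in range(n_facts)]
    for _ in range(n_cycles):
        sleep_cycle(facts, recalled=lambda f: True)
    return facts
```

Under these assumptions, three cycles with every check passing move each fact through all three transitions of the schedule, so every fact ends at MEMIT scale 0.0. Because each fact carries its own stage, one fact failing a check does not hold back the others, which is the contrast with the per-edit gating that the abstract reports as producing 0% advancement.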
Vladimir Baranov
www.synapsesocial.com/papers/69a286b80a974eb0d3c01dcb — DOI: https://doi.org/10.5281/zenodo.18779159