We report a surprising inverse scaling phenomenon in LoRA-based memory consolidation for language models. At 3B parameters, sleep-wake consolidation achieves 47% factual recall after training. At 8B, recall drops to 37% with significant confabulation. At 70B, recall is zero despite successful training (low loss, correct gradient flow). We identify RLHF alignment as the cause: safety training creates a behavioral prior that overrides LoRA-injected knowledge at inference time. The effect scales with model size because larger models receive more extensive alignment training. This 'alignment tax' on continual learning has implications for any system attempting to inject new knowledge into aligned language models via parameter-efficient fine-tuning.
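The argument rests on how LoRA stores new knowledge. In the standard LoRA formulation (Hu et al., 2021), the pretrained (here, aligned) weight W0 stays frozen, and fine-tuning learns only a low-rank update B·A scaled by alpha/r. The sketch below shows that generic update in plain Python. It is not the paper's code: the rank, scaling, and target modules used in the experiments are not stated here, and the helper names `matmul` and `lora_effective_weight` are hypothetical.

```python
# Generic LoRA update (standard formulation), NOT the paper's implementation.
# The frozen base weight W0 is never modified; new knowledge lives only in
# the low-rank product B @ A, scaled by alpha / r.
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w0: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W0 + (alpha / r) * B @ A, where r is the LoRA rank (rows of A).

    Shapes: w0 is d_out x d_in, a is r x d_in, b is d_out x r.
    """
    r = len(a)
    delta = matmul(b, a)  # d_out x d_in update with rank at most r
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


# Usage: a rank-1 update on a 2x3 weight.
w0 = [[1, 0, 0], [0, 1, 0]]
a = [[1.0, 2.0, 3.0]]
print(lora_effective_weight(w0, a, [[0.5], [0.0]], alpha=2.0))
# With B initialized to zero (the usual LoRA init), the model is unchanged.
print(lora_effective_weight(w0, a, [[0.0], [0.0]], alpha=2.0))
```

Because B is usually initialized to zero, the effective weight starts out equal to W0, and anything learned during consolidation is expressed only through the low-rank term. The base weights that carry the alignment training are never touched.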
Author: Vladimir Baranov
www.synapsesocial.com/papers/69a287350a974eb0d3c02bb2 — DOI: https://doi.org/10.5281/zenodo.18778761