LLM-based agents suffer from behavioral amnesia: corrections, preferences, and operational patterns are lost across session boundaries, while instructions degrade within sessions as context windows fill. Existing approaches focus on memory augmentation (compressing, indexing, and retrieving factual content) but cannot transfer the operational patterns that constitute an agent's behavioral identity: how it approaches problems, what failure patterns it avoids, how it calibrates communication, and which heuristics proved reliable. We present a three-generation framework for guided behavioral evolution that addresses this gap. The first generation, Agent Lineage Evolution (ALE, 2025), introduced generational succession through manual meta-prompts. The second, SOUL (2026), formalized continuous governance with rolling compaction, an external conscience, and hierarchical knowledge inheritance. The third, Succession (2026), introduced mechanical behavioral enforcement immune to instruction drift, organic correction extraction, and CSS-like rule cascading. Across generations, the core insight persists: agent knowledge should be distilled into behavioral identity, not merely compressed into retrievable facts. On SOUL-Bench, a purpose-built evaluation, the framework's compaction layer achieves 20/20 knowledge retention versus 6/20 for a no-memory baseline. On LongMemEval (ICLR 2025), compaction achieves an 86x compression ratio while maintaining stable memory size. On SuccessionBench, a multi-model behavioral enforcement evaluation, we demonstrate measurable instruction drift in Sonnet 4.6 at 150k tokens (compliance drops from 100% to 78%), show that user corrections persist better than system instructions but still degrade at depth, and find that advisory re-injection is more valuable than mechanical blocking for preventing drift. The framework is the first to combine human-shepherded oversight with mechanical behavioral enforcement for LLM agent continuity.
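The abstract names CSS-like rule cascading but does not spell out how conflicting rules are resolved. What follows is a minimal sketch of the general idea, loosely modeled on the CSS cascade. It is not Succession's implementation: the scope names (global, project, session), the `Rule` type, the `important` flag, and the `cascade` function are all hypothetical. For each behavior key, an "important" rule beats a normal one. Among equals, the more specific scope wins, and after that the later declaration wins.

```python
from dataclasses import dataclass

# Hypothetical scope levels, ordered from least to most specific.
# This ordering is an assumption for illustration, not taken from the paper.
SCOPE_PRECEDENCE = {"global": 0, "project": 1, "session": 2}


@dataclass(frozen=True)
class Rule:
    key: str                 # which behavior the rule governs, e.g. "tone"
    value: str               # the directive itself
    scope: str               # one of the keys of SCOPE_PRECEDENCE
    important: bool = False  # hypothetical "!important"-style flag


def cascade(rules: list[Rule]) -> dict[str, str]:
    """Resolve one winning directive per key (simplified CSS-like cascade).

    Ranking per rule: important beats normal, then a more specific scope
    beats a less specific one, then a later rule beats an earlier one.
    """
    winners: dict[str, tuple[tuple[int, int, int], str]] = {}
    for order, rule in enumerate(rules):
        rank = (int(rule.important), SCOPE_PRECEDENCE[rule.scope], order)
        current = winners.get(rule.key)
        if current is None or rank > current[0]:
            winners[rule.key] = (rank, rule.value)
    return {key: value for key, (_, value) in winners.items()}


rules = [
    Rule("tone", "concise", "global"),
    Rule("tone", "detailed", "project"),  # more specific scope overrides global
    Rule("tests", "run before commit", "global", important=True),
    Rule("tests", "skip", "session"),     # loses to the important global rule
]
print(cascade(rules))  # {'tone': 'detailed', 'tests': 'run before commit'}
```

The tuple ranking keeps resolution a single linear pass. A real system would likely add more scope levels and record provenance for each winning rule.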
Daniel Fook Hao Tan
Meng Jin Chen
www.synapsesocial.com/papers/69d5f11e74eaea4b11a7a9d7 — DOI: https://doi.org/10.5281/zenodo.19437320